Techniques for determining the distance from a camera to an object based on an image captured by the camera have a variety of applications. For example, advanced driver assistance systems may apply such techniques to determine the distance of an object from a vehicle. Stereo depth measurement is a technique for determining depth based on images captured by two or more cameras. Using stereo depth measurement, the distance from the cameras to an object is determined based on the displacement of the object from one image to another.
In one example, a system includes a downsampling circuit, a stereo disparity engine, and a merge circuit. The downsampling circuit is configured to generate a first two-dimensional array by downsampling a second two-dimensional array, and generate a third two-dimensional array by downsampling a fourth two-dimensional array. The stereo disparity engine is configured to generate a first disparity map relating elements of the first two-dimensional array to elements of the third two-dimensional array, generate a second disparity map based on the first disparity map, a first sub-array of the second two-dimensional array and a second sub-array of the fourth two-dimensional array, and generate a third disparity map based on the first disparity map, a third sub-array of the second two-dimensional array and a fourth sub-array of the fourth two-dimensional array. The merge circuit is configured to combine the second disparity map and the third disparity map to generate a fourth disparity map.
In another example, a method includes partitioning a first array into a first sub-array and a second sub-array, and partitioning a second array into a third sub-array and a fourth sub-array. The method also includes generating a first disparity map relating the first sub-array to the third sub-array, and generating a second disparity map relating the second sub-array to the fourth sub-array. The method further includes merging the first disparity map and the second disparity map to produce a third disparity map relating the first array to the second array.
In a further example, an integrated circuit includes a processor and a stereo depth engine. The stereo depth engine is coupled to the processor. The processor is configured to downsample a first image to produce a second image, and downsample a third image to produce a fourth image. The stereo depth engine is configured to generate a first disparity map relating the second image and the fourth image. The processor is configured to partition the first image into a first sub-image and a second sub-image based on the first disparity map; and partition the third image into a third sub-image and a fourth sub-image based on the first disparity map. The stereo depth engine is configured to generate a second disparity map relating the first sub-image and the third sub-image, and generate a third disparity map relating the second sub-image and the fourth sub-image. The processor is configured to combine the second disparity map and the third disparity map to produce a fourth disparity map relating the first image and the third image.
In some stereo vision systems, the cameras are provided in a single modular unit, which allows the cameras to be easily incorporated in a higher-level assembly. However, in such modular units, the distance between the cameras is relatively small (e.g., 12-30 centimeters), which may limit the displacement of objects across images captured by the cameras, and limit the depth measurement capabilities of the system. To provide improved depth measurement, some stereo vision systems forgo the convenience of modular assemblies, and include cameras with high-resolution (e.g., 8-16 megapixel or higher) image sensors that are mounted relatively far apart (e.g., 1-2 meters apart). The wider spacing between the cameras increases the distance between corresponding pixels of the stereoscopic images. Hardware accelerators designed to process such images may have relatively large memories to store the high-resolution images and a relatively large disparity search range (e.g., 800 pixels) to identify corresponding pixels. These features increase circuit size and cost.
The stereo vision system described herein reduces overall system cost by employing a hardware accelerator (a stereo disparity engine) having smaller memories and lower disparity search range to process high-resolution images acquired from widely spaced cameras. The system downsamples the high-resolution images and generates a reference disparity map based on the downsampled images. The system partitions the high-resolution images into overlapping sub-images based on the reference disparity map, and generates disparity maps for stereo sub-images. The system merges the reference disparity map and the sub-image disparity maps to produce a disparity map for the high-resolution images.
A downsample circuit 204 of the image processor 102 downsamples the padded images to produce images of a size suitable for disparity processing by a stereo disparity engine (SDE) (e.g., a hardware accelerator) of the image processor 102. For example, the high-resolution images padded in the pad circuit 202 may be too large to be processed by the stereo disparity engine, and the downsampling may produce images (two-dimensional arrays) having a resolution (e.g., 960×544) compatible with the SDE. The padded images may be downsampled by a factor of four in some implementations. Other downsampling factors may be used consistent with the teachings herein.
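A minimal sketch of this downsampling step, assuming NumPy and a simple block-averaging filter (the filter actually applied by the downsample circuit 204 is not specified here), is:

```python
import numpy as np

def downsample(image, factor=4):
    """Downsample by averaging factor x factor pixel blocks.

    Block averaging is one of several possible downsampling filters;
    the actual filter used by the image processor is an assumption here.
    """
    h, w = image.shape
    assert h % factor == 0 and w % factor == 0
    return image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

# A padded 3840x2176 image downsampled by 4 yields a 960x544 image,
# a resolution compatible with the SDE in the example above.
padded = np.zeros((2176, 3840), dtype=np.float32)
small = downsample(padded)
print(small.shape)  # (544, 960)
```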
In block 206, the SDE processes the downsampled images produced by the downsample circuit 204 to generate a disparity map (a reference disparity map (Dds)) for the downsampled images. The reference disparity map specifies the distance between pixels in the downsampled images.
A disparity analysis 210 is performed on the reference disparity map (Dds) using a partitioning circuit 208 of the image processor 102. The partitioning circuit 208 analyzes the reference disparity map and partitions (subdivides) the high-resolution stereo images (e.g., the padded stereo images) into smaller overlapping sub-images (sub-arrays). For example, the image processor 102 may partition each high-resolution image into N sub-images (e.g., N=12). Adjacent sub-images derived from a high-resolution image may overlap horizontally and vertically. In one example, a sub-image may be 2048×320 pixels in size, and overlap neighboring sub-images by 8 pixels vertically and 196 pixels horizontally (where the SDE disparity search range is 192 pixels). The sub-image size and overlap may differ in other examples.
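The geometric part of this partitioning may be sketched as follows, assuming NumPy; the tile size and overlaps are the illustrative values from the example above, and the final tile in each direction is clamped to the image edge (in practice the partitioning is also driven by the reference disparity map):

```python
import numpy as np

def tile_origins(length, tile, overlap):
    """Start offsets of overlapping tiles covering [0, length)."""
    step = tile - overlap
    starts = list(range(0, max(length - tile, 0) + 1, step))
    if starts[-1] + tile < length:
        starts.append(length - tile)  # clamp a final tile to the edge
    return starts

def partition(image, tile_h=320, tile_w=2048, ov_h=8, ov_w=196):
    """Split an image into overlapping sub-images (illustrative sizes)."""
    h, w = image.shape
    return [image[y:y + tile_h, x:x + tile_w]
            for y in tile_origins(h, tile_h, ov_h)
            for x in tile_origins(w, tile_w, ov_w)]

padded = np.zeros((2176, 3840), dtype=np.uint8)
subs = partition(padded)
print(len(subs), subs[0].shape)  # 14 (320, 2048)
```

With these illustrative parameters a 3840×2176 image yields 14 sub-images (7 rows × 2 columns); a different tile geometry would produce the N=12 of the example.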
One of the high-resolution images may be referred to as a reference image and the other of the high-resolution images may be referred to as a target image.
For each sub-image of the reference image 212, the image processor 102 extracts a sub-image from the target image 214. The sub-image of the target image may be shifted relative to the sub-image of the reference image. For example, a sub-image of the target image 214 may be taken from the same rows as the sub-image of the reference image 212, but from different columns.
In block 216, the SDE generates a disparity map for each pair of sub-images (reference image sub-image(i) and target image sub-image(i)). The disparity maps relate the pixels of each sub-image of the reference image 212 to the pixels of a corresponding sub-image of the target image 214.
In a merge circuit 218, the image processor 102 merges (combines) the reference disparity map and the sub-image disparity maps to produce a final disparity map for the high-resolution stereo images. In some examples, the image processor 102 combines the disparities as an average of the sub-image disparity and the scaled reference disparity, generating each pixel of the final disparity map from the corresponding pixels of the sub-image disparity map and the upscaled reference disparity map.
If the difference of the sub-image disparity and the reference disparity for a pixel exceeds a threshold value (Th), then the disparity value for the pixel may be set to zero or another selected (e.g., constant) value.
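The merge rule described above can be sketched as follows; this minimal NumPy example assumes a hypothetical threshold value and uses zero as the replacement value for disagreeing pixels:

```python
import numpy as np

def merge_disparity(sub_disp, ref_disp_upscaled, threshold, invalid=0.0):
    """Merge a sub-image disparity map with the (upscaled) reference
    disparity map: average where they agree, a selected constant where
    they differ by more than `threshold`. Sketch only; the threshold
    value is implementation-defined.
    """
    avg = (sub_disp + ref_disp_upscaled) / 2.0
    disagree = np.abs(sub_disp - ref_disp_upscaled) > threshold
    return np.where(disagree, invalid, avg)

sub = np.array([[10.0, 50.0], [12.0, 12.0]])
ref = np.array([[12.0, 12.0], [12.0, 12.0]])
print(merge_disparity(sub, ref, threshold=4))
# [[11.  0.]
#  [12. 12.]]
```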
In block 302, the image processor 102 receives full-resolution (high-resolution) stereo images from first and second cameras, e.g., the camera 104 and the camera 106, and the image processor 102 pads the images by adding pixels to the edges of the images to increase the image size to a multiple of a selected value (e.g., a multiple of 64). For example, if the received images are 3840×2160 pixels, the padding may increase the size of each image to 3840×2176 in some implementations.
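The padding step can be sketched as follows; this minimal NumPy example assumes edge-replication padding applied to the bottom and right edges, which is one of several possible padding strategies:

```python
import numpy as np

def pad_to_multiple(image, multiple=64):
    """Pad an image so each dimension is a multiple of `multiple`.

    Edge replication on the bottom/right is used here for illustration;
    the padding strategy is an implementation choice.
    """
    h, w = image.shape[:2]
    pad_h = (-h) % multiple
    pad_w = (-w) % multiple
    return np.pad(image, ((0, pad_h), (0, pad_w)), mode="edge")

# A 3840x2160 image pads to 3840x2176 (2160 -> 2176;
# 3840 is already a multiple of 64).
img = np.zeros((2160, 3840), dtype=np.uint8)
padded = pad_to_multiple(img)
print(padded.shape)  # (2176, 3840)
```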
In block 304, the image processor 102 downsamples the padded images to produce images of a size suitable for disparity processing by a stereo disparity engine (SDE) (a hardware accelerator) of the image processor 102. For example, the padded images produced in block 302 may be too large to be processed by the stereo disparity engine of the image processor 102, and the downsampling may produce images having a resolution compatible with the SDE. The padded images may be downsampled by a factor of four in some implementations, so 960×544 images may be generated from 3840×2176 images.
In block 306, the SDE of the image processor 102 processes the downsampled images produced in block 304 to generate the reference disparity map (Dds) for the downsampled images. The reference disparity map specifies the distance between pixels in the downsampled images.
In block 308, the image processor 102 analyzes the reference disparity map and partitions (subdivides) the high-resolution stereo images (e.g., the padded reference and target images) into smaller overlapping sub-images (sub-arrays) as illustrated in
For each sub-image i=0, . . . , N−1, the image processor 102 determines the minimum and maximum disparity [min_disparity(i), max_disparity(i)] from the reference disparity map. The image processor 102 builds a disparity histogram from the pixels in each sub-image. Bins of the histogram having less than 1% of the number of pixels in the sub-image may be suppressed. The image processor 102 may determine the maximum and minimum disparity values for a sub-image such that the difference of the maximum and minimum disparities is not greater than the SDE search range divided by the downsampling factor. For example, max_disparity(i)−min_disparity(i)≤192/scale, where scale is the downsampling factor. The image processor 102 may also determine the maximum and minimum disparity values for a sub-image such that the number of pixels whose disparity values fall within [min_disparity(i), max_disparity(i)] is maximized.
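The histogram-based selection of [min_disparity(i), max_disparity(i)] can be sketched with the following hypothetical helper, assuming NumPy, integer disparity bins of one level, and the 1% suppression and 192/scale width constraint described above:

```python
import numpy as np

def disparity_range(ref_disp_region, search_range=192, scale=4):
    """Pick [min_disparity, max_disparity] for a sub-image from its
    region of the reference disparity map: suppress histogram bins
    holding < 1% of the pixels, then choose a window no wider than
    search_range / scale that covers the most pixels.
    """
    d = ref_disp_region.ravel()
    hist = np.bincount(d.astype(np.int64))
    hist[hist < 0.01 * d.size] = 0          # suppress sparse bins
    width = search_range // scale           # max allowed disparity spread
    best, best_count = (0, width), -1
    # slide a window of `width` disparity levels; keep the densest one
    for lo in range(0, max(len(hist) - width, 0) + 1):
        count = hist[lo:lo + width + 1].sum()
        if count > best_count:
            best, best_count = (lo, lo + width), count
    return best

# 1000 pixels at disparity 10 plus 5 outliers at 200 (< 1%, suppressed)
region = np.concatenate([np.full(1000, 10), np.full(5, 200)])
print(disparity_range(region))  # (0, 48)
```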
For each sub-image of the reference image 212, the image processor 102 extracts a sub-image from the target image. The sub-image of the target image 214 may be shifted relative to the sub-image of the reference image. For example, a sub-image of the target image may be taken from the same rows as the sub-image of the reference image, but from different columns (shifted as shown in
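The extraction of a shifted target sub-image can be sketched with the following hypothetical helper, assuming NumPy; the shift amount (the sub-image's minimum disparity upscaled to full resolution) and the shift direction are assumptions here, as the actual shift depends on the camera geometry:

```python
import numpy as np

def extract_target_subimage(target, y, x, tile_h, tile_w, min_disp, scale=4):
    """Extract the target-image sub-image matching a reference sub-image
    whose top-left corner is (y, x): same rows, but columns shifted by
    the sub-image's minimum disparity upscaled to full resolution.

    Hypothetical sketch: assumes matches in the target image appear at
    smaller column indices than in the reference image.
    """
    x0 = max(x - min_disp * scale, 0)  # shift window toward the matches
    return target[y:y + tile_h, x0:x0 + tile_w]

target = np.zeros((2176, 3840), dtype=np.uint8)
sub_t = extract_target_subimage(target, y=0, x=1792, tile_h=320, tile_w=2048,
                                min_disp=24)
print(sub_t.shape)  # (320, 2048)
```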
In block 310, the SDE of the image processor 102 generates a disparity map for each pair of sub-images (reference image sub-image(i) and target image sub-image(i)) produced in block 308. The disparity maps relate the pixels of each sub-image of the reference image 212 to the pixels of a corresponding sub-image of the target image 214.
In block 312, the image processor 102 merges (combines) the reference disparity map produced in block 306 and the sub-image disparity maps generated in block 310 to produce a final disparity map for the high-resolution stereo images. The image processor 102 may combine the disparities as an average of the sub-image disparity and the scaled reference disparity. If the difference of the sub-image disparity and the reference disparity for a pixel exceeds a threshold value, then the disparity value for the pixel may be set to zero or another selected value.
The processor 402 and the SDE 404 are coupled to the memory 406. The memory 406 is a non-transitory computer-readable medium and may include volatile and/or non-volatile memory, such as static random-access memory, dynamic random-access memory, FLASH memory, and/or other types of memory. The memory 406 stores data processed by the processor 402 and the SDE 404 and stores instructions executed by the processor 402. The data stored in the memory 406 includes the full-resolution (high-resolution) images 408 received from cameras, e.g., the camera 104 and the camera 106, the downsampled images 410, the reference disparity map 414, the sub-images 412, the sub-image disparity maps 416, and the final disparity map 418.
The instructions stored in the memory 406 include padding instructions 420, downsampling instructions 422, partitioning instructions 424, and merging instructions 426. The full-resolution images 408 may be moved into the memory 406 using direct memory access or through the processor 402. The processor 402 may execute the padding instructions 420 to implement the pad circuit 202 that pads the full-resolution images 408. The processor 402 may execute the downsampling instructions 422 to implement the downsample circuit 204 that downsamples (e.g., by four) the padded images to generate the downsampled images 410. The SDE 404 processes the downsampled images 410 to generate the reference disparity map 414.
The processor 402 executes the partitioning instructions 424 to implement the partitioning circuit 208 that processes the reference disparity map 414 and the full-resolution images 408 as described herein to generate the sub-images 412. The SDE 404 processes the sub-images 412 to generate the sub-image disparity maps 416. The processor 402 executes the merging instructions 426 to implement the merge circuit 218 that combines the sub-image disparity maps 416 and the reference disparity map 414 as described herein to produce the final disparity map 418. The final disparity map 418 may be further processed by the processor 402 or provided (by the processor 402 or other circuit) to a system that determines a distance from the camera 104 and the camera 106 to an object in the stereo images. In the vehicle 100, an operator alert (e.g., an audio, video, or tactile alert) may be generated or the operation of the vehicle 100 may be controlled (e.g., braking activated) based on the final disparity map 418 and distance to an object determined based on the final disparity map 418.
As the foregoing describes, the SDE, e.g., SDE 404, is configured to provide high-performance stereo vision, which may be incorporated in advanced driver assistance systems. In some implementations, an inexpensive, low-performance hardware accelerator may be configured to provide the functionality of the SDE described herein to provide a compact, high performance, low cost stereo vision solution.
The same reference numbers or other reference designators are used in the drawings to designate the same or similar (either by function and/or structure) features.
In this description, the term “couple” may cover connections, communications, or signal paths that enable a functional relationship consistent with this description. For example, if device A generates a signal to control device B to perform an action: (a) in a first example, device A is coupled to device B by direct connection; or (b) in a second example, device A is coupled to device B through intervening component C if intervening component C does not alter the functional relationship between device A and device B, such that device B is controlled by device A via the control signal generated by device A.
Also, in this description, the recitation “based on” means “based at least in part on.” Therefore, if X is based on Y, then X may be a function of Y and any number of other factors.
A device that is “configured to” perform a task or function may be configured (e.g., programmed and/or hardwired) at a time of manufacturing by a manufacturer to perform the function and/or may be configurable (or reconfigurable) by a user after manufacturing to perform the function and/or other additional or alternative functions. The configuring may be through firmware and/or software programming of the device, through a construction and/or layout of hardware components and interconnections of the device, or a combination thereof.
A circuit or device that is described herein as including certain components may instead be adapted to be coupled to those components to form the described circuitry or device. For example, a structure described as including one or more active or passive elements or subsystems may instead include only a subset of the elements and may be adapted to be coupled to at least some of the elements to form the described structure either at a time of manufacture or after a time of manufacture, for example, by an end-user and/or a third-party.
Circuits described herein are reconfigurable to include additional or different components to provide functionality at least partially similar to functionality available prior to the component replacement.
While certain elements of the described examples are included in an integrated circuit and other elements are external to the integrated circuit, in other example embodiments, additional or fewer features may be incorporated into the integrated circuit. In addition, some or all of the features illustrated as being external to the integrated circuit may be included in the integrated circuit and/or some features illustrated as being internal to the integrated circuit may be incorporated outside of the integrated circuit. As used herein, the term "integrated circuit" means one or more circuits that are: (i) incorporated in/over a semiconductor substrate; (ii) incorporated in a single semiconductor package; (iii) incorporated into the same module; and/or (iv) incorporated in/on the same printed circuit board.
Modifications are possible in the described embodiments, and other embodiments are possible, within the scope of the claims.