The present invention relates to an information processing device, an imaging device, an equipment control system, a mobile object, an information processing method, and a computer-readable recording medium.
In the related art, from the viewpoint of automobile safety, body structures of automobiles and the like have been developed in view of how to protect a pedestrian and how to protect an occupant in a case in which the pedestrian collides with the automobile. In recent years, however, with advances in information processing techniques and image processing techniques, techniques of rapidly detecting a person and an automobile have been developed. By applying these techniques, automobiles that prevent collision by automatically braking before colliding with an object have already been developed. To automatically control the automobile, a distance to an object such as a person or another vehicle needs to be measured precisely. Accordingly, distance measurement using a millimeter-wave radar or a laser radar, distance measurement using a stereo camera, and the like have been put to practical use.
When a stereo camera is used as a technique of recognizing the object, a parallax image is generated based on a parallax of each object projected in a taken luminance image, and the object is recognized by integrating pixel groups having similar parallax values.
Patent Literature 1 discloses, for a technique of detecting an object using a distance image generated through stereo image processing, a technique of suppressing erroneous detection in which, among a plurality of detected objects, what should be regarded and detected as a single object (for example, one preceding vehicle) is erroneously regarded as a plurality of divided small objects (for example, two pedestrians).
However, in the related art for detecting an object such as a vehicle or a pedestrian from a parallax image taken by a stereo camera, the object such as a vehicle and another object adjacent to that object may be detected as one object, for example.
In view of the above-described conventional problem, there is a need to provide a technique for improving performance of recognizing an object.
According to exemplary embodiments of the present invention, there is provided an information processing device comprising: a first generation unit configured to generate first information in which a horizontal direction position and a depth direction position of an object are associated with each other from information in which a vertical direction position, the horizontal direction position, and the depth direction position of the object are associated with each other; a first detection unit configured to detect one region indicating the object based on the first information; a second generation unit configured to generate, from the information in which the vertical direction position, the horizontal direction position, and the depth direction position of the object are associated with each other, second information having separation performance higher than separation performance of the first information in which the horizontal direction position and the depth direction position of the object are associated with each other; a second detection unit configured to detect a plurality of regions indicating objects based on the second information; and an output unit configured to associate the one region detected based on the first information with the regions detected based on the second information, and to output the one region and the regions that are associated with each other.
According to the disclosed technique, performance of recognizing an object can be improved.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the present invention.
As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. In describing preferred embodiments illustrated in the drawings, specific terminology may be employed for the sake of clarity. However, the disclosure of this patent specification is not intended to be limited to the specific terminology so selected, and it is to be understood that each specific element includes all technical equivalents that have the same function, operate in a similar manner, and achieve a similar result.
The following specifically describes embodiments with reference to the drawings. Herein, exemplified is a case in which an object recognition device 1 is mounted on an automobile.
As illustrated in
The object recognition device 1 has an imaging function for imaging a traveling direction of the vehicle 70, and is installed on an inner side of a front window in the vicinity of a rearview mirror of the vehicle 70, for example. Details about a configuration and an operation of the object recognition device 1 will be described later. The object recognition device 1 includes a main body unit 2, and an imaging unit 10a and an imaging unit 10b fixed to the main body unit 2. The imaging units 10a and 10b are fixed to the main body unit 2 so as to take an image of a subject in the traveling direction of the vehicle 70.
The vehicle control device 6 is an electronic control unit (ECU) that executes various vehicle control based on recognition information received from the object recognition device 1. As an example of vehicle control, the vehicle control device 6 executes steering control for controlling a steering system (control object) including the steering wheel 7 to avoid an obstacle, brake control for controlling the brake pedal 8 (control object) to decelerate and stop the vehicle 70, or the like based on the recognition information received from the object recognition device 1.
As in the equipment control system 60 including the object recognition device 1 and the vehicle control device 6, safety in driving of the vehicle 70 can be improved by executing vehicle control such as steering control or brake control.
As described above, the object recognition device 1 is assumed to take an image of the front of the vehicle 70, but the embodiment is not limited thereto. That is, the object recognition device 1 may be installed to take an image of the back or a side of the vehicle 70. In this case, the object recognition device 1 can detect the positions of a following vehicle and a person in the rear of the vehicle 70, another vehicle and a person on a side of the vehicle 70, or the like. The vehicle control device 6 can detect danger at the time when the vehicle 70 changes lanes or merges into a lane, and execute vehicle control as described above. In addition, when determining, based on recognition information about an obstacle in the rear of the vehicle 70 output from the object recognition device 1, that there is a risk of collision during a reversing operation such as when parking the vehicle 70, the vehicle control device 6 can execute vehicle control as described above.
As illustrated in
The parallax value deriving unit 3 derives a parallax value dp indicating a parallax for an object E from a plurality of images obtained by imaging the object E, and outputs a parallax image indicating the parallax value dp for each pixel (an example of “measurement information in which a position in a vertical direction of a detecting target, a position in a horizontal direction thereof, and a position in a depth direction thereof are associated with each other”). The recognition processing unit 5 performs object recognition processing and the like on an object such as a person and a vehicle projected in a taken image based on the parallax image output from the parallax value deriving unit 3, and outputs, to the vehicle control device 6, recognition information as information indicating a result of object recognition processing.
As illustrated in
The imaging unit 10a is a processing unit that images a forward subject and generates an analog image signal. The imaging unit 10a includes an imaging lens 11a, a diaphragm 12a, and an image sensor 13a.
The imaging lens 11a is an optical element for refracting incident light to form an image of the object on the image sensor 13a. The diaphragm 12a is a member that adjusts the quantity of light input to the image sensor 13a by blocking part of the light that has passed through the imaging lens 11a. The image sensor 13a is a semiconductor element that converts light entering the imaging lens 11a and passing through the diaphragm 12a into an electrical analog image signal. For example, the image sensor 13a is implemented by a solid-state imaging element such as a charge coupled device (CCD) or a complementary metal oxide semiconductor (CMOS).
The imaging unit 10b is a processing unit that images a forward subject and generates an analog image signal. The imaging unit 10b includes an imaging lens 11b, a diaphragm 12b, and an image sensor 13b. Functions of the imaging lens 11b, the diaphragm 12b, and the image sensor 13b are the same as the functions of the imaging lens 11a, the diaphragm 12a, and the image sensor 13a described above, respectively. The imaging lens 11a and the imaging lens 11b are installed such that lens surfaces thereof are positioned on the same plane so that the left and right cameras can take an image under the same condition.
The signal conversion unit 20a is a processing unit that converts the analog image signal generated by the imaging unit 10a into digital image data. The signal conversion unit 20a includes a correlated double sampling (CDS) 21a, an auto gain control (AGC) 22a, an analog digital converter (ADC) 23a, and a frame memory 24a.
The CDS 21a removes noise from the analog image signal generated by the image sensor 13a through correlated double sampling, a differential filter in the horizontal direction, a smoothing filter in the vertical direction, or the like. The AGC 22a performs gain control for controlling strength of the analog image signal from which the noise is removed by the CDS 21a. The ADC 23a converts the analog image signal on which gain control is performed by the AGC 22a into digital image data. The frame memory 24a stores the image data converted by the ADC 23a.
The signal conversion unit 20b is a processing unit that converts the analog image signal generated by the imaging unit 10b into digital image data. The signal conversion unit 20b includes a CDS 21b, an AGC 22b, an ADC 23b, and a frame memory 24b. Functions of the CDS 21b, the AGC 22b, the ADC 23b, and the frame memory 24b are the same as the functions of the CDS 21a, the AGC 22a, the ADC 23a, and the frame memory 24a described above, respectively.
The image processing unit 30 is a device that performs image processing on the image data converted by the signal conversion unit 20a and the signal conversion unit 20b. The image processing unit 30 includes a field programmable gate array (FPGA) 31, a central processing unit (CPU) 32, a read only memory (ROM) 33, a random access memory (RAM) 34, an interface (I/F) 35, and a bus line 39.
The FPGA 31 is an integrated circuit, and herein performs processing of deriving the parallax value dp for an image based on the image data. The CPU 32 controls each function of the parallax value deriving unit 3. The ROM 33 stores a computer program for image processing executed by the CPU 32 for controlling each function of the parallax value deriving unit 3. The RAM 34 is used as a work area of the CPU 32. The I/F 35 is an interface for communicating with an I/F 55 of the recognition processing unit 5 via a communication line 4. As illustrated in
The image processing unit 30 is assumed to include the FPGA 31 as an integrated circuit for deriving the parallax value dp, but the embodiment is not limited thereto. The integrated circuit may be an application specific integrated circuit (ASIC) and the like.
As illustrated in
The FPGA 51 is an integrated circuit, and herein performs object recognition processing on the object based on the parallax image received from the image processing unit 30. The CPU 52 controls each function of the recognition processing unit 5. The ROM 53 stores a computer program for object recognition processing executed by the CPU 52 for performing object recognition processing of the recognition processing unit 5. The RAM 54 is used as a work area of the CPU 52. The I/F 55 is an interface for performing data communication with the I/F 35 of the image processing unit 30 via the communication line 4. The CAN I/F 58 is an interface for communicating with an external controller (for example, the vehicle control device 6 illustrated in
With such a configuration, when the parallax image is transmitted from the I/F 35 of the image processing unit 30 to the recognition processing unit 5 via the communication line 4, the FPGA 51 performs object recognition processing and the like for the object such as a person and a vehicle projected in the taken image based on the parallax image in accordance with a command from the CPU 52 of the recognition processing unit 5.
Each computer program described above may be recorded and distributed in a computer-readable recording medium as an installable or executable file. Examples of the recording medium include a compact disc read only memory (CD-ROM) or a secure digital (SD) memory card.
As described above with reference to
At least some of the functional units of the object recognition device 1 may be implemented by the FPGA 31 or the FPGA 51, or may be implemented when a computer program is executed by the CPU 32 or the CPU 52.
The image acquisition unit 100a and the image acquisition unit 100b are functional units that obtain a luminance image from images taken by the right camera (imaging unit 10a) and the left camera (imaging unit 10b), respectively.
The conversion unit 200a is a functional unit that removes noise from image data of the luminance image obtained by the image acquisition unit 100a and converts the image data into digital image data to be output. The conversion unit 200a may be implemented by the signal conversion unit 20a illustrated in
The conversion unit 200b is a functional unit that removes noise from image data of the luminance image obtained by the image acquisition unit 100b and converts the image data into digital image data to be output. The conversion unit 200b may be implemented by the signal conversion unit 20b illustrated in
Regarding the image data of the two luminance images output by the conversion units 200a and 200b (hereinafter, simply referred to as a luminance image), the luminance image taken by the image acquisition unit 100a serving as the right camera (imaging unit 10a) is assumed to be image data of a reference image Ia (hereinafter, simply referred to as a reference image Ia), and the luminance image taken by the image acquisition unit 100b serving as the left camera (imaging unit 10b) is assumed to be image data of a comparative image Ib (hereinafter, simply referred to as a comparative image Ib). That is, the conversion units 200a and 200b output the reference image Ia and the comparative image Ib, respectively, based on the two luminance images output from the image acquisition units 100a and 100b.
The parallax value arithmetic processing unit 300 derives the parallax value for each pixel of the reference image Ia based on the reference image Ia and the comparative image Ib received from the conversion units 200a and 200b, and generates a parallax image in which each pixel of the reference image Ia is associated with the parallax value.
As illustrated in
The second generation unit 500 is a functional unit that receives the parallax image input from the parallax value arithmetic processing unit 300, receives the reference image Ia input from the parallax value deriving unit 3, and generates a V-Disparity map, a U-Disparity map, a Real U-Disparity map, and the like. The V-Disparity map is an example of “information in which a position in the vertical direction is associated with a position in the depth direction”. The U-Disparity map and the Real U-Disparity map are examples of “information in which a position in the horizontal direction is associated with a position in the depth direction”.
As illustrated in
The third generation unit 501 is a functional unit that generates a Vmap VM as the V-Disparity map illustrated in
The third generation unit 501 performs linear approximation of a position estimated to be the road surface in the generated Vmap VM. When the road surface is flat, the approximation can be made with a single straight line, but when the inclination of the road surface varies, the Vmap VM needs to be divided into sections so that the linear approximation is made accurately. For the linear approximation, a well-known technique such as the Hough transform or the method of least squares can be utilized. In the Vmap VM, the utility pole part 601a and the car part 602a, which are clusters positioned above the detected road surface part 600a, correspond to the utility pole 601 and the car 602 as objects on the road surface 600, respectively. When the U-Disparity map is generated by the fourth generation unit 502 described later, only information about parts positioned above the road surface is used, in order to remove noise. Once the road surface is estimated, the height of the road surface is known, so that the height of an object can be obtained. This is performed using a well-known method. For example, a linear expression representing the road surface is obtained, and the corresponding y-coordinate y0 at the parallax value dp=0 is determined; this coordinate y0 is taken as the height of the road surface. For example, when the parallax value is dp and the y-coordinate is y′, y′−y0 indicates the height from the road surface at the parallax value dp. The height H from the road surface at the coordinates (dp, y′) can be obtained through the arithmetic expression H=(z×(y′−y0))/f. In this expression, “z” is the distance calculated from the parallax value dp (z=BF/(dp−offset)), and “f” is a value obtained by converting the focal length of the imaging units 10a and 10b into the same unit as that of (y′−y0). Here, BF is the value obtained by multiplying the base length B by the focal length f of the imaging units 10a and 10b, and offset is the parallax obtained when photographing an object at infinity.
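For illustration only, the following is a minimal sketch in Python of the height computation described above, assuming that y0 is obtained by evaluating a linear road-surface model y0 = a×dp + b estimated from the Vmap VM, and using made-up calibration values for the base length B, the focal length f, and the offset; the function names and values are not part of the embodiment.

```python
# Minimal sketch of H = (z * (y' - y0)) / f with z = BF / (dp - offset).
# B, f, offset, and the road-surface coefficients are hypothetical values.

def distance_from_parallax(dp, B, f, offset):
    """z = BF / (dp - offset): depth distance for parallax value dp."""
    return (B * f) / (dp - offset)

def height_from_road_surface(dp, y_prime, road_slope, road_intercept, B, f, offset):
    """H = (z * (y' - y0)) / f, where y0 = road_slope * dp + road_intercept
    is the road-surface y-coordinate estimated from the Vmap for parallax dp."""
    y0 = road_slope * dp + road_intercept   # linear road-surface model from the Vmap
    z = distance_from_parallax(dp, B, f, offset)
    return (z * (y_prime - y0)) / f

# Example with made-up calibration values (base length 0.3 m, focal length 800 px).
H = height_from_road_surface(dp=40.0, y_prime=340.0,
                             road_slope=2.5, road_intercept=200.0,
                             B=0.3, f=800.0, offset=0.0)
print(f"height from road surface: {H:.2f} (same unit as B)")
```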
The fourth generation unit 502 is a functional unit that generates a Umap UM (second frequency image) as the U-Disparity map illustrated in
The fourth generation unit 502 generates a height Umap UM_H as an example of the U-Disparity map illustrated in
The fifth generation unit 503 generates, from the height Umap UM_H generated by the fourth generation unit 502, a real height Umap RM_H as an example of the Real U-Disparity map illustrated in
The fifth generation unit 503 also generates, from the Umap UM generated by the fourth generation unit 502, a real Umap RM as an example of the Real U-Disparity map illustrated in
Herein, each of the real height Umap RM_H and the real Umap RM is a two-dimensional histogram assuming that the horizontal axis indicates an actual distance in a direction (horizontal direction) from the imaging unit 10b (left camera) to the imaging unit 10a (right camera), and the vertical axis indicates the parallax value dp of the parallax image (or a distance in the depth direction converted from the parallax value dp). The left guardrail part 611b in the real height Umap RM_H illustrated in
Specifically, from the height Umap UM_H and the Umap UM, the fifth generation unit 503 generates the real height Umap RM_H and the real Umap RM, which correspond to an overhead view, by not thinning out pixels for an object at a distant place (small parallax value dp), because such an object appears small and the amount of parallax information and the distance resolution are small, and by largely thinning out pixels for an object at a short-distance place, because such an object is projected to be large and the amount of parallax information and the distance resolution are large. As described later, a cluster (object region) of pixel values can be extracted from the real height Umap RM_H or the real Umap RM. In this case, the width of a rectangle surrounding the cluster corresponds to the width of the extracted object, and the height thereof corresponds to the depth of the extracted object. The fifth generation unit 503 does not necessarily generate the real height Umap RM_H from the height Umap UM_H; alternatively, the fifth generation unit 503 can generate the real height Umap RM_H directly from the parallax image.
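As an illustrative sketch of how parallax points can be voted into a map whose horizontal axis is an actual distance, the following Python code assumes the standard pinhole relation X = (x − cx)·z/f for converting the image x-coordinate into a lateral distance (the embodiment does not state this formula explicitly) and an assumed bin width of 10 cm; because each fixed-width bin covers many pixels at short range and few at long range, nearby parallax points are effectively thinned out, as described above. All names and calibration values are hypothetical.

```python
import numpy as np

def real_umap_from_parallax_image(parallax, B, f, cx, offset=0.0,
                                  bin_width_m=0.10, dp_max=128, x_range_m=10.0):
    """Vote parallax points into a 'real' U-map whose horizontal axis is the
    actual lateral distance (fixed 10 cm bins) instead of the image x-coordinate.
    Because one bin covers many pixels at short range and few at long range,
    this implicitly thins out nearby parallax points, as described above."""
    n_x_bins = int(2 * x_range_m / bin_width_m)
    rmap = np.zeros((dp_max, n_x_bins), dtype=np.int32)
    ys, xs = np.nonzero(parallax > offset)          # valid parallax points
    dps = parallax[ys, xs]
    z = (B * f) / (dps - offset)                    # depth distance z = BF/(dp - offset)
    X = (xs - cx) * z / f                           # lateral distance (pinhole model)
    xb = ((X + x_range_m) / bin_width_m).astype(int)
    db = dps.astype(int)
    ok = (xb >= 0) & (xb < n_x_bins) & (db >= 0) & (db < dp_max)
    np.add.at(rmap, (db[ok], xb[ok]), 1)            # accumulate parallax frequency
    return rmap

# Example: a synthetic 480x640 parallax image with a small block of parallax 40.
pimg = np.zeros((480, 640), dtype=np.float32)
pimg[200:260, 300:340] = 40.0
rm = real_umap_from_parallax_image(pimg, B=0.3, f=800.0, cx=320.0)
print(rm.shape, rm.sum())
```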
The second generation unit 500 can specify the position in the X-axis direction and the width (xmin, xmax) in the parallax image and the reference image Ia of the object from the generated height Umap UM_H or real height Umap RM_H. The second generation unit 500 can specify an actual depth of the object from information of the height of the object (dmin, dmax) in the generated height Umap UM_H or real height Umap RM_H. The second generation unit 500 can specify, from the generated Vmap VM, the position in the y-axis direction and the height (ymin=“y-coordinate corresponding to the maximum height from the road surface having a maximum parallax value”, ymax=“y-coordinate indicating the height of the road surface obtained from the maximum parallax value”) in the parallax image and the reference image Ia of the object. The second generation unit 500 can also specify an actual size in the x-axis direction and the y-axis direction of the object from the width in the x-axis direction (xmin, xmax) and the height in the y-axis direction (ymin, ymax) of the object specified in the parallax image, and the parallax value dp corresponding thereto. As described above, the second generation unit 500 can specify the position of the object in the reference image Ia and the actual width, height, and depth thereof by utilizing the Vmap VM, the height Umap UM_H, and the real height Umap RM_H. The position of the object in the reference image Ia is specified, so that the position thereof in the parallax image is also determined, and the second generation unit 500 can specify the distance to the object.
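The embodiment states that the actual size of the object can be specified from its pixel extent and the corresponding parallax value, but does not give the expression; the following short sketch assumes the standard pinhole relation together with z = BF/(dp − offset), with made-up calibration values.

```python
def actual_size_from_pixels(pixel_extent, dp, B, f, offset=0.0):
    """Approximate actual size of an object from its pixel extent (e.g. xmax - xmin
    or ymax - ymin) at parallax dp, assuming the pinhole relation
    size = pixel_extent * z / f with z = BF / (dp - offset)."""
    z = (B * f) / (dp - offset)
    return pixel_extent * z / f

# Example: an object spanning xmin=300 to xmax=380 pixels at parallax dp=40.
print(actual_size_from_pixels(pixel_extent=380 - 300, dp=40.0, B=0.3, f=800.0))  # 0.6
```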
The clustering processing unit 510 illustrated in
The basic detection unit 511 performs basic detection processing for detecting the depth, the width, and the like of the object such as a vehicle based on the Real U-Disparity map as a high-resolution map. The following describes an example in which the basic detection unit 511 performs detection using the Real U-Disparity map. Alternatively, the basic detection unit 511 may perform detection using the U-Disparity map. In this case, for example, the basic detection unit 511 may perform processing of converting the x-coordinate in the U-Disparity map into an actual distance and the like in the lateral direction (horizontal direction). In the basic detection processing, if the road surface that is estimated based on the Vmap VM is lower than an actual road surface, for example, detection accuracy for the object region is deteriorated.
The separation detection unit 512 performs separation detection processing for detecting the depth, the width, and the like of the object such as a vehicle using, as an example of a high position map, a map using a parallax point of which the height from the road surface is equal to or larger than a predetermined value (“second height”) among parallax points included in the Real U-Disparity map. In a case in which the height of the object is relatively low, the separation detection unit 512 may separate the same object into a plurality of object regions to be detected in some cases.
The integration detection unit 513 uses, as an example of a low-resolution map, a small real Umap obtained by reducing the Real U-Disparity map by thinning out the pixels, for example, to perform integration detection processing for detecting the depth, the width, and the like of the object such as a vehicle. The number of pixels in the small real Umap is smaller than that of the real Umap, so that resolution of the small real Umap is assumed to be low. The integration detection unit 513 may perform detection using a map obtained by reducing the U-Disparity map. The integration detection unit 513 uses the small real Umap of which the resolution is relatively low, so that the integration detection unit 513 may detect a plurality of objects as the same object in some cases.
In this way, detection performance for the object can be improved by basically using the high-resolution map for object detection, and also using the high position map having higher separation performance and the low-resolution map that can integrally detect the same object.
The selection unit 514 selects an object not to be rejected from among the objects detected by the basic detection unit 511, the separation detection unit 512, and the integration detection unit 513. Herein, rejection means processing of excluding the object from processing at a later stage (tracking processing and the like).
The frame creation unit 515 creates a frame (detection frame) in a region (recognition region) in a parallax image Ip (or the reference image Ia) corresponding to a region of the object selected by the selection unit 514. Herein, the frame means information of a rectangle surrounding the object as information indicating the position and the size of the object, for example, information of coordinates of corners of the rectangle and the height and the width of the rectangle.
The background detection unit 516 detects, in the detection frame created by the frame creation unit 515, a background of the object corresponding to the detection frame.
The rejection unit 517 rejects the object corresponding to the detection frame in which a background satisfying a predetermined condition is detected by the background detection unit 516. Background detection and rejection based thereon are preferably performed, but are not necessarily performed.
The tracking unit 530 is a functional unit that executes tracking processing as processing of tracking the object based on recognition region information as information about the object recognized by the clustering processing unit 510. Herein, the recognition region information means information about the object recognized by the clustering processing unit 510, and includes information such as the position and the size of the recognized object in the V-Disparity map, the U-Disparity map, and the Real U-Disparity map, an identification number of labeling processing described later, and a rejection flag, for example.
Next, the following describes processing performed by the clustering processing unit 510 with reference to
At Step S11, the basic detection unit 511 of the clustering processing unit 510 performs basic detection processing for detecting a region of the object from the real Umap RM. In the basic detection processing, a cluster of parallax points on the real Umap RM is detected.
In the real Umap RM, the number of pixels is relatively large, so that the resolution of distance is relatively high, and parallax information of the object positioned above the road surface is utilized. Thus, in the basic detection processing, the object region is detected with relatively stable accuracy. However, when the road surface that is estimated based on the Vmap VM is lower than an actual road surface, or when the number of parallax points of the object as a detection target is small, for example, detection accuracy for the object region is deteriorated. Details about the basic detection processing will be described later.
Subsequently, the separation detection unit 512 of the clustering processing unit 510 performs separation detection processing for detecting a region of the object using the parallax points whose height from the road surface is equal to or larger than a predetermined value among the parallax points included in the real Umap RM (Step S12). In the separation detection processing, a cluster of parallax points whose height from the road surface is equal to or larger than the predetermined value is detected from among the parallax points included in the real Umap RM. Thus, even when a plurality of objects that are relatively tall are adjacent to each other, object regions in which the objects are correctly separated from each other can be detected, because the detection is not influenced by objects whose height from the road surface is relatively low. However, when an object has a relatively low height, the same object may be detected as being separated into a plurality of object regions in some cases. Details about the separation detection processing will be described later.
Subsequently, the integration detection unit 513 of the clustering processing unit 510 performs integration detection processing for detecting the region of the object using the small real Umap, which is an image obtained by thinning out the pixels from the real Umap RM (Step S13). The small real Umap may be created by thinning out the pixels from the real Umap RM so that the width of one pixel corresponds to about 10 cm, for example. In thinning out the pixels, a pixel may simply be extracted from the real Umap RM, or the value of a pixel in the small real Umap may be determined based on the values of pixels within a predetermined range from the pixel extracted from the real Umap RM. In the integration detection processing, an object having a small number of parallax points is less likely to be detected as being separated into a plurality of object regions. However, the distance resolution is relatively low, so that a plurality of objects adjacent to each other may be detected as the same object, for example. Details about the integration detection processing will be described later.
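The following is a minimal sketch of creating a reduced map from the real Umap RM by aggregating blocks of pixels; the embodiment allows either simple extraction of pixels or determination from pixels within a predetermined range, and the reduction factor and bin sizes used here are assumptions.

```python
import numpy as np

def small_real_umap(real_umap, factor=2):
    """Reduce the real U-map by thinning/aggregating pixels. Here each
    factor x factor block is summed so that, for example, bins of about 5 cm
    become bins of about 10 cm; simply extracting every factor-th pixel would
    also match the thinning-out described above."""
    h, w = real_umap.shape
    h2, w2 = h - h % factor, w - w % factor          # crop to a multiple of factor
    blocks = real_umap[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor)
    return blocks.sum(axis=(1, 3))

rm = np.random.randint(0, 3, size=(128, 200))
srm = small_real_umap(rm, factor=2)
print(rm.shape, "->", srm.shape)   # (128, 200) -> (64, 100)
```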
The basic detection processing, the separation detection processing, and the integration detection processing described above may be performed in any order, or may be performed in parallel.
Subsequently, the selection unit 514 of the clustering processing unit 510 selects the object region to be output to the frame creation unit 515 from among object regions detected through the “basic detection processing”, the “separation detection processing”, and the “integration detection processing” described above (Step S14). Details about processing of selecting the object region to be output to the frame creation unit 515 will be described later.
Subsequently, the frame creation unit 515 of the clustering processing unit 510 creates a detection frame corresponding to the object region selected by the selection unit 514 (Step S15).
Subsequently, the background detection unit 516 of the clustering processing unit 510 detects a background in a detection frame corresponding to the object region detected through the “integration detection processing” among created detection frames (Step S16). Details about the processing of detecting the background in the detection frame will be described later.
Subsequently, the rejection unit 517 of the clustering processing unit 510 performs rejection processing (Step S17). Details about the rejection processing will be described later.
Next, with reference to
At Step S201, the basic detection unit 511 performs 8-neighbor labeling processing that gives the same ID to parallax points, that is, pixels having a pixel value (parallax frequency) equal to or larger than a predetermined value in the real Umap RM, which are continuous in a vertical, horizontal, or oblique direction. Well-known labeling processing can be utilized.
Subsequently, the basic detection unit 511 sets a rectangle circumscribing each pixel group (each isolated region) to which the same ID is given (Step S202).
Subsequently, the basic detection unit 511 rejects a rectangle having a size equal to or smaller than a predetermined value (Step S203). This is because such a rectangle can be determined to be noise. The basic detection unit 511 may also reject, for example, a rectangle for which the average pixel value (parallax frequency) of the real Umap RM over the area of the rectangle is smaller than a predetermined value.
Accordingly, the rectangle circumscribing each isolated region is detected as the object region.
In the basic detection processing, it is sufficient that the region indicating the object is detected based on the parallax image. The basic detection processing may be performed using a well-known technique.
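For illustration, the following Python sketch implements Steps S201 to S203 with a simple flood-fill version of the well-known 8-neighbor labeling; the frequency threshold and the minimum rectangle size are hypothetical values.

```python
import numpy as np
from collections import deque

def basic_detection(real_umap, freq_thresh=1, min_w=2, min_d=2):
    """Sketch of Steps S201-S203: 8-neighbor labeling of parallax points whose
    frequency is at least freq_thresh, circumscribing rectangles, and rejection
    of rectangles that are too small (treated as noise)."""
    mask = real_umap >= freq_thresh
    labels = np.zeros(real_umap.shape, dtype=np.int32)
    regions, next_id = [], 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue
        next_id += 1
        labels[sy, sx] = next_id
        q = deque([(sy, sx)])
        ys, xs = [sy], [sx]
        while q:                                  # flood fill with 8-connectivity
            y, x = q.popleft()
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                            and mask[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = next_id
                        q.append((ny, nx))
                        ys.append(ny); xs.append(nx)
        x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
        if (x1 - x0 + 1) >= min_w and (y1 - y0 + 1) >= min_d:   # size-based rejection
            regions.append((x0, y0, x1, y1))      # circumscribing rectangle (x: lateral bin, y: parallax row)
    return regions

rm = np.zeros((20, 30), dtype=int)
rm[5:9, 10:15] = 3                                 # one isolated cluster
print(basic_detection(rm))                         # [(10, 5, 14, 8)]
```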
Next, the following describes the separation detection processing at Step S12 performed by the separation detection unit 512. The separation detection processing differs significantly from the “basic detection processing” described above in that, instead of using all the parallax points included in the real Umap RM, it uses only the parallax points whose height from the road surface is equal to or larger than the predetermined value. The other points may be the same as those of the “basic detection processing” described above. When the 8-neighbor labeling processing at Step S201 is performed in the “separation detection processing”, a break between parallax points in the horizontal direction in the real Umap RM that is equal to or smaller than a predetermined value (for example, corresponding to one pixel) is possibly caused by noise, so that such parallax points may be regarded as continuous.
Next, with reference to
At Step S301, the integration detection unit 513 performs 4-neighbor labeling processing for giving the same ID to pixels (parallax points) that are continuous in the vertical direction (depth direction) or the lateral direction (horizontal direction) on the small real Umap. In the above processing, the 8-neighbor labeling processing may be used.
Subsequently, the integration detection unit 513 sets a rectangle circumscribing each pixel group (each isolated region) to which the same ID is given (Step S302).
Subsequently, the integration detection unit 513 extracts the object such as a vehicle (Step S303). The integration detection unit 513 extracts the region of the object such as a vehicle based on the width, the depth, frequency of the parallax, and the like of each isolated region. Accordingly, the rectangle circumscribing each isolated region is detected as the object region.
Next, with reference to
At Step S401, the selection unit 514 rejects, among the object regions detected through the integration detection processing, an object region that is not present on the lane on which the host vehicle is traveling. For example, when the position of the object region is outside a predetermined range from the forward direction of the host vehicle, the selection unit 514 rejects the object region. Accordingly, the object region detected through the integration detection processing is output for an object that may hinder the traveling of the host vehicle.
At a distant place where the distance from the host vehicle is relatively long, accuracy in detecting the position of the object region is deteriorated. Thus, the predetermined range may be set to be relatively wide in accordance with the distance from the host vehicle.
Subsequently, the selection unit 514 determines whether the object region detected through the integration detection processing overlaps, to a certain degree, with one object region detected through the basic detection processing in the real Umap RM (Step S402). For example, if the value obtained by dividing the area of the region in which the object region detected through the integration detection processing overlaps with the object region detected through the basic detection processing in the real Umap RM by the area of the object region detected through the basic detection processing is equal to or larger than a predetermined threshold, it is determined that they overlap with each other to a certain degree.
If the object regions overlap with each other to a certain degree (YES at Step S402), the selection unit 514 determines whether the size of the object region as a result of the integration detection processing is smaller than that of the object region as a result of the basic detection processing (Step S403). If the size is determined to be smaller (YES at Step S403), the object region detected through the basic detection processing and the object region detected through the separation detection processing are output to the frame creation unit 515 (Step S404), and the process is ended. That is, the result of the basic detection processing as an inclusive detection result and the result of the separation detection processing as a partial detection result are output while being associated with each other as information indicating the same object. This is because, when the size of the object region as the result of the integration detection processing is smaller than that of the object region as the result of the basic detection processing, there is a high possibility that the result of the integration detection processing is erroneous, so that the result of the basic detection processing is considered to be most reliable as information indicating one object, and the result of the separation detection processing is considered to be most reliable as information indicating a plurality of objects.
If the size is determined not to be smaller (NO at Step S403), the selection unit 514 determines whether a plurality of object regions detected through the separation detection processing are present in the one object region detected through the basic detection processing (Step S405).
If a plurality of object regions are present (YES at Step S405), the selection unit 514 outputs the object region detected through the integration detection processing and the object regions detected through the separation detection processing to the frame creation unit 515 (Step S406), and the process is ended. That is, the result of the integration detection processing as an inclusive detection result and the result of the separation detection processing as a partial detection result are output while being associated with each other as information indicating the same object. This is because the result of the integration detection processing is considered to be most reliable as information indicating one object, and the result of the separation detection processing is considered to be most reliable as information indicating a plurality of objects when there are a plurality of object regions detected through the separation detection processing in one object region detected through the basic detection processing.
If a plurality of object regions are not present (NO at Step S405), the selection unit 514 outputs the object region detected through the integration detection processing and the one object region detected through the basic detection processing to the frame creation unit 515 (Step S407), and the process is ended. That is, the result of the integration detection processing as an inclusive detection result and the result of the basic detection processing as a partial detection result are output while being associated with each other as information indicating the same object. This is because the result of the basic detection processing and the result of the separation detection processing can be equally treated when a plurality of object regions detected through the separation detection processing are not present in one object region detected through the basic detection processing, so that the result of the integration detection processing is considered to be most reliable as information indicating one object, and the result of the basic detection processing is considered to be most reliable as information indicating a plurality of objects.
If the object regions do not overlap with each other to a certain degree (NO at Step S402), the selection unit 514 outputs only the object region detected through the integration detection processing to the frame creation unit 515 (Step S408), and the process is ended. That is, the result of the integration detection processing as an inclusive detection result and a result indicating that no object region is detected as a partial detection result are output while being associated with each other as information indicating the same object. This is because, when the object region detected through the integration detection processing does not overlap to a certain degree with one object region detected through the basic detection processing, the result of the integration detection processing, which is hardly influenced by noise, is considered to be most reliable as information indicating a rough position of the object.
The processing subsequent to Step S402 is executed for each object region detected through the integration detection processing.
As described above, respective detection processing results are simply compared and associated with each other to be output, so that a highly accurate detection result can be output in a relatively short time.
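The following sketch illustrates the selection flow of Steps S402 to S408 for one object region detected through the integration detection processing, assuming axis-aligned rectangular regions on the real Umap RM; the overlap threshold, the helper names, and the way separation detection results are grouped with a basic detection region are assumptions, not part of the embodiment.

```python
def overlap_ratio(a, b):
    """Intersection area of rectangles a=(x0, y0, x1, y1) and b, divided by the
    area of b (the basic-detection region), as described for Step S402."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return (ix * iy) / area_b if area_b > 0 else 0.0

def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def contains(outer, inner):
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def select(integ, basics, seps, overlap_thresh=0.5):
    """Decision flow of Steps S402-S408 for one integration-detection region
    'integ': returns (inclusive_result, partial_results)."""
    matched = [b for b in basics if overlap_ratio(integ, b) >= overlap_thresh]
    if not matched:                                   # S402: NO -> S408
        return integ, []
    basic = matched[0]                                # one overlapping basic region (assumption)
    seps_in_basic = [s for s in seps if contains(basic, s)]
    if area(integ) < area(basic):                     # S403: YES -> S404
        return basic, seps_in_basic
    if len(seps_in_basic) >= 2:                       # S405: YES -> S406
        return integ, seps_in_basic
    return integ, [basic]                             # S405: NO -> S407

# Example: one integration region containing one basic region and two separation regions.
print(select((0, 0, 10, 10), [(1, 1, 9, 9)], [(1, 1, 4, 9), (5, 1, 9, 9)]))
```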
Next, with reference to
At Step S501, the background detection unit 516 calculates a range on the real Umap RM corresponding to the detection frame created in the parallax image Ip. When the detection frame is positioned in the vicinity of the straight advancing direction of the host vehicle, the range may be the range between the left end and the right end of the coordinates in the horizontal direction, in the real Umap RM, of the object region corresponding to the detection frame. Alternatively, for example, the range may be the range between two different straight lines connecting the center of the imaging unit 10a and the imaging unit 10b and the parallax points of the object region on the real Umap RM corresponding to the detection frame, that is, the range between a first straight line having the largest angle with respect to the horizontal direction and a second straight line having the smallest angle with respect to the horizontal direction.
Subsequently, the background detection unit 516 creates a histogram (hereinafter, referred to as an “object parallax histogram”) indicating a total value of parallax frequency of the parallax points of the object region on the real Umap RM corresponding to the detection frame in the range (Step S502).
Subsequently, the background detection unit 516 creates a histogram (hereinafter, referred to as a “background parallax histogram”) indicating a total value of parallax frequency of the parallax points distant from the object region on the real Umap RM corresponding to the detection frame by a predetermined distance or more in the range (Step S503).
Subsequently, the background detection unit 516 determines whether there is a portion having a value of the object parallax histogram equal to or smaller than a first predetermined value and a value of the background parallax histogram equal to or larger than a second predetermined value in the range (Step S504).
If the portion is present (YES at Step S504), the background detection unit 516 determines that the background is present in the detection frame (Step S505), and the process is ended.
If the portion is not present (NO at Step S504), the background detection unit 516 determines that the background is not present in the detection frame (Step S506), and the process is ended.
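The following is a minimal sketch of the background detection of Steps S501 to S506, assuming that the object region is given as a mask on the real Umap RM, that “farther away” corresponds to smaller parallax rows, and that the predetermined distance is expressed as a number of parallax bins; the thresholds and names are hypothetical.

```python
import numpy as np

def detect_background(real_umap, obj_mask, x_range, min_gap_bins=5,
                      obj_thresh=2, bg_thresh=5):
    """Sketch of Steps S501-S506. Within the horizontal range 'x_range' of the
    detection frame on the real U-map, build an object-parallax histogram from
    parallax points of the object region ('obj_mask') and a background-parallax
    histogram from parallax points at least 'min_gap_bins' rows farther away,
    then check whether some column has a low object value and a high background value."""
    x0, x1 = x_range
    obj_rows = np.nonzero(obj_mask.any(axis=1))[0]
    far_limit = obj_rows.min() - min_gap_bins        # smaller dp row = farther away (assumption)
    obj_hist = (real_umap * obj_mask)[:, x0:x1].sum(axis=0)
    bg_hist = real_umap[:max(far_limit, 0), x0:x1].sum(axis=0)
    has_bg = np.any((obj_hist <= obj_thresh) & (bg_hist >= bg_thresh))
    return has_bg, obj_hist, bg_hist

# Example: an object cluster with a separate, farther cluster visible between its columns.
rm = np.zeros((64, 40), dtype=int)
rm[40:44, 10:14] = 4; rm[40:44, 18:22] = 4           # two parts of the "object" region
rm[10:14, 14:18] = 6                                 # distant parallax points (background)
mask = np.zeros_like(rm, dtype=bool); mask[40:44, 10:22] = True
print(detect_background(rm, mask, (10, 22))[0])      # True
```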
Next, with reference to
In the following description, among the detection frames corresponding to the object regions detected through the “integration detection processing”, each detection frame determined, in the processing of detecting the background in the detection frame at Step S16 described above, to include a background may be treated as a processing target.
At Step S601, the rejection unit 517 determines whether there are a plurality of detection frames corresponding to a plurality of object regions detected through the basic detection processing or the separation detection processing in the detection frame as a processing target.
If a plurality of detection frames are not present (NO at Step S601), the process is ended.
If a plurality of detection frames are present (YES at Step S601), the rejection unit 517 determines whether the background is present in the portion between the detection frames (Step S602). At this point, similarly to the processing of detecting the background in the detection frame described above, it is determined that the background is present when the value of the background parallax histogram in that portion is equal to or larger than the predetermined value.
If the background is not present (NO at Step S602), the process is ended.
If the background is present (YES at Step S602), the rejection unit 517 rejects the detection frame as a processing target (Step S603).
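As one possible reading of Steps S601 to S603, the following sketch rejects an integration-detection frame when it contains two or more basic or separation detection frames and the background parallax histogram exceeds the predetermined value between adjacent inner frames; the one-dimensional frame representation and the threshold are assumptions.

```python
def reject_integration_frames(integration_frames, partial_frames, bg_hist, bg_thresh=5):
    """Sketch of Steps S601-S603: an integration-detection frame is rejected when
    it contains two or more basic/separation-detection frames and the background
    parallax histogram indicates background between those inner frames.
    Frames are (x0, x1) column ranges on the real U-map; bg_hist is the
    background-parallax histogram computed as in the previous sketch."""
    kept = []
    for frame in integration_frames:
        inner = sorted(p for p in partial_frames
                       if frame[0] <= p[0] and p[1] <= frame[1])
        reject = False
        if len(inner) >= 2:
            for (_, right), (left, _) in zip(inner[:-1], inner[1:]):
                # portion between two adjacent inner frames
                if right < left and max(bg_hist[right:left]) >= bg_thresh:
                    reject = True
                    break
        if not reject:
            kept.append(frame)
    return kept

bg_hist = [0] * 14 + [24] * 4 + [0] * 4      # background histogram over real U-map columns
print(reject_integration_frames([(10, 22)], [(10, 14), (18, 22)], bg_hist))  # []
```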
In an example of
As illustrated in
The rejection unit 517 may reject the detection frame using another method without performing background detection. For example, among the detection frames corresponding to the object regions selected at Step S14, the rejection unit 517 may reject a detection frame corresponding to the region of an object classified as “others” using a method of classifying objects illustrated in
According to the embodiment described above, a first detection result having relatively low separation performance and a second detection result having relatively high separation performance are generated and associated with each other. This configuration can improve the performance of recognizing the object while allowing simple processing at a later stage. One of the first detection result and the second detection result associated with each other is rejected based on a predetermined condition. This configuration can improve the performance of recognizing each of a plurality of objects.
The value of distance (distance value) and the parallax value can be treated equivalently, so that the parallax image is used as an example of the distance image in the present embodiment. However, the embodiment is not limited thereto. For example, the distance image may be generated by integrating a parallax image generated by using a stereo camera with distance information generated by using a detection device such as a millimeter-wave radar or a laser radar. Alternatively, a stereo camera and a detection device such as a millimeter-wave radar or a laser radar may be used at the same time, and the result may be combined with the detection result of the object obtained by the stereo camera described above to further improve detection accuracy.
It goes without saying that the system configuration in the embodiment described above is merely an example, and there are various examples of the system configuration in accordance with an application and a purpose. Some or all components in the embodiment described above may be combined.
For example, a functional unit that performs at least part of processing of the functional units such as the parallax value arithmetic processing unit 300, the second generation unit 500, the clustering processing unit 510, and the tracking unit 530 of the object recognition device 1 may be implemented by cloud computing constituted of one or more computers.
In the embodiment described above, described is an example in which the object recognition device 1 is mounted on the automobile as the vehicle 70. However, the embodiment is not limited thereto. For example, the object recognition device 1 may be mounted on another type of vehicle such as a motorcycle, a bicycle, a wheelchair, or an agricultural cultivator. In addition to a vehicle, the object recognition device 1 may be mounted on another example of a mobile object, such as a robot.
In the above embodiment, in a case in which at least one of the functional units of the parallax value deriving unit 3 and the recognition processing unit 5 in the object recognition device 1 is implemented by executing a computer program, the computer program is embedded and provided in a ROM and the like. The computer program executed by the object recognition device 1 according to the embodiment described above may be recorded and provided in a computer-readable recording medium such as a compact disc read only memory (CD-ROM), a flexible disk (FD), a compact disc recordable (CD-R), and a digital versatile disc (DVD), as an installable or executable file. The computer program executed by the object recognition device 1 according to the embodiment described above may be stored in a computer connected to a network such as the Internet and provided by being downloaded via the network. Furthermore, the computer program executed by the object recognition device 1 according to the embodiment described above may be provided or distributed via a network such as the Internet. The computer program executed by the object recognition device 1 according to the embodiment described above has a module configuration including at least one of the functional units described above. As actual hardware, when the CPU 52 (CPU 32) reads out and executes a computer program from the ROM 53 (ROM 33) described above, the functional units described above are loaded into a main storage device (RAM 54 (RAM 34) and the like) to be generated.
The imaging unit 1102 is arranged in the vicinity of a room mirror on a windshield 1106 of the vehicle 1101 as an example of a mobile object, and takes an image in a traveling direction of the vehicle 1101, for example. Various pieces of data including image data obtained through an imaging operation performed by the imaging unit 1102 are supplied to the analyzing unit 1103. The analyzing unit 1103 analyzes an object to be recognized such as a road surface on which the vehicle 1101 is traveling, a forward vehicle of the vehicle 1101, a pedestrian, and an obstacle based on the various pieces of data supplied from the imaging unit 1102. The control unit 1104 gives a warning and the like to a driver of the vehicle 1101 via the display unit 1105 based on an analysis result of the analyzing unit 1103. The control unit 1104 performs traveling support such as control of various onboard devices, and steering wheel control or brake control of the vehicle 1101 based on the analysis result. Although the following describes the vehicle as an example of equipment, the equipment control system according to the present embodiment can also be applied to a ship, an aircraft, a robot, and the like.
The analyzing unit 1103 includes a data bus line 10, a serial bus line 11, a CPU 15, an FPGA 16, a ROM 17, a RAM 18, a serial IF 19, and a data IF 20.
The imaging unit 1102 described above is connected to the analyzing unit 1103 via the data bus line 10 and the serial bus line 11. The CPU 15 executes and controls the entire operation, image processing, and image recognition processing of the analyzing unit 1103. Luminance image data of an image taken by the image sensors 6A and 6B of the first camera unit 1A and the second camera unit 1B is written into the RAM 18 of the analyzing unit 1103 via the data bus line 10. Change control data of sensor exposure value, change control data of an image reading parameter, various pieces of setting data, and the like from the CPU 15 or the FPGA 16 are transmitted or received via the serial bus line 11.
The FPGA 16 performs processing required to have real-time performance on the image data stored in the RAM 18. The FPGA 16 causes one of the pieces of luminance image data (taken images) taken by the first camera unit 1A and the second camera unit 1B to be a reference image, and causes the other one to be a comparative image. The FPGA 16 then calculates, as a parallax value (parallax image data) of a corresponding image portion, the position shift amount between a corresponding image portion on the reference image and a corresponding image portion on the comparative image, both of which correspond to the same point in the imaging area.
Parallax value d=|Δ1−Δ2| (1)
The FPGA 16 of the analyzing unit 1103 performs processing required to have real-time performance such as gamma correction processing and distortion correction processing (paralleling of left and right taken images) on the luminance image data supplied from the imaging unit 1102. By performing the arithmetic operation of the expression 1 described above using the luminance image data on which the processing required to have real-time performance is performed, the FPGA 16 generates parallax image data to be written into the RAM 18.
The description will be continued returning to
Recognition data of the recognition target is supplied to the control unit 1104 via the serial IF 19. The control unit 1104 performs traveling support such as brake control of the host vehicle and speed control of the host vehicle using the recognition data of the recognition target.
Y=0.3R+0.59G+0.11B (2)
The preprocessing unit 1111 preprocesses the luminance image data received from the first camera unit 1A and the second camera unit 1B. In this example, gamma correction processing is performed as preprocessing. The preprocessing unit 1111 supplies the preprocessed luminance image data to a paralleled image generation unit 1112.
The paralleled image generation unit 1112 performs paralleling processing (distortion correction processing) on the luminance image data supplied from the preprocessing unit 1111. The paralleling processing is processing of converting the luminance image data output from the first camera unit 1A and the second camera unit 1B into an ideal paralleled stereo image that would be obtained if two pinhole cameras were attached in parallel. Specifically, each pixel of the luminance image data output from the first camera unit 1A and the second camera unit 1B is converted by using a calculation result obtained by calculating a distortion amount of each pixel using polynomial expressions such as Δx=f(x, y) and Δy=g(x, y). The polynomial expressions are, for example, based on quintic polynomials in x (the horizontal direction position of the image) and y (the vertical direction position of the image). Accordingly, a paralleled luminance image can be obtained in which distortion of the optical systems of the first camera unit 1A and the second camera unit 1B is corrected. In this example, the paralleled image generation unit 1112 is implemented by the FPGA 16.
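The following sketch illustrates the paralleling (distortion correction) step, assuming toy second-order polynomials for Δx = f(x, y) and Δy = g(x, y) and nearest-neighbor sampling for brevity, whereas the embodiment mentions quintic polynomials; the coefficients and names are made up.

```python
import numpy as np

def apply_distortion_correction(image, coeff_x, coeff_y):
    """Sketch of the paralleling (distortion correction) step: for every output
    pixel (x, y), evaluate polynomial corrections dx = f(x, y), dy = g(x, y)
    and sample the input image at (x + dx, y + dy) (nearest neighbor here; the
    actual interpolation and polynomial degree are not specified in this sketch)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # toy 2nd-order polynomials; the embodiment mentions quintic polynomials in x and y
    dx = coeff_x[0] + coeff_x[1] * xs + coeff_x[2] * ys + coeff_x[3] * xs * ys
    dy = coeff_y[0] + coeff_y[1] * xs + coeff_y[2] * ys + coeff_y[3] * xs * ys
    src_x = np.clip(np.rint(xs + dx), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(ys + dy), 0, h - 1).astype(int)
    return image[src_y, src_x]

img = np.arange(480 * 640, dtype=np.uint16).reshape(480, 640)
corrected = apply_distortion_correction(img, coeff_x=(0.5, 1e-4, 0, 0),
                                        coeff_y=(-0.3, 0, 1e-4, 0))
print(corrected.shape)
```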
The parallax image generation unit 1113 is an example of a “distance image generation unit”, and generates a parallax image including a parallax value for each pixel as an example of a distance image including distance information for each pixel from the stereo image taken by the imaging unit 1102. In this case, the parallax image generation unit 1113 performs the arithmetic operation expressed by the expression 1 described above assuming that the luminance image data of the first camera unit 1A is standard image data and the luminance image data of the second camera unit 1B is comparative image data, and generates parallax image data indicating a parallax between the standard image data and the comparative image data. Specifically, the parallax image generation unit 1113 defines a block including a plurality of pixels (for example, 16 pixels×1 pixel) centered on one focused pixel for a predetermined “row” of the standard image data. On the other hand, in the same “row” of the comparative image data, a block having the same size as that of the defined block of the standard image data is shifted one pixel by one pixel in a horizontal line direction (X-direction). The parallax image generation unit 1113 then calculates each correlation value indicating correlation between a feature amount indicating a feature of a pixel value of the defined block in the standard image data and a feature amount indicating a feature of a pixel value of each block in the comparative image data. In this case, the parallax image means information associating the vertical direction position, the horizontal direction position, and a depth direction position (parallax) with each other.
The parallax image generation unit 1113 performs matching processing for selecting the block of the comparative image data that is most closely correlated with the block of the standard image data among blocks in the comparative image data based on the calculated correlation value. Thereafter, a position shift amount is calculated as the parallax value d, the position shift amount between the focused pixel in the block of the standard image data and a corresponding pixel in the block of the comparative image data selected through the matching processing. When such processing of calculating the parallax value d is performed on the entire region or a specific region of the standard image data, the parallax image data is obtained. As a method of generating the parallax image, various well-known techniques can be utilized. In short, it can be considered that the parallax image generation unit 1113 calculates (generates) the distance image (in this example, the parallax image) including the distance information for each pixel from the stereo image taken by the stereo camera.
As the feature amount of the block used in the matching processing, for example, a value (luminance value) of each pixel in the block can be used. As the correlation value, the sum total of absolute values of differences between a value (luminance value) of each pixel in the block of the standard image data and a value (luminance value) of each pixel in the block of the comparative image data corresponding to the former pixel can be used. In this case, the block including the smallest sum total is detected as the most correlated block.
For the matching processing of the parallax image generation unit 1113, a method such as Sum of Squared Difference (SSD), Zero-mean Sum of Squared Difference (ZSSD), Sum of Absolute Difference (SAD), or Zero-mean Sum of Absolute Difference (ZSAD) is used, for example. When a parallax value at a sub-pixel level, that is, smaller than one pixel, is required in the matching processing, an estimation value is used. Examples of methods of obtaining the estimation value include an equiangular linear method and a quadratic curve method. However, an error is caused in the estimated parallax value at the sub-pixel level, so a method such as estimation error correction (EEC) may be used to reduce the estimation error.
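The following is a minimal sketch of this kind of matching for a single focused pixel, using SAD over a 16×1 block and an equiangular linear (V-shaped) fit for the sub-pixel estimation. The search direction, the search range, and the border handling are simplifying assumptions and do not reproduce the exact processing of the parallax image generation unit 1113.

```
import numpy as np

def disparity_at(standard, comparative, row, col, block_w=16, max_d=64):
    """SAD block matching for one focused pixel using a 16x1 block.

    'standard' and 'comparative' are 2-D luminance arrays. The candidate block
    in the comparative image is shifted pixel by pixel along the same row; the
    shift direction assumed here depends on the camera arrangement. Border
    handling is omitted: 'col' must leave room for the block and the search.
    """
    half = block_w // 2
    ref = standard[row, col - half:col + half].astype(np.int32)
    costs = []
    for d in range(max_d):
        c0 = col - d
        if c0 - half < 0:
            break
        cand = comparative[row, c0 - half:c0 + half].astype(np.int32)
        costs.append(int(np.abs(ref - cand).sum()))   # sum of absolute differences
    if not costs:
        return 0.0
    costs = np.asarray(costs, dtype=np.float64)
    d = int(np.argmin(costs))
    # Equiangular linear (V-shaped) sub-pixel estimation around the minimum.
    if 0 < d < len(costs) - 1:
        c_prev, c_min, c_next = costs[d - 1], costs[d], costs[d + 1]
        denom = max(c_prev, c_next) - c_min
        if denom > 0:
            return d + 0.5 * (c_prev - c_next) / denom
    return float(d)
```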
In this example, the parallax image generation unit 1113 is implemented by the FPGA 16. The parallax image generated by the parallax image generation unit 1113 is supplied to the object detection processing unit 1114. In this example, the function of the object detection processing unit 1114 is implemented when the CPU 15 executes a three-dimensional object recognition program.
As illustrated in
Based on a plurality of pixels corresponding to a second range, which indicates a range of heights equal to or larger than a predetermined value within a first range higher than the road surface (an example of a reference object serving as a reference for the height of an object) in the parallax image, the first generation unit 1132 generates first information in which the position in the horizontal direction, indicating a direction orthogonal to the optical axis of the stereo camera, is associated with the position in the depth direction, indicating the direction of the optical axis of the stereo camera. In this example, the first information is a two-dimensional histogram in which the horizontal axis (X-axis) indicates the distance (actual distance) in the horizontal direction, the vertical axis (Y-axis) indicates the parallax value d of the parallax image, and the axis in the depth direction indicates frequency. The first information can be regarded as information in which the frequency value of the parallax is recorded for each combination of the actual distance and the parallax value d. In the following description, the first information is referred to as a "High Umap". Assuming that the position in the horizontal direction of the parallax image is x, the position in the vertical direction is y, and the parallax value set for each pixel is d, the first generation unit 1132 generates a two-dimensional histogram whose horizontal axis indicates x of the parallax image, whose vertical axis indicates the parallax value d, and whose depth axis indicates the frequency, by voting each point (x, y, d) in the parallax image corresponding to the second range based on its value of (x, d). The horizontal axis of this two-dimensional histogram is then converted into the actual distance to generate the High Umap. The vertical axis of the High Umap can be regarded as indicating the position in the depth direction (a smaller parallax value d represents a larger distance in the depth direction).
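The voting step could look roughly like the following sketch, in which the parallax image and a per-pixel height above the estimated road surface are assumed to be already available; the conversion of the horizontal axis into an actual distance is omitted, and all names and values are illustrative.

```
import numpy as np

def build_high_umap(parallax, height_above_road, h_min, h_max, d_max=128):
    """Vote (x, d) for pixels whose height above the road lies in [h_min, h_max).

    parallax          : 2-D array of parallax values d (0 or less means invalid)
    height_above_road : 2-D array of each pixel's height from the road surface,
                        obtained from the road surface estimation
    h_min, h_max      : the "second range" of heights selecting pixels for the High Umap
    Returns a 2-D histogram with x on the horizontal axis and the parallax value d
    on the vertical axis; converting x into an actual distance is omitted here.
    """
    h, w = parallax.shape
    umap = np.zeros((d_max, w), dtype=np.uint16)
    for y in range(h):
        for x in range(w):
            d = int(parallax[y, x])
            if d <= 0 or d >= d_max:
                continue
            if h_min <= height_above_road[y, x] < h_max:
                umap[d, x] += 1          # frequency of the parallax value at (x, d)
    return umap

# Illustrative use with one hypothetical object at parallax 40, 1.5 m above the road.
parallax = np.zeros((480, 640), dtype=np.int32)
heights = np.zeros((480, 640))
parallax[200:400, 300:340] = 40
heights[200:400, 300:340] = 1.5
high_umap = build_high_umap(parallax, heights, h_min=1.0, h_max=3.0)
```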
A linear expression representing the road surface is obtained through the road surface estimation by the road surface estimation unit 1131 described above, so that when the parallax value d is determined, the corresponding y-coordinate y0 is determined, and the coordinate y0 represents the height of the road surface. For example, when the parallax value is d and the y-coordinate is y′, y′−y0 represents the height from the road surface at the parallax value d. The height H from the road surface at the coordinates (d, y′) can be obtained through the arithmetic expression H=(z×(y′−y0))/f. In this arithmetic expression, "z" is the distance calculated from the parallax value d (z=BF/(d−offset)), and "f" is a value obtained by converting the focal length of the imaging unit 1102 into the same unit as that of (y′−y0). Here, BF is the product of the base length B and the focal length f of the imaging unit 1102, and offset is the parallax obtained when photographing an object at infinity.
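For example, these relations can be written directly as in the following sketch, where the linear road-surface expression y0 = a·d + b and its coefficients are hypothetical placeholders standing in for the result of the road surface estimation.

```
def height_from_road(d, y_prime, a, b, BF, offset, f):
    """Height H of a point (d, y') above the estimated road surface.

    y0 = a * d + b is the road-surface y-coordinate for parallax d (the linear
    expression from the road surface estimation; a and b are hypothetical here).
    z  = BF / (d - offset) is the distance computed from the parallax value,
    with BF = base length B x focal length f, and 'offset' the parallax of an
    object at infinity.
    """
    y0 = a * d + b
    z = BF / (d - offset)
    return (z * (y_prime - y0)) / f
```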
For example, in the taken image illustrated in
The following continues the description with reference to
The following continues the description with reference to
In the following description, the Standard Umap, the High Umap, and the Small Umap are each referred to as a "real Umap" when they are not required to be distinguished from each other. The real Umap may be regarded as an overhead map (an overhead image, a bird's-eye view image) in which the horizontal axis is the direction orthogonal to the optical axis of the stereo camera (the right and left direction of the camera), and the vertical axis is the optical axis direction of the stereo camera.
The following continues the description returning to
The isolated region detection processing unit 1140 performs isolated region detection processing for detecting an isolated region (assembly region) as a region of a cluster of parallax values d from each real Umap (the High Umap, the Standard Umap, and the Small Umap) received from the road surface detection processing unit 1122. Specific content of the isolated region detection processing unit 1140 will be described later.
For example, in a case of the taken image illustrated in
The parallax image processing unit 1150 performs parallax image processing for detecting object information in a real space or a region on the parallax image corresponding to the isolated region on the real Umap detected by the isolated region detection processing unit 1140.
The rejection processing unit 1160 performs rejection processing for selecting an object to be output based on the object information in the real space or the region on the parallax image detected by the parallax image processing unit 1150. The rejection processing unit 1160 performs size rejection focusing on the size of the object, and overlap rejection focusing on the positional relation between objects. For example, in the size rejection, a detection result is rejected when its size does not fall within the size range determined for each object type illustrated in
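A rough sketch of the size rejection is given below. The per-type size ranges are hypothetical placeholders, since the actual ranges are those defined per object type in the referenced figure, and the overlap rejection is omitted here.

```
# Hypothetical size ranges (object width in meters) per object type; the real
# values are those determined for each object type in the referenced figure.
SIZE_RANGES = {"pedestrian": (0.3, 1.2), "vehicle": (1.2, 2.5), "truck": (2.5, 3.5)}

def size_rejection(width_m, object_type):
    """Return True (reject) when the detected size is outside the per-type range."""
    lo, hi = SIZE_RANGES[object_type]
    return not (lo <= width_m <= hi)

print(size_rejection(0.8, "vehicle"))      # True: too narrow to be a vehicle
print(size_rejection(1.8, "vehicle"))      # False: kept
```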
The output information (detection result) from the clustering processing unit 1123 is input to the tracking processing unit 1124 illustrated in
Next, the following describes specific content of the isolated region detection processing unit 1140 illustrated in
The first detection unit 1141 detects an assembly region of the parallax value d (an example of distance information) from the High Umap (first information). In the following description, detection processing performed by the first detection unit 1141 is referred to as “separation detection processing”, and a processing result thereof is referred to as a “separation detection result (including the detected assembly region)”. The High Umap is hardly influenced by an object present in a region at a low height as compared with the Standard Umap, so that separation performance of the High Umap is excellent. However, erroneous separation detection tends to be caused for an object having no parallax in a region having a high height from the road surface. Specific processing content will be described later.
The second detection unit 1142 detects an assembly region from the Standard Umap (second information). In the following description, detection processing performed by the second detection unit 1142 is referred to as “basic detection processing”, and a processing result thereof is referred to as a “basic detection result (including the detected assembly region)”. The separation detection result described above is assumed to accompany the basic detection result (to be included in the basic detection result). With the Standard Umap, stable detection can be expected for the entire detection range because distance resolution for one pixel is high and the detection range includes a low position to a high position of the road surface. However, when an estimated road surface is detected to be lower than an actual road surface through road surface estimation or the parallax of the detection target is low, erroneous detection is easily caused due to a characteristic of the Standard Umap. Specific processing content will be described later.
The third detection unit 1143 detects an assembly region from the Small Umap (third information). In the following description, detection processing performed by the third detection unit 1143 is referred to as “detection processing for integration”, and a processing result thereof is referred to as an “integration detection result (including the detected assembly region)”. The Small Umap has a characteristic such that erroneous separation is hardly caused for an object that hardly has a parallax because resolution for one pixel is lower than that of the Standard Umap. However, because separation performance (resolution) is low, objects tend to be detected being coupled to each other in the detection processing (detection processing for integration) using the Small Umap.
The final determination processing unit 1144 performs final determination processing that takes the "basic detection result", the "separation detection result", and the "integration detection result" as inputs, selects and corrects the detection results to be output, and clarifies the relation between the detection results. As illustrated in
For convenience of explanation, first, the following describes specific content of the basic detection processing.
The description will be continued returning to
Through the basic detection processing described above, information indicating the detection rectangle on the Standard Umap is output as output information. An ID for identifying a group is assigned to grouped pixels (pixels included in the detected assembly region) in the detection rectangle on the Standard Umap. That is, information indicating a map of the ID grouped on the Standard Umap (an “ID Umap on the Standard Umap”, or simply referred to as an “ID Umap” when it is not required to be distinguished from others in some cases) is output as output information.
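One simple way to produce such an ID map is a connected-component style labeling of the voted pixels, as in the sketch below; this grouping rule is only illustrative and is not necessarily the grouping actually used in the basic detection processing.

```
import numpy as np
from collections import deque

def label_id_umap(umap, min_frequency=1):
    """Assign a group ID to each connected cluster of voted pixels on a Umap."""
    h, w = umap.shape
    ids = np.zeros((h, w), dtype=np.int32)     # 0 means "no group"
    next_id = 1
    for y in range(h):
        for x in range(w):
            if umap[y, x] >= min_frequency and ids[y, x] == 0:
                # Breadth-first search over 8-connected neighbours.
                queue = deque([(y, x)])
                ids[y, x] = next_id
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and umap[ny, nx] >= min_frequency
                                    and ids[ny, nx] == 0):
                                ids[ny, nx] = next_id
                                queue.append((ny, nx))
                next_id += 1
    return ids
```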
Next, the following describes specific content of the separation detection processing.
Next, the first detection unit 1141 performs size check processing for each detection rectangle created at Step S1032 (Step S1033). Specific content of size check processing is described above. Next, the first detection unit 1141 performs frequency check processing (Step S1034). Specific content of the frequency check processing is described above. When the processing described above is not completed for all basic detection results (when loop processing corresponding to the number of basic detection results is not finished), processing subsequent to Step S1031 is repeated. That is, the first detection unit 1141 repeats the processing described above corresponding to the number of basic detection results.
Through the separation detection processing described above, information indicating the detection rectangle on the High Umap (a detection result on the High Umap associated with the basic detection result) is output as output information. An ID for identifying a group is assigned to each grouped pixel in the detection rectangle on the High Umap. That is, information indicating a map of the ID grouped on the High Umap (an “ID Umap on the High Umap”, or simply referred to as an “ID Umap” when it is not required to be distinguished from others in some cases) is output as output information.
Next, the following describes specific content of detection processing for integration. Basic content of the detection processing for integration is similar to that of the basic detection processing.
After Step S1041, the third detection unit 1143 performs detection rectangle creating processing (Step S1042). Specific content thereof is described above. Next, the third detection unit 1143 performs output determination processing (Step S1043). The output determination processing selects the detection result to be output by determining whether the size, the frequency value of the parallax, the depth length, and the like of the detection rectangle (detection result) created at Step S1042 satisfy their respective conditions. In the detection processing for integration, objects tend to be detected coupled to each other, so it is assumed herein that only a detection result having a characteristic that appears to correspond to a vehicle is output.
Through the detection processing for integration described above, the information indicating the detection rectangle on the Small Umap is output as output information. An ID for identifying a group is assigned to each grouped pixel in the detection rectangle on the Small Umap. That is, information indicating a map of the ID grouped on the Small Umap (an “ID Umap on the Small Umap”, or simply referred to as an “ID Umap” when it is not required to be distinguished from others in some cases) is output as output information.
Next, the following describes final determination processing performed by the final determination processing unit 1144. The final determination processing unit 1144 receives three results including the basic detection result, the separation detection result, and the integration detection result, calculates a correspondence relation among the detection results, and sets an inclusive frame and a partial frame accompanying the inclusive frame. The final determination processing unit 1144 corrects the inclusive frame and the partial frame, and selects an output target therefrom. The inclusive frame stores a result detected through processing having low separation performance. That is, the inclusive frame indicates a frame having a larger size for the same object. In this case, the integration detection result or the basic detection result is set as the inclusive frame. The partial frame stores a result detected through processing having separation performance higher than that of the inclusive frame. The partial frame is a detection frame (an outer frame of the detection result) associated with the inclusive frame, and is a result obtained by separating the inside of the inclusive frame. In this case, the basic detection result or the separation detection result corresponds to the partial frame. Herein, the frame indicates a position and a size of the object, and is information associating coordinates of a corner of the rectangle surrounding the object with the height and the width, for example.
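The relation between an inclusive frame and the partial frames accompanying it could be represented by a structure such as the following; the class and field names are illustrative and are not those of an actual implementation.

```
from dataclasses import dataclass, field
from typing import List

@dataclass
class Frame:
    """Position and size of an object: a corner of the surrounding rectangle
    plus its width and height, together with the processing the result came from."""
    x: int            # corner x-coordinate of the rectangle
    y: int            # corner y-coordinate of the rectangle
    width: int
    height: int
    source: str       # "integration", "basic", or "separation"

@dataclass
class DetectionResult:
    inclusive: Frame                                        # result of low-separation processing
    partials: List[Frame] = field(default_factory=list)     # higher-separation results inside it
```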
The processing from Step S1051 to Step S1056 illustrated in
When the integration detection result is determined to be valid through the rejection determination processing described above, the result of Step S1052 in
At Step S1053, the merge processing unit 1146 performs matching between the integration detection result and the basic detection result. Specific content thereof is described below. The merge processing unit 1146 detects overlapping between the detection frame of the integration detection result and the detection frame of the basic detection result on the Large Umap, clarifies a correspondence relation based on the detection result, and selects the integration detection result to be a processing target.
In this example, first, the merge processing unit 1146 calculates an overlapping rate of the integration detection result and the basic detection result. When the size of the basic detection result is smaller than the size of the integration detection result, the overlapping rate is calculated by dividing an area of an overlapping region of the basic detection result and the integration detection result by an area of the basic detection result. When the size of the basic detection result is larger than the size of the integration detection result (when the size of the integration detection result is smaller than the size of the basic detection result), the overlapping rate is calculated by dividing an area of an overlapping region of the basic detection result and the integration detection result by an area of the integration detection result. In this example, when the overlapping rate is larger than a threshold (for example, 0.5), the merge processing unit 1146 determines that the basic detection result overlapping with the integration detection result is present. The merge processing unit 1146 then sets the inclusive frame and the partial frame based on the condition illustrated in
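A minimal sketch of this overlapping-rate test is shown below, with each frame given as an (x, y, width, height) rectangle on the map. Dividing by the smaller of the two areas reproduces the two cases described above, and the threshold 0.5 follows the example in the text.

```
def overlap_rate(basic, integration):
    """Overlap area divided by the area of the smaller of the two frames."""
    bx, by, bw, bh = basic
    ix, iy, iw, ih = integration
    ox = max(0, min(bx + bw, ix + iw) - max(bx, ix))
    oy = max(0, min(by + bh, iy + ih) - max(by, iy))
    overlap_area = ox * oy
    smaller_area = min(bw * bh, iw * ih)
    return overlap_area / smaller_area if smaller_area > 0 else 0.0

def overlaps(basic, integration, threshold=0.5):
    """True when the basic detection result overlaps the integration detection result."""
    return overlap_rate(basic, integration) > threshold

print(overlaps((10, 10, 20, 20), (12, 12, 40, 30)))   # True: most of the basic frame is covered
```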
In the example of
The description will be continued returning to
On the other hand, if the result of Step S1054 is “No”, the merge processing unit 1146 sets only the integration detection result as the inclusive frame (Step S1056). No partial frame is set because a corresponding basic detection result is not present. That is, the integration detection result is set as the inclusive frame, and one “detection result” in which no partial frame is set is generated.
The correction processing at Step S1057 is performed corresponding to the number of "detection results" generated as described above. The following describes the correction processing performed by the correction unit 1147. The correction unit 1147 performs integration correction processing when the detection result includes the integration detection result. Specific content of the integration correction processing will be described later. On the other hand, when the detection result does not include the integration detection result, the correction unit 1147 corrects a first assembly region (the assembly region, that is, the set of pixels to which an ID is given, detected by the first detection unit 1141) included in the separation detection result set as the partial frame, using a correction method corresponding to the distance of the first assembly region. The distance of the first assembly region is the distance (in the depth direction) from the stereo camera, and can be obtained using the parallax value d of each pixel included in the first assembly region. When the distance of the first assembly region is smaller than a threshold, the correction unit 1147 performs first correction processing on the first assembly region. When the distance of the first assembly region is equal to or larger than the threshold, the correction unit 1147 performs second correction processing on the first assembly region. At a short distance, road surface estimation accuracy is high, so erroneous separation in the separation detection result rarely occurs; at a long distance, road surface estimation accuracy is low, so erroneous separation in the separation detection result easily occurs. Thus, the threshold is preferably set to a distance up to which the road surface estimation accuracy can be secured. In this example, the threshold is set to 30 m, but the embodiment is not limited thereto.
The first correction processing is processing of expanding the first assembly region using a relative standard of the height of the first assembly region from the reference object (road surface). More specifically, a second assembly region is the assembly region detected by the second detection unit 1142, that is, the assembly region included in the basic detection result associated with the separation detection result, and includes the first assembly region. Within this second assembly region, a region of interest indicating a region directed outward from the first assembly region is examined, and the first assembly region is expanded in the direction in which the region of interest continues, up to the boundary at which the height of the region of interest from the reference object becomes lower than a relative height threshold. The relative height threshold indicates a relative value set in accordance with the average value of the height of the first assembly region from the reference object. Specific content thereof will be described later. In the following description, the first correction processing is referred to as "correction processing for short distance".
The second correction processing is processing of coupling two first assembly regions using a relative standard of the height of the first assembly region from the reference object (road surface). More specifically, in a second assembly region including two or more first assembly regions, the region of interest indicates the region between two first assembly regions. When the height of the region of interest from the reference object is equal to or larger than the relative height threshold, which indicates a relative value set in accordance with the average value of the height of the first assembly region from the reference object, along the direction continuing from one first assembly region to the other first assembly region, the one first assembly region and the other first assembly region are coupled. Specific content will be described later. In the following description, the second correction processing is referred to as "correction processing for long distance".
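The two correction processings can be pictured with the following simplified sketch, in which the second assembly region is reduced to a one-dimensional profile of heights above the road surface along the horizontal direction; the ratio used for the relative height threshold and the one-dimensional simplification are assumptions for illustration only.

```
def relative_height_threshold(heights_in_first, ratio=0.5):
    """Relative value set in accordance with the average height of the first
    assembly region above the road surface (the ratio 0.5 is hypothetical)."""
    return ratio * (sum(heights_in_first) / len(heights_in_first))

def expand_short_distance(height_by_x, first_range, threshold):
    """Correction for short distance: expand [left, right] outward while the
    height of the region of interest stays at or above the threshold."""
    left, right = first_range
    while left - 1 >= 0 and height_by_x[left - 1] >= threshold:
        left -= 1
    while right + 1 < len(height_by_x) and height_by_x[right + 1] >= threshold:
        right += 1
    return left, right

def couple_long_distance(height_by_x, range_a, range_b, threshold):
    """Correction for long distance: couple two first assembly regions when the
    height stays at or above the threshold across the region between them
    (range_a is assumed to lie to the left of range_b)."""
    gap = range(range_a[1] + 1, range_b[0])       # region of interest between them
    return all(height_by_x[x] >= threshold for x in gap)

# Illustrative height profile within one second assembly region.
heights = [0.2, 0.9, 1.1, 1.2, 0.4, 0.3, 1.0, 1.1, 0.2]
thr = relative_height_threshold([1.1, 1.2], ratio=0.5)       # 0.575
print(expand_short_distance(heights, (2, 3), thr))            # (1, 3)
print(couple_long_distance(heights, (1, 3), (6, 7), thr))     # False: the gap falls below thr
```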
The correction unit 1147 repeats the processing from Step S1061 to Step S1067 corresponding to the number of “detection results”. First, the correction unit 1147 creates an ID table (Step S1061). The ID table is information having a table format in which the inclusive frame and the partial frame are associated with each other using an ID. Next, the correction unit 1147 counts the number of partial frames having a size corresponding to a vehicle size among partial frames included in a focused detection result (a group of the inclusive frame and the partial frame) (Step S1062). Next, the correction unit 1147 determines whether the detection result includes the integration detection result (Step S1063). That is, the correction unit 1147 determines whether the inclusive frame included in the detection result is the integration detection result.
If the result of Step S1063 is “Yes” (Yes at Step S1063), the correction unit 1147 performs integration correction processing (Step S1064). If the result of Step S1063 is “No” (No at Step S1063), the correction unit 1147 determines whether a distance of the detection result is smaller than a predetermined distance (for example, 30 m) (Step S1065). If the result of Step S1065 is “Yes” (Yes at Step S1065), the correction unit 1147 performs correction processing for short distance (Step S1066). If the result of Step S1065 is “No” (No at Step S1065), the correction unit 1147 performs correction processing for long distance (Step S1067).
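The branching from Step S1063 to Step S1067 amounts to the dispatch sketched below; the dictionary keys and the stand-in correction functions are illustrative placeholders, not the actual interfaces of the correction unit 1147.

```
def correct_detection_result(result, correctors, distance_threshold_m=30.0):
    """Dispatch of the correction processing (Steps S1063 to S1067).

    'result' is a dict with illustrative keys; 'correctors' supplies the three
    correction functions (integration / short distance / long distance).
    """
    if result["includes_integration_result"]:
        return correctors["integration"](result)        # Step S1064
    if result["distance_m"] < distance_threshold_m:
        return correctors["short"](result)              # Step S1066
    return correctors["long"](result)                   # Step S1067

# Usage with stand-in correctors:
correctors = {"integration": lambda r: "integration correction",
              "short": lambda r: "correction for short distance",
              "long": lambda r: "correction for long distance"}
print(correct_detection_result({"includes_integration_result": False,
                                "distance_m": 42.0}, correctors))
```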
In the present embodiment, when the detection result includes the integration detection result (a result of detection using the Small Umap as a map having low resolution), integration correction processing is performed considering a distance difference and a horizontal position on the basic detection result and the separation detection result. Accordingly, the detection result can be corrected to have high separation performance while reducing erroneous separation. In the present embodiment, appropriate one of the correction processing for short distance and the correction processing for long distance is used depending on the distance of the detection result. Accordingly, correction can be performed using an appropriate method for short distance having high road surface estimation accuracy and long distance having low road surface estimation accuracy.
Next, the following describes specific content of the integration correction processing. The integration detection result is obtained using a map having coarse resolution (the Small Umap). Due to this, erroneous separation of the object can be reduced, but separation performance is deteriorated. On the other hand, the basic detection result and the separation detection result are obtained using maps having high resolution, so that separation performance is high but erroneous separation of the object is problematic. In the integration correction processing, the partial frames (the basic detection results or the separation detection results) associated with the integration detection result are not simply coupled (integrated) with each other as the same object; instead, coupling determination is made based on a distance difference and a horizontal direction position, so that the result is corrected into a detection result having high separation performance while reducing erroneous separation.
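As a hedged illustration of such a coupling determination, the following checks whether two partial frames are close both in the depth direction and in the horizontal direction; both thresholds are hypothetical placeholders, since the actual conditions are described later with reference to the figures.

```
def should_couple(frame_a, frame_b, max_distance_diff_m=2.0, max_horizontal_gap_m=0.5):
    """Couple two partial frames inside the same integration detection result only
    when they are close in the depth direction and in the horizontal direction.
    The threshold values and dictionary keys are illustrative assumptions."""
    close_in_depth = abs(frame_a["distance_m"] - frame_b["distance_m"]) <= max_distance_diff_m
    # Horizontal gap between the two frames (negative when they overlap).
    gap = max(frame_a["left_m"], frame_b["left_m"]) - min(frame_a["right_m"], frame_b["right_m"])
    close_horizontally = gap <= max_horizontal_gap_m
    return close_in_depth and close_horizontally

a = {"distance_m": 20.0, "left_m": -1.0, "right_m": 0.0}
b = {"distance_m": 20.5, "left_m": 0.3, "right_m": 1.2}
print(should_couple(a, b))   # True: small distance difference and small horizontal gap
```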
First, the following describes correction processing of the inclusive frame. As illustrated in
Next, the following describes correction processing of the partial frame.
Next, the following describes coupling processing of the partial frames at Step S1073 in
Under the condition of
Next, the following describes specific content of the correction processing for short distance at Step S1066 in
Next, the following describes specific content of the correction processing for long distance at Step S1067 in
The following describes specific content of the coupling determination processing. As illustrated in
As illustrated in
For example, considering occurrence of distortion in the object in a case of long distance, the region of interest may be divided into an upper part and a lower part, and whether to perform coupling processing may be determined by checking continuity of height for each divided region of interest.
The following continues the description of the procedure of
At Step S1106, the correction unit 1147 performs correction processing on the partial frame. Content of the correction processing is the same as that of the correction processing at Step S1072 in
As described above, in the present embodiment, the correction unit 1147 corrects the first assembly region while switching the correction method in accordance with the distance of the first assembly region obtained through the separation detection processing. More specifically, the correction unit 1147 performs correction processing for short distance on the first assembly region when the distance of the first assembly region is smaller than a threshold, and performs correction processing for long distance on the first assembly region when the distance of the first assembly region is equal to or larger than the threshold. As described above, estimation accuracy for the road surface is high in a case of short distance, so that erroneous separation of the separation detection result is hardly caused, but an object spreading in a region having a low height from the road surface may be detected to have a smaller frame than an actual frame in the separation detection processing. Considering the above points, the correction processing for short distance is processing of expanding the first assembly region by using a relative standard of the height of the first assembly region from the road surface obtained through the separation detection processing. In the case of short distance, estimation accuracy for the road surface is high, so that processing such as coupling is not required. As described above, estimation accuracy for the road surface in a case of long distance is lower than that in the case of short distance, so that erroneous separation of the separation detection result is easily caused. Considering the above points, the correction processing for long distance is processing of coupling two first assembly regions by using a relative standard of the height of the first assembly region from the road surface obtained through the separation detection processing. As described above, detection accuracy for the object can be sufficiently secured by switching between the correction processing for short distance and the correction processing for long distance in accordance with the distance of the first assembly region obtained through the separation detection processing to correct the first assembly region.
The embodiments according to the present invention have been described above, but the present invention is not limited to the embodiments. In an implementation phase, components can be modified to be embodied without departing from the gist of the invention. Various inventions can be made by appropriately combining a plurality of components disclosed in the embodiments described above. For example, some components may be deleted from all the components disclosed in the embodiments.
The computer program executed by the equipment control system 1100 according to the embodiments described above may be recorded and provided in a computer-readable recording medium such as a compact disc read only memory (CD-ROM), a flexible disk (FD), a compact disc recordable (CD-R), a digital versatile disc (DVD), or a Universal Serial Bus (USB) memory as an installable or executable file, or may be provided or distributed via a network such as the Internet. Various computer programs may be embedded and provided in a ROM, for example.
The above-described embodiments are illustrative and do not limit the present invention. Thus, numerous additional modifications and variations are possible in light of the above teachings. For example, at least one element of different illustrative and exemplary embodiments herein may be combined with each other or substituted for each other within the scope of this disclosure and appended claims. Further, features of components of the embodiments, such as the number, the position, and the shape, are not limited to those of the embodiments and may be set as appropriate. It is therefore to be understood that within the scope of the appended claims, the disclosure of the present invention may be practiced otherwise than as specifically described herein. Further, any of the above-described apparatus, devices or units can be implemented as a hardware apparatus, such as a special-purpose circuit or device, or as a hardware/software combination, such as a processor executing a software program. Further, as described above, any one of the above-described and other methods of the present invention may be embodied in the form of a computer program stored in any kind of storage medium. Examples of storage media include, but are not limited to, flexible disks, hard disks, optical discs, magneto-optical discs, magnetic tapes, nonvolatile memory, semiconductor memory, read-only-memory (ROM), etc. Alternatively, any one of the above-described and other methods of the present invention may be implemented by an application specific integrated circuit (ASIC), a digital signal processor (DSP) or a field programmable gate array (FPGA), prepared by interconnecting an appropriate network of conventional component circuits or by a combination thereof with one or more conventional general purpose microprocessors or signal processors programmed accordingly.
1 Object recognition device (example of “information processing device”)
2 Main body unit (example of “imaging device”)
3 Parallax value deriving unit
4 Communication line
5 Recognition processing unit
6 Vehicle control device (example of “control device”)
60 Equipment control system
70 Vehicle
100a, 100b Image acquisition unit
200a, 200b Conversion unit
300 Parallax value arithmetic processing unit (example of “generation unit”)
500 Second generation unit
501 Third generation unit (example of “movement surface estimation unit”)
502 Fourth generation unit
503 Fifth generation unit
510 Clustering processing unit
511 Basic detection unit (example of “first detection unit”)
512 Separation detection unit (example of “second detection unit”)
513 Integration detection unit (example of “first detection unit”)
514 Selection unit
515 Frame creation unit
516 Background detection unit
517 Rejection unit
530 Tracking unit
1100 Equipment control system
1101 Vehicle
1102 Imaging unit
1103 Analyzing unit
1104 Control unit
1105 Display unit
1106 Windshield
1111 Preprocessing unit
1112 Paralleled image generation unit
1113 Parallax image generation unit
1114 Object detection processing unit
1121 Acquisition unit
1122 Road surface detection processing unit
1123 Clustering processing unit
1124 Tracking processing unit
1131 Road surface estimation unit
1132 First generation unit
1133 Second generation unit
1134 Third generation unit
1140 Isolated region detection processing unit
1141 First detection unit
1142 Second detection unit
1143 Third detection unit
1144 Final determination processing unit
1145 Rejection determination processing unit
1146 Merge processing unit
1147 Correction unit
1150 Parallax image processing unit
1160 Rejection processing unit
PTL 1: Japanese Laid-open Patent Publication No. 2008-065634
Number | Date | Country | Kind |
---|---|---|---|
2016-229468 | Nov 2016 | JP | national |
2016-229566 | Nov 2016 | JP | national |
2016-229572 | Nov 2016 | JP | national |
2017-177897 | Sep 2017 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2017/042302 | 11/24/2017 | WO | 00 |