The present disclosure relates to an information processing device, an information processing method, and an information processing program.
In recent years, with an increase in resolution of imaging devices such as digital still cameras, digital video cameras, and small cameras mounted on multifunctional mobile phones (smartphones) and the like, information processing devices having an image recognition function for recognizing a predetermined target object included in a captured image have been developed.
Patent Document 1: Japanese Patent Application Laid-Open No. 2017-112409
With the image recognition function, it is possible to improve performance in detecting a target object by using a captured image with a higher resolution. However, in the conventional technology, image recognition using a high-resolution captured image requires a large calculation workload for the recognition processing, and it is therefore difficult to improve the simultaneity of the recognition processing with respect to the captured image.
An object of the present disclosure is to provide an information processing device, an information processing method, and an information processing program capable of improving characteristics of recognition processing using a captured image.
An information processing device according to the present disclosure includes: a setting section that sets a pixel position for acquiring a sampling pixel for each divided region obtained by dividing imaging information including pixels; a calculation section that calculates a feature amount of a sampling image including the sampling pixel; and a recognition section that performs recognition processing on the basis of the feature amount of the sampling image and outputs a recognition processing result, in which the setting section sets different pixel positions for first imaging information and second imaging information acquired after the first imaging information in time series among pieces of the imaging information.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the drawings. Note that, in the following embodiments, the same parts are denoted by the same reference signs, and a redundant description will be omitted.
Hereinafter, embodiments of the present disclosure will be described in the following order.
First, a technology applicable to each embodiment will be schematically described in order to facilitate understanding.
The sensor section 10a performs imaging under the control of the imaging control section, and supplies image data of a captured image acquired by imaging to the recognition processing section 20a. The recognition processing section 20a performs recognition processing on image data by using a deep neural network (DNN). More specifically, the recognition processing section 20a includes a recognition model trained in advance by machine learning using predetermined training data, and performs recognition processing using the DNN on the image data supplied from the sensor section 10a on the basis of the recognition model. The recognition processing section 20a outputs a recognition result obtained by the recognition processing to the outside of the information processing device 1a, for example.
The processing of
In the DNN, recognition processing can be performed using time-series information.
The identification processing illustrated in
The CPU 1205 is operated using the RAM 1207 as a work memory according to a program stored in advance in the ROM 1206, and controls the overall operation of the information processing device 1. The interface 1204 communicates with the outside of the information processing device 1 by wired or wireless communication. For example, in a case where the information processing device 1 is used for in-vehicle use, the information processing device 1 can communicate with a braking control system or the like of a vehicle on which the information processing device 1 is mounted via the interface 1204.
The imaging section 1200 captures a moving image in a predetermined frame cycle and outputs pixel data for forming a frame image. More specifically, the imaging section 1200 includes a plurality of photoelectric conversion elements that converts light received by each photoelectric conversion element into a pixel signal that is an electric signal by photoelectric conversion, and a drive circuit that drives each photoelectric conversion element. In the imaging section 1200, the plurality of photoelectric conversion elements is arranged in a matrix array to constitute a pixel array.
For example, the sensor section 10a in
Here, each photoelectric conversion element corresponds to a pixel in the image data, and in a pixel array section, photoelectric conversion elements whose number corresponds to, for example, 1920 pixels × 1080 pixels (rows × columns) are arranged in a matrix array. Note that an image of one frame is formed by pixel signals from the photoelectric conversion elements whose number corresponds to 1920 pixels × 1080 pixels.
The optical section 1201 includes a lens, an autofocus mechanism, and the like, and guides light incident on the lens to the pixel array section included in the imaging section 1200. The imaging section 1200 generates a pixel signal for each photoelectric conversion element according to the light with which the pixel array section is irradiated via the optical section 1201. The imaging section 1200 converts a pixel signal that is an analog signal into pixel data that is a digital signal, and outputs the pixel data. The pixel data output from the imaging section 1200 is stored in the memory 1202. The memory 1202 is, for example, a frame memory, and can store pixel data for at least one frame.
The DSP 1203 performs predetermined image processing on the pixel data stored in the memory 1202. Furthermore, the DSP 1203 includes a recognition model trained in advance, and performs the recognition processing using the DNN described above on the image data stored in the memory 1202 on the basis of the recognition model. The recognition result that is a result of the recognition processing performed by the DSP 1203 is temporarily stored in, for example, a memory included in the DSP 1203 or the RAM 1207, and is output from the interface 1204 to the outside. Alternatively, in a case where the information processing device 1 includes a storage device, the recognition result may be stored in the storage device.
Alternatively, the function of the DSP 1203 may be implemented by the CPU 1205. In addition, a graphics processing unit (GPU) may be used instead of the DSP 1203.
A complementary metal oxide semiconductor (CMOS) image sensor (CIS) in which each section included in the imaging section 1200 is integrally formed using a CMOS can be applied as the imaging section 1200. The imaging section 1200 can be formed on one substrate. Alternatively, the imaging section 1200 may be a multilayer CIS in which a plurality of semiconductor chips is stacked and integrally formed. Note that the imaging section 1200 is not limited to this example, and may be another type of optical sensor such as an infrared light sensor that performs imaging with infrared light.
As an example, the imaging section 1200 can be formed by a multilayer CIS having a two-layer structure in which semiconductor chips are stacked in two layers.
As illustrated on the right side of
As another example, the imaging section 1200 can be formed to have a three-layer structure in which three semiconductor chips are stacked.
As illustrated on the right side of
Note that, in the configurations of
The pixel array section 1001 includes a plurality of pixel circuits 1000 including photoelectric conversion elements that are implemented by, for example, photodiodes, and perform photoelectric conversion on respective received light, and a circuit that reads electric charges from the photoelectric conversion elements. In the pixel array section 1001, the plurality of pixel circuits 1000 is arranged in a matrix array in a horizontal direction (row direction) and a vertical direction (column direction). In the pixel array section 1001, the arrangement of the pixel circuits 1000 in the row direction is referred to as a line. For example, in a case where an image of one frame is formed with 1920 pixels × 1080 lines, the pixel array section 1001 includes at least 1080 lines each including at least 1920 pixel circuits 1000. An image (image data) of one frame is formed by pixel signals read from the pixel circuits 1000 included in the frame.
Furthermore, in the pixel array section 1001, the pixel signal line 1006 is connected to each row of the pixel circuits 1000, and the vertical signal line VSL is connected to each column. An end of the pixel signal line 1006 that is not connected to the pixel array section 1001 is connected to the vertical scanning section 1002. The vertical scanning section 1002 transmits a control signal such as a drive pulse at the time of reading a pixel signal from a pixel to the pixel array section 1001 via the pixel signal line 1006 under the control of the control section 1100 described later. An end of the vertical signal line VSL that is not connected to the pixel array section 1001 is connected to the AD conversion section 1003. A pixel signal read from a pixel is transmitted to the AD conversion section 1003 via the vertical signal line VSL.
Control of reading of a pixel signal from the pixel circuit 1000 will be schematically described. The reading of a pixel signal from the pixel circuit 1000 is performed by transferring electric charges accumulated in the photoelectric conversion element by exposure to a floating diffusion (FD) layer, and converting the transferred electric charges into a voltage in the floating diffusion layer. The voltage obtained by converting the electric charges in the floating diffusion layer is output as a pixel signal to the vertical signal line VSL via an amplifier.
More specifically, in the pixel circuit 1000, during exposure, the photoelectric conversion element and the floating diffusion layer are cut off from each other (open state), and electric charges generated in response to incident light are accumulated by photoelectric conversion in the photoelectric conversion element. After the exposure is completed, the floating diffusion layer and the vertical signal line VSL are connected according to a selection signal supplied via the pixel signal line 1006. Further, the floating diffusion layer is connected to a supply line for a power supply voltage VDD or a black level voltage in a short period of time according to a reset pulse supplied via the pixel signal line 1006, and the floating diffusion layer is reset. A voltage of a reset level of the floating diffusion layer (referred to as a voltage A) is output to the vertical signal line VSL. Thereafter, the photoelectric conversion element and the floating diffusion layer are connected (closed state) by a transfer pulse supplied via the pixel signal line 1006, and the electric charges accumulated in the photoelectric conversion element are transferred to the floating diffusion layer. A voltage corresponding to an electric charge amount of the floating diffusion layer (referred to as a voltage B) is output to the vertical signal line VSL.
The AD conversion section 1003 includes an AD converter 1007 provided for each vertical signal line VSL, a reference signal generation section 1004, and a horizontal scanning section 1005. The AD converter 1007 is a column AD converter that performs AD conversion processing on each column of the pixel array section 1001. The AD converter 1007 performs AD conversion processing on a pixel signal supplied from the pixel circuit 1000 via the vertical signal line VSL, and generates two digital values (values respectively corresponding to the voltage A and the voltage B) for correlated double sampling (CDS) processing for noise reduction.
The AD converter 1007 supplies the generated two digital values to the signal processing section 1101. The signal processing section 1101 performs CDS processing on the basis of the two digital values supplied from the AD converter 1007, and generates pixel data, that is, a pixel signal in the form of a digital signal.
The reference signal generation section 1004 generates, as a reference signal, a ramp signal used by each AD converter 1007 to convert a pixel signal into two digital values on the basis of a control signal input from the control section 1100. The ramp signal is a signal whose level (voltage value) decreases at a constant slope with respect to time, or a signal whose level decreases stepwise. The reference signal generation section 1004 supplies the generated ramp signal to each AD converter 1007. The reference signal generation section 1004 is implemented using, for example, a digital-to-analog converter (DAC) or the like.
In a case where a ramp signal whose voltage decreases stepwise at a predetermined slope is supplied from the reference signal generation section 1004, a counter starts counting according to a clock signal. A comparator compares a voltage of the pixel signal supplied from the vertical signal line VSL with a voltage of the ramp signal, and the counter stops counting at a timing at which the voltage of the ramp signal falls below the voltage of the pixel signal. The AD converter 1007 converts the pixel signal, which is an analog signal, into a digital value by outputting a value corresponding to the count value at the time point when the counting is stopped.
The AD converter 1007 supplies the generated two digital values to the signal processing section 1101. The signal processing section 1101 performs CDS processing on the basis of the two digital values supplied from the AD converter 1007, and generates a pixel signal (pixel data) in the form of a digital signal. The pixel data generated by the signal processing section 1101 is stored in a frame memory (not illustrated), and in a case where pixel data for one frame is stored in the frame memory, the pixel data is output from the imaging section 1200 as image data of one frame.
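As an aid to understanding, the following Python sketch models the single-slope AD conversion and CDS processing described above for one pixel: an AD converter compares the pixel voltage with a stepwise-decreasing ramp while a counter runs, and the difference between the two resulting digital values (corresponding to the voltage A and the voltage B) gives the pixel data. The ramp parameters and voltage levels are hypothetical values chosen only for illustration, not values taken from the present disclosure.

```python
def ramp_ad_convert(pixel_voltage, ramp_start=1.0, ramp_step=0.001, max_count=1023):
    """Model of a single-slope column AD converter (AD converter 1007).

    The counter runs while the ramp voltage is still above the pixel
    voltage; the count at the crossing point is the digital value.
    """
    ramp = ramp_start
    for count in range(max_count + 1):
        if ramp <= pixel_voltage:      # comparator flips: stop counting
            return count
        ramp -= ramp_step              # ramp level decreases stepwise
    return max_count

# Hypothetical reset level (voltage A) and signal level (voltage B)
# read out from one pixel circuit via the vertical signal line VSL.
voltage_a = 0.95   # reset level of the floating diffusion layer
voltage_b = 0.60   # level after charge transfer from the photoelectric conversion element

digital_a = ramp_ad_convert(voltage_a)
digital_b = ramp_ad_convert(voltage_b)

# CDS processing in the signal processing section 1101: the difference of the
# two digital values cancels the reset level and gives the pixel data.
pixel_data = digital_b - digital_a
print(pixel_data)
```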
Under the control of the control section 1100, the horizontal scanning section 1005 performs selective scanning to select the respective AD converters 1007 in a predetermined order, thereby sequentially outputting each digital value temporarily held by each AD converter 1007 to the signal processing section 1101. The horizontal scanning section 1005 includes, for example, a shift register, an address decoder, and the like.
The control section 1100 performs drive control for the vertical scanning section 1002, the AD conversion section 1003, the reference signal generation section 1004, the horizontal scanning section 1005, and the like in accordance with an imaging control signal supplied from a sensor control section 11. The control section 1100 generates various drive signals serving as references for operations of the vertical scanning section 1002, the AD conversion section 1003, the reference signal generation section 1004, and the horizontal scanning section 1005. The control section 1100 generates a control signal for causing the vertical scanning section 1002 to supply a signal to each pixel circuit 1000 via the pixel signal line 1006 on the basis of, for example, a vertical synchronization signal included in the imaging control signal or an external trigger signal, and a horizontal synchronization signal. The control section 1100 supplies the generated control signal to the vertical scanning section 1002.
Furthermore, the control section 1100 passes, for example, information indicating an analog gain included in an imaging control signal supplied from the CPU 1205 to the AD conversion section 1003. The AD conversion section 1003 controls a gain of a pixel signal input to each AD converter 1007 included in the AD conversion section 1003 via the vertical signal line VSL according to the information indicating the analog gain.
On the basis of the control signal supplied from the control section 1100, the vertical scanning section 1002 supplies various signals including a drive pulse to each pixel circuit 1000, line by line, via the pixel signal line 1006 of the selected pixel row of the pixel array section 1001, and causes each pixel circuit 1000 to output a pixel signal to the vertical signal line VSL. The vertical scanning section 1002 includes, for example, a shift register, an address decoder, and the like. Furthermore, the vertical scanning section 1002 controls the exposure in each pixel circuit 1000 according to information indicating exposure supplied from the control section 1100.
The imaging section 1200 configured as described above is a column AD type CMOS image sensor in which the AD converters 1007 are arranged for each column.
Next, resolution of an image used for the recognition processing will be described with reference to
In the example of low resolution of
Meanwhile, the recognition processing for a high-resolution image requires a larger calculation workload as compared with the recognition processing for a low-resolution image, and thus, the processing takes time. Therefore, it is difficult to enhance simultaneity between a recognition result and a captured image. On the other hand, since the recognition processing for a low-resolution image requires a small calculation workload, the processing can be performed in a short time, and the simultaneity with the captured image can be relatively easily enhanced.
As an example, a case where the recognition processing is performed on the basis of a captured image captured by an in-vehicle imaging device will be considered. In this case, since it is necessary to recognize a distant target object (for example, an oncoming vehicle traveling on an opposite lane in a direction opposite to a traveling direction of a host vehicle) with high simultaneity, it is conceivable to perform the recognition processing for a low-resolution image. However, as described with reference to
In each embodiment of the present disclosure, in order to enable easy and high-speed recognition of a distant target object, recognition processing is performed on a sampling image including pixels obtained by thinning a high-resolution captured image by subsampling according to a predetermined rule. For a captured image acquired in the next frame, pixels at positions different from those subsampled from the immediately previous captured image are sampled, and the recognition processing is performed on a sampling image including the sampled pixels.
In a second captured image acquired after a first captured image in time series, an operation of performing the recognition processing on a sampling image obtained by sampling pixels different from those of the first captured image is repeatedly performed in units of frames. This makes it possible to rapidly acquire a recognition result while using a high-resolution captured image. Furthermore, it is possible to acquire a more accurate recognition result by sequentially integrating a feature amount extracted at the time of performing the recognition processing with a feature amount extracted in the recognition processing for the next sampling image.
Next, a first embodiment of the present disclosure will be described.
The recognition processing section 20b includes a preprocessing section 210 and a recognition section 220. Image data supplied from the sensor section 10b to the recognition processing section 20b is input to the preprocessing section 210. The preprocessing section 210 performs subsampling on the input image data by thinning out pixels according to a predetermined rule. A sampling image obtained by performing subsampling on the image data is input to the recognition section 220.
The recognition section 220 performs the recognition processing on the image data by using the DNN, similarly to the recognition processing section 20a in
The recognition section 220 outputs a recognition result obtained by the recognition processing to, for example, the outside of the information processing device 1b.
A sampling image including sampling pixels obtained by subsampling is input to the recognition section 220. The recognition section 220 extracts a feature amount of the input sampling image by the DNN (Step S11). Here, the recognition section 220 extracts the feature amount by using a convolutional neural network (CNN), which is a type of DNN.
The recognition section 220 stores the feature amount extracted in Step S11 in an accumulation section (for example, the RAM 1207; not illustrated). At this time, for example, in a case where the feature amount extracted in the immediately previous frame has already been stored in the accumulation section, the recognition section 220 recursively uses the stored feature amount to integrate the newly extracted feature amount with the feature amount stored in the accumulation section (Step S12). The recognition section 220 stores, accumulates, and integrates the feature amounts extracted up to the immediately previous frame in the accumulation section. That is, the processing in Step S12 corresponds to processing using a recurrent neural network (RNN), which is a type of DNN.
The recognition section 220 performs the recognition processing on the basis of the feature amounts accumulated and integrated in Step S12 (Step S13).
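The flow from the subsampling by the preprocessing section 210 through Steps S11 to S13 can be outlined by the following Python sketch. The 2 × 2 divided region, the helper names, and the placeholder feature extractor and recognition head (standing in for the trained CNN and the recognition part of the DNN) are assumptions introduced only for this illustration.

```python
import numpy as np

N = 2  # a divided region of N x N = 2 x 2 pixels is assumed for this sketch
frames = [np.random.rand(8, 8) for _ in range(8)]  # dummy time-series captured images

def subsample(frame, phase):
    """Subsampling (Step S10): select one sampling pixel per divided region at the given phase."""
    row, col = divmod(phase, N)
    return frame[row::N, col::N]

def extract_feature(sampling_image):
    """Step S11: placeholder for the CNN-based feature amount extraction."""
    return sampling_image.astype(np.float32)  # stand-in feature map

def integrate(accumulated, feature):
    """Step S12: recursive integration with the accumulated feature amounts (RNN-like)."""
    return feature if accumulated is None else accumulated + feature

def recognize(feature):
    """Step S13: placeholder for the recognition processing."""
    return {"score": float(feature.mean())}  # stand-in recognition result

accumulated = None
for frame_index, frame in enumerate(frames):
    phase = frame_index % (N * N)          # the pixel position is shifted every frame
    feature = extract_feature(subsample(frame, phase))
    accumulated = integrate(accumulated, feature)
    result = recognize(accumulated)        # a recognition result is output every frame
```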
Here, the subsampling processing performed by the preprocessing section 210 in Step S10 will be described in more detail.
The preprocessing section 210 sets a pixel position for selecting a sampling pixel by subsampling from the respective pixels 300 included in the divided region 35 for the divided region 35. Furthermore, the preprocessing section 210 sets different pixel positions for each frame as pixel positions for selecting sampling pixels.
Section (b) in
The preprocessing section 210 generates, as a sampling image including sampling pixels, an image including the respective pixels 300sa1 to 300sa4 selected as sampling pixels in a certain frame. Section (c) in
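Assuming, for illustration, a divided region of 2 × 2 pixels and a 1920 pixels × 1080 pixels captured image, each sampling image can be formed by strided slicing as in the sketch below; the array contents are dummies, and only the shapes matter here.

```python
import numpy as np

image_data = np.arange(1080 * 1920).reshape(1080, 1920)  # dummy captured image

# Pixel positions inside a 2 x 2 divided region, one position per frame:
# first phase -> (0, 0), second -> (0, 1), third -> (1, 0), fourth -> (1, 1)
phases = [(0, 0), (0, 1), (1, 0), (1, 1)]

sampling_images = [image_data[row::2, col::2] for row, col in phases]
print(sampling_images[0].shape)  # (540, 960): one quarter of the pixels per frame
```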
The recognition processing according to the first embodiment will be described more specifically with reference to
Note that, in each of
In Section (a) of
As depicted in Section (b), the preprocessing section 210 generates a sampling image 36ϕ1 of a first phase by using the respective pixels 300sa1 to 300sa4 obtained by subsampling. The generated sampling image 36ϕ1 is input to the recognition section 220.
The recognition section 220 extracts a feature amount 50a of the input sampling image 36ϕ1 by using the DNN (Step S11). The recognition section 220 stores and accumulates the feature amount 50a extracted in Step S11 in the accumulation section (Step S12). In a case where the feature amount is already accumulated in the accumulation section, the recognition section 220 can accumulate the feature amount 50a in the accumulation section and can integrate the feature amount 50a with the already accumulated feature amount. Section (b) in
The recognition section 220 performs the recognition processing on the basis of the feature amount 50a accumulated in the accumulation section (Step S13). In the example of
In Section (a) of
As depicted in Section (b), the preprocessing section 210 generates a sampling image 36ϕ2 of a second phase by using the respective sampling pixels obtained by subsampling in Step S10b. The generated sampling image 36ϕ2 is input to the recognition section 220.
The recognition section 220 extracts a feature amount 50b of the input sampling image 36ϕ2 by using the DNN (Step S11). The recognition section 220 stores and accumulates the feature amount 50b extracted in Step S11 in the accumulation section (Step S12). In this example, as shown as Step S12 in Section (b), the feature amount 50a extracted from the sampling image 36ϕ1 of the first phase is already accumulated in the accumulation section. Therefore, the recognition section 220 accumulates the feature amount 50b in the accumulation section, and integrates the feature amount 50b with the accumulated feature amount 50a.
The recognition section 220 performs the recognition processing on the basis of a feature amount obtained by integrating the feature amount 50a and the feature amount 50b (Step S13). In the example of
In Section (a) of
As depicted in Section (b), the preprocessing section 210 generates a sampling image 36ϕ3 of a third phase by using the respective sampling pixels obtained by subsampling in Step S10c. The generated sampling image 36ϕ3 is input to the recognition section 220.
The recognition section 220 extracts a feature amount 50c of the input sampling image 36ϕ3 by using the DNN (Step S11). The recognition section 220 stores and accumulates the feature amount 50c extracted in Step S11 in the accumulation section (Step S12). In this example, as shown as Step S12 in Section (b), the feature amounts 50a and 50b extracted from the sampling images 36ϕ1 and 36ϕ2 of the first and second phases, respectively, are already accumulated in the accumulation section. Therefore, the recognition section 220 accumulates the feature amount 50c in the accumulation section, and integrates the feature amount 50c with the accumulated feature amounts 50a and 50b.
The recognition section 220 performs the recognition processing on the basis of a feature amount obtained by integrating the feature amounts 50a and 50b with the feature amount 50c (Step S13). In the example of
In Section (a) of
As depicted in Section (b), the preprocessing section 210 generates a sampling image 36ϕ4 of a fourth phase by using the respective sampling pixels obtained by subsampling in Step S10d. The generated sampling image 36ϕ4 is input to the recognition section 220.
The recognition section 220 extracts a feature amount 50d of the input sampling image 36ϕ4 by using the DNN (Step S11). The recognition section 220 stores and accumulates the feature amount 50d extracted in Step S11 in the accumulation section (Step S12). In this example, as shown as Step S12 in Section (b), the feature amounts 50a to 50c respectively extracted from the sampling images 36ϕ1 to 36ϕ3 of the first to third phases are already accumulated in the accumulation section. Therefore, the recognition section 220 accumulates the feature amount 50d in the accumulation section, and integrates the feature amount 50d with the accumulated feature amounts 50a to 50c.
The recognition section 220 performs the recognition processing on the basis of a feature amount obtained by integrating the feature amounts 50a to 50c and the feature amount 50d (Step S13). In the example of
By the processing of
Once the subsampling and recognition processing for one cycle are completed, the subsampling and recognition processing for the next cycle are started.
That is, in Section (a) of
The recognition section 220 extracts a feature amount 50a′ of the input sampling image 36ϕ1′ by using the DNN (Step S11). The recognition section 220 stores and accumulates the feature amount 50a′ extracted in Step S11 in the accumulation section (Step S12). In this example, as shown as Step S12 in Section (b), the feature amounts 50a to 50d respectively extracted from the sampling images 36ϕ1 to 36ϕ4 of the first to fourth phases in the immediately previous cycle are already accumulated in the accumulation section. Therefore, the recognition section 220 accumulates the feature amount 50a′ in the accumulation section, and integrates the feature amount 50a′ with the accumulated feature amounts 50a to 50d.
Alternatively, the recognition section 220 may reset the accumulation section for each cycle of selection of pixel positions of sampling pixels. The accumulation section can be reset, for example, by deleting the feature amounts 50a to 50d for one cycle accumulated in the accumulation section from the accumulation section.
Furthermore, the recognition section 220 can always accumulate a certain amount of feature amounts in the accumulation section. For example, the recognition section 220 accumulates the feature amounts for one cycle, that is, the feature amounts for four frames in the accumulation section. Here, in a case where the new feature amount 50a′ is extracted, the recognition section 220 deletes, for example, the oldest feature amount 50a among the feature amounts 50a to 50d accumulated in the accumulation section, and stores and accumulates the new feature amount 50a′ in the accumulation section. The recognition section 220 performs the recognition processing on the basis of a feature amount obtained by integrating the feature amounts 50b to 50d remaining after deleting the feature amount 50a and the new feature amount 50a′.
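A fixed-length buffer is one straightforward way to keep only the feature amounts for one cycle as described above. The sketch below assumes a four-frame cycle (a 2 × 2 divided region) and a simple sum as the integration; both are assumptions made for illustration.

```python
from collections import deque
import numpy as np

CYCLE = 4  # feature amounts for one cycle (four frames) are kept

accumulation_section = deque(maxlen=CYCLE)  # the oldest entry is dropped automatically

def accumulate_and_integrate(new_feature):
    """Store the new feature amount and integrate it with the retained ones."""
    accumulation_section.append(new_feature)          # evicts the oldest when full
    return np.sum(np.stack(accumulation_section), axis=0)

# Hypothetical feature amounts corresponding to 50a, 50b, 50c, 50d, and 50a'.
for _ in range(CYCLE + 1):
    integrated = accumulate_and_integrate(np.random.rand(16))
# After the fifth feature amount arrives, only the last four frames contribute.
```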
The recognition section 220 performs the recognition processing on the basis of the feature amount obtained by integrating the feature amounts 50a to 50d already accumulated in the accumulation section and the newly extracted feature amount 50a′ (Step S13). In the example of
Here, the sampling image 36 is a thinned image obtained by thinning out pixels from the original image data 32. In the example of
Furthermore, the pixel positions of the pixels 300 to be set as the sampling pixels in order to generate the sampling image 36 are selected so as to be shifted by one pixel for each frame in the divided region 35. Therefore, it is possible to obtain the sampling image 36 whose phase is shifted by one pixel for each frame. Furthermore, at this time, the pixel positions of all the pixels 300 included in the divided region 35 are selected as the pixel positions of the pixels 300 to be set as the sampling pixels.
In this way, the pixel positions of the pixels 300 for generating the sampling image 36 are selected, and the feature amounts calculated from the respective sampling images 36 are accumulated and integrated. As a result, the pixels 300 at all the pixel positions included in the image data 32 can be involved in the recognition processing, and for example, a distant target object can be easily recognized.
Note that, in the above description, the preprocessing section 210 sets pixel positions for selecting sampling pixels according to a predetermined rule, but the present disclosure is not limited to this example. For example, the preprocessing section 210 may set pixel positions for selecting sampling pixels according to an instruction from the outside of the recognition processing section 20b or the outside of the information processing device 1b including the recognition processing section 20b.
Next, the subsampling processing in the recognition processing according to the first embodiment will be described more specifically.
In Section (a) of
The preprocessing section 210 selects, as a sampling pixel, the pixel 300 having the coordinates [1,1] in each divided region 35 for the image data 32a at time T-3 (Step S10a), and the recognition section 220 extracts the feature amount of the sampling image 36ϕ1 including the selected sampling pixel (Step S11). The recognition section 220 integrates the feature amount 50a extracted from the sampling image 36ϕ1 with, for example, a feature amount extracted in the previous predetermined period (Step S12), and performs the recognition processing on the basis of the integrated feature amount (Step S13).
Here, for example, the sampling image 36ϕ1 obtained by uniformly thinning the image data 32a can be obtained by the subsampling processing (Step S10a) for each divided region 35 of the image data 32a described above. The recognition processing for the entire image data 32a can be performed using the feature amount 50a extracted from the sampling image 36ϕ1 in Step S11. The recognition processing for the image data 32 can be completed by the recognition processing for the sampling image including the sampling pixel selected from the image data 32 by the subsampling.
A series of processing of generating a sampling image from the image data 32, extracting a feature amount from the generated sampling image, and performing the recognition processing on the basis of the extracted feature amount is referred to as one-unit processing. In the example of
Thereafter, similarly, the recognition processing section 20b performs the above-described one-unit processing for each of the pieces of image data 32b, 32c, and 32d sequentially updated in the frame cycle, and performs the recognition processing. At this time, the feature amount integration processing of Step S12 and the recognition processing of Step S13 can be common to each unit processing.
By performing the one-unit processing on each of the pieces of image data 32a to 32d described above, the selection of a sampling pixel for each pixel position included in each divided region 35 is completed once.
In this example, the feature amount 50a extracted on the basis of the oldest image data 32a is discarded, and the feature amount 50a′ is extracted from the new image data 32a′. That is, the preprocessing section 210 selects each pixel 300 having the coordinates [1,1] in each divided region 35 of the image data 32a′ as a sampling pixel, and generates the sampling image 36ϕ1. The recognition section 220 extracts the feature amount 50a′ from the sampling image 36ϕ1 selected from the image data 32a′. The recognition section 220 integrates the feature amount 50a′ and the feature amounts 50b, 50c, and 50d extracted so far, and performs the recognition processing on the basis of the integrated feature amount. In this case, it is sufficient if the recognition section 220 performs the feature amount extraction processing only on the newly acquired image data 32a′.
As described above, the recognition processing according to the first embodiment is performed by performing the one-unit processing in the same processing system in the recognition processing section 20b. More specifically, the recognition processing section 20b repeats a processing system including the subsampling processing and the feature amount extraction processing for the image data 32 for each frame as the one-unit processing, integrates the feature amounts extracted by the repetition, and performs the recognition processing.
Furthermore, the recognition processing section 20b performs the subsampling processing for the pixel positions of all the pixels 300 included in the image data 32 while periodically shifting the pixel position for selecting the sampling pixel. In addition, the recognition processing section 20b performs the recognition processing by integrating the feature amounts as the intermediate data extracted from the sampling image including the sampling pixels selected from the image data 32 of each frame in Step S11.
Since the recognition processing according to the first embodiment configured as described above is a processing system that can be completed by one-unit processing, a recognition result can be obtained more quickly. In addition, since sampling pixels are selected from the entire image data 32 in one-unit processing, a wide range of recognition results can be confirmed by one-unit processing. Furthermore, since pieces of intermediate data (feature amounts) based on a plurality of pieces of image data 32 are integrated, it is possible to acquire a more detailed recognition result, comparable to one that would be acquired by performing the processing multiple times.
That is, by using the information processing device 1b according to the first embodiment, it is possible to achieve both improvement of the simultaneity of a recognition result and acquisition of a recognition result based on a high resolution of a captured image, and it is possible to improve the characteristics of the recognition processing using a captured image.
Next, a more specific configuration example for implementing the recognition processing according to the first embodiment will be described.
For example, in imaging processing for the frame #1, exposure is performed for a predetermined time in each line, and after the exposure ends, a pixel signal is transferred from each pixel circuit 1000 to the AD conversion section 1003 via the vertical signal line VSL, and each AD converter 1007 in the AD conversion section 1003 converts the transferred analog pixel signal into pixel data that is a digital signal. Once the conversion from a pixel signal into pixel data is performed for all the lines, the image data 32a based on the pixel data of the frame #1 is input to the preprocessing section 210. The preprocessing section 210 performs the subsampling processing (indicated as “SS” in the drawing) as described above on the input image data 32a, acquires the pixel 300 from the pixel position of the sampling pixel selected for each divided region 35, and generates the sampling image 36ϕ1 (Step S10a).
The preprocessing section 210 passes the sampling image 36ϕ1 to the recognition section 220. At this time, the sampling image 36ϕ1 passed from the preprocessing section 210 to the recognition section 220 is an image which is thinned out by the subsampling processing and of which the number of pixels is reduced as compared with the image data 32a. The recognition section 220 performs the recognition processing on the sampling image 36ϕ1. Here, the feature extraction processing (Step S11), the feature amount integration processing (Step S12), and the recognition processing (Step S13) are illustrated as being included in the recognition processing. The processing of Steps S11 to S13 is performed, for example, within a period of one frame. A recognition result ϕ1 based on the sampling image 36ϕ1 is output to the outside of the recognition processing section 20b.
In parallel with the above-described processing for the frame #1, processing for the next frame #2 is performed. The image data 32b including the pixel data of the frame #2 is input to the preprocessing section 210. The preprocessing section 210 performs the subsampling processing on the input image data 32b at a phase different from that of the image data 32a to generate the sampling image 36ϕ2.
The preprocessing section 210 passes, to the recognition section 220, the sampling image 36ϕ2 of which the number of pixels is reduced as compared with the image data 32b by the subsampling. The recognition section 220 performs the recognition processing on the sampling image 36ϕ2 within a period of one frame.
At this time, by the feature amount integration processing in Step S12, the recognition section 220 integrates the feature amount 50b extracted from the sampling image 36ϕ2 with the feature amount 50a extracted by the feature amount extraction processing for the image data 32a. The recognition section 220 performs the recognition processing by using the integrated feature amount. A recognition result ϕ2 obtained by the recognition processing is output to the outside of the recognition processing section 20b.
Thereafter, similarly, the preprocessing section 210 performs the subsampling processing on the next frame #3 in parallel with the processing for the image data 32b of the immediately previous frame #2, and the recognition section 220 extracts the feature amount 50c from the sampling image 36ϕ3 generated by the subsampling processing. The recognition section 220 further integrates the feature amount obtained by integrating the feature amounts 50a and 50b extracted from the image data 32a and 32b, respectively, and the extracted feature amount 50c, and performs the recognition processing on the basis of the integrated feature amount. The recognition section 220 outputs a recognition result ϕ3 obtained by the recognition processing to the outside.
Similarly, the recognition processing section 20b performs the subsampling processing and the feature amount extraction processing on the next frame #4 in parallel with the processing for the image data 32c of the immediately previous frame #3, and acquires the feature amount 50d. The recognition processing section 20b further integrates the feature amount obtained by integrating the feature amounts 50a to 50c extracted from the image data 32a to 32c, respectively, and the extracted feature amount 50d by the recognition section 220, and performs the recognition processing on the basis of the integrated feature amount. The recognition section 220 outputs a recognition result ϕ4 obtained by the recognition processing to the outside.
Here, in
More specifically, in the example of
On the other hand, the information amount of each of the recognition results ϕ1 to ϕ4 obtained by the recognition processing based on each of the pieces of image data 32a to 32d increases every time the recognition processing is repeated, which indicates that the obtained recognition result becomes more detailed every time the recognition processing is performed. This is because each recognition processing uses a feature amount obtained by integrating the feature amounts acquired from the sampling images of the phases used so far and the feature amount newly acquired from the sampling image whose phase is further shifted.
Next, more detailed functions of the preprocessing section 210 and the recognition section 220 according to the first embodiment will be described.
The reading section 211, the use region acquisition section 212, the feature amount calculation section 221, the feature amount accumulation control section 222, the feature amount accumulation section 223, and the use region determination section 224 are implemented by, for example, an information processing program operating on the CPU 1205. This information processing program can be stored in the ROM 1206 in advance. Alternatively, the information processing program can also be supplied from the outside via the interface 1204 and written in the ROM 1206.
Furthermore, the reading section 211, the use region acquisition section 212, the feature amount calculation section 221, the feature amount accumulation control section 222, the feature amount accumulation section 223, and the use region determination section 224 may be implemented by the CPU 1205 and the DSP 1203 operating in accordance with the information processing program. Furthermore, some or all of the reading section 211, the use region acquisition section 212, the feature amount calculation section 221, the feature amount accumulation control section 222, the feature amount accumulation section 223, and the use region determination section 224 may be implemented by hardware circuits that operate in cooperation with each other.
In the preprocessing section 210, the reading section 211 reads the image data 32 from the sensor section 10b. The reading section 211 passes the image data 32 read from the sensor section 10b to the use region acquisition section 212. The use region acquisition section 212 performs the subsampling processing on the image data 32 passed from the reading section 211 according to information indicating a use region passed from the use region determination section 224 described later, and extracts sampling pixels. The use region acquisition section 212 generates a sampling image 36ϕx of a phase ϕx from the extracted sampling pixels.
The use region acquisition section 212 passes the generated sampling image 36ϕx to the recognition section 220. The passed sampling image 36ϕx is passed from the recognition section 220 to the feature amount calculation section 221.
In the recognition section 220, the feature amount calculation section 221 calculates a feature amount on the basis of the passed sampling image 36ϕx. That is, the feature amount calculation section 221 functions as a calculation section that calculates the feature amount of the sampling image 36ϕx including the sampling pixels. Alternatively, the feature amount calculation section 221 may acquire information for setting exposure and an analog gain from the reading section 211, for example, and calculate the feature amount by further using the acquired information. The feature amount calculation section 221 passes the calculated feature amount to the feature amount accumulation control section 222.
The feature amount accumulation control section 222 accumulates the feature amount transferred from the feature amount calculation section 221 in the feature amount accumulation section 223. At this time, the feature amount accumulation control section 222 can integrate the past feature amount already accumulated in the feature amount accumulation section 223 and the feature amount passed from the feature amount calculation section 221 to generate an integrated feature amount. Furthermore, in a case where the feature amount accumulation section 223 is initialized and there is no feature amount, for example, the feature amount accumulation control section 222 accumulates, as the first feature amount, the feature amount transferred from the feature amount calculation section 221 in the feature amount accumulation section 223.
In addition, the feature amount accumulation control section 222 can delete a feature amount satisfying a predetermined condition from among the feature amounts accumulated in the feature amount accumulation section 223. The feature amount accumulation control section 222 can apply time information, an external instruction, an exposure condition, and the like as conditions for deleting the feature amount.
For example, in a case where the time information is applied as the condition for deleting the feature amount, the feature amount accumulation control section 222 can delete a feature amount based on a sampling image obtained by the first subsampling in the subsampling of the immediately previous cycle among the feature amounts accumulated in the feature amount accumulation section 223.
Furthermore, in a case where the imaging section 1200 has an automatic exposure setting function, the feature amount accumulation control section 222 can determine that the scene of the captured image has been changed in a case where a predetermined level or more of change in exposure is detected, and can delete the feature amounts accumulated in the feature amount accumulation section 223 so far. Furthermore, the feature amount accumulation control section 222 can delete all the feature amounts accumulated in the feature amount accumulation section 223 and initialize the feature amount accumulation section 223, for example, according to an instruction from the outside.
Note that the condition for deleting the feature amount accumulated in the feature amount accumulation section 223 by the feature amount accumulation control section 222 is not limited to each condition described above.
The feature amount accumulation control section 222 passes the feature amount accumulated in the feature amount accumulation section 223 or the feature amount obtained by integrating the feature amount accumulated in the feature amount accumulation section 223 and the feature amount passed from the feature amount calculation section 221 to the use region determination section 224 and a recognition processing execution section 225. The recognition processing execution section 225 performs the recognition processing on the basis of the feature amount passed from the feature amount accumulation control section 222. The recognition processing execution section 225 performs object detection, person detection, face detection, and the like by the recognition processing. The recognition processing execution section 225 outputs the recognition result obtained by the recognition processing to the outside of the recognition processing section 20b.
Here, in a case where a predetermined condition is satisfied, the feature amount accumulation control section 222 can pass the feature amount accumulated in the feature amount accumulation section 223 to the use region determination section 224 and the recognition processing execution section 225. For example, the feature amount accumulation control section 222 can apply time information, an external instruction, an exposure condition, and the like as conditions for passing the feature amount to the use region determination section 224 and the recognition processing execution section 225.
For example, in a case where the time information is applied as the condition for passing the feature amount, the feature amount accumulation control section 222 can integrate the feature amount newly passed to the feature amount accumulation control section 222 with the feature amount already accumulated in the feature amount accumulation section 223, and pass the integrated feature amount to the use region determination section 224 and the recognition processing execution section 225.
Furthermore, in a case where the imaging section 1200 has the automatic exposure setting function, the feature amount accumulation control section 222 can determine that the scene of the captured image has been changed in a case where a predetermined level or more of change in exposure is detected, and can pass only the feature amount newly passed to the feature amount accumulation control section 222 to the use region determination section 224 and the recognition processing execution section 225. At this time, as described above, the feature amount accumulation control section 222 can delete the feature amounts already accumulated in the feature amount accumulation section 223. Furthermore, for example, according to an instruction from the outside, the feature amount accumulation control section 222 can select a feature amount according to the instruction from the outside from among a newly passed feature amount and one or more feature amounts already accumulated in the feature amount accumulation section 223 and pass the selected feature amount to the use region determination section 224 and the recognition processing execution section 225.
Note that the conditions for the feature amount accumulation control section 222 to pass the feature amount to the use region determination section 224 and the recognition processing execution section 225 are not limited to the above-described conditions.
The use region determination section 224 determines a pixel position for reading pixel data as a sampling pixel from the image data 32 read by the reading section 211. The use region determination section 224 determines the pixel position according to, for example, a predetermined pattern and timing. Alternatively, the use region determination section 224 can also decide a pixel position on the basis of the feature amount passed from the feature amount accumulation control section 222. The use region determination section 224 passes information indicating the determined pixel position to the preprocessing section 210 as use region information, and the use region information is input to the use region acquisition section 212.
As described above, the use region acquisition section 212 performs the subsampling processing on the image data 32 passed from the reading section 211 according to the information indicating the use region passed from the use region determination section 224. That is, the use region acquisition section 212 or the preprocessing section 210, and the use region determination section 224 described above function as a setting section that sets a sampling pixel for the divided region 35 obtained by dividing the image data 32, that is, imaging information including pixels.
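The division of roles described above can be sketched as follows in Python. The class and function names, the 2 × 2 divided region, the simple additive integration, and the exposure-change threshold used for the reset are all assumptions introduced only to illustrate the data flow between the sections; this is a sketch, not the implementation of the present disclosure.

```python
import numpy as np

N = 2  # size of the divided region assumed for this sketch

class UseRegionDetermination:
    """Corresponds roughly to the use region determination section 224."""
    def __init__(self):
        self.phase = 0

    def next_use_region(self):
        row, col = divmod(self.phase, N)
        self.phase = (self.phase + 1) % (N * N)  # shift the pixel position each frame
        return row, col

class FeatureAccumulationControl:
    """Corresponds roughly to the feature amount accumulation control section 222
    together with the feature amount accumulation section 223."""
    def __init__(self, exposure_threshold=0.5):
        self.accumulated = None
        self.exposure_threshold = exposure_threshold

    def update(self, feature, exposure_change=0.0):
        if abs(exposure_change) >= self.exposure_threshold:
            self.accumulated = None  # treat a large exposure change as a scene change
        self.accumulated = feature if self.accumulated is None else self.accumulated + feature
        return self.accumulated

def acquire_use_region(frame, row, col):
    """Corresponds roughly to the use region acquisition section 212 (subsampling)."""
    return frame[row::N, col::N]

def calculate_feature(sampling_image):
    """Placeholder for the feature amount calculation section 221 (CNN)."""
    return sampling_image.astype(np.float32)

def execute_recognition(feature):
    """Placeholder for the recognition processing execution section 225."""
    return {"score": float(feature.mean())}

determination = UseRegionDetermination()
control = FeatureAccumulationControl()
for frame in [np.random.rand(8, 8) for _ in range(4)]:  # dummy image data
    row, col = determination.next_use_region()
    feature = calculate_feature(acquire_use_region(frame, row, col))
    result = execute_recognition(control.update(feature))
```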
Next, effects of the recognition processing according to the first embodiment will be described in comparison with existing technologies.
Here, a case where an information processing device that performs the recognition processing based on a captured image is used for in-vehicle use will be considered. In this case, there is a demand for recognition of a distant target, and a resolution of 1920 pixels × 1080 pixels or more (for example, 4096 pixels × 2160 pixels with 4K resolution) may be required in consideration of the angle of view of a camera, an installation position, a recognition target, the moving speed of a vehicle, and the like. Meanwhile, a processing speed of a recognizer that performs the recognition processing is limited, and it is thus difficult to process a high-resolution image as it is. In a case where the resolution is, for example, several hundred pixels × several hundred pixels, it is conceivable that the recognition processing can be performed with high simultaneity with respect to the captured image, but in this case, as described with reference to
Therefore, conventionally, a method of reducing a calculation workload for the recognition processing has been proposed. An example of a method of reducing the calculation workload of the recognition processing according to the related art will be described with reference to
Next, the technology according to an embodiment of the present disclosure is compared with the existing technologies, and effects of the technology according to an embodiment of the present disclosure will be described.
Furthermore, in
“1/N × 1/N size reduction” will be described. The “distance” is “×”. This is because the number of pixels of “1/N × 1/N size reduction” is reduced to 1/N in each of the row direction and the column direction. On the other hand, the “angle of view” and the “cropping out” are “o” because the entire original image 320 is thinned out and used. Furthermore, the “latency” and the “frame rate” are “o” because the number of pixels is reduced as compared with the original image 320 and high-speed processing is possible.
The method “1/N × 1/N phase shift subsampling + information integration” according to the first embodiment will be described. The “distance” is equivalent to that of the original image 320 in a case where the recognition processing for one cycle of the phase is performed, and is “o”. The “angle of view” and the “cropping out” are “o” because the entire original image 320 is used. Furthermore, the “latency” and the “frame rate” are “o” because the recognition result can be output in each phase.
“1/N × 1/N cropping” will be described. The “distance” is “o” because the resolution of each of cropped images #1 to #4 is equivalent to that of the original image 320. On the other hand, since N² frames are required to view the entire angle of view of the original image 320, the “angle of view” is 1/N times per frame, which is “×”. As for the “cropping out”, since each of the cropped images #1 to #4 is obtained by dividing the original image 320, there is a possibility that cropping out may occur at a division position, and thus, the “cropping out” is “×”. As for the “latency”, similarly to the “angle of view”, N² frames are required to view the entire angle of view of the original image 320, and thus, the “latency” is “×”. Furthermore, the “frame rate” is 1/N² times because N² frames are required to view the entire angle of view of the original image 320, and thus, the “frame rate” is “×”.
Note that the “bus width” is “×” in any case since the captured image is output from the imaging section 1200 with the resolution of the original image 320.
In the evaluation example depicted in
Here, in “1/N × 1/N size reduction” depicted in
Therefore, the resolution in one frame is the same between a case of “1/N × 1/N size reduction” depicted in
First, a case of “1/N × 1/N size reduction” will be described. In
The size-reduced image 321b is generated from the captured image at the next frame timing. Similarly to the case of the size-reduced image 321a, the recognition results 62 and 63 are obtained by the recognition processing based on the feature amount extracted from the size-reduced image 321b. Similarly, the recognition results 62 and 63 are obtained also for the size-reduced image 321c at the next frame timing and the size-reduced image 321d at the further next frame timing by the recognition processing based on the extracted feature amount. As described above, in a case of “1/N × 1/N size reduction”, only a target object positioned at a distance according to the resolution of each of the size-reduced images 321a, 321b, and the like can be recognized.
Next, a case of “1/N × 1/N phase shift subsampling + information integration” will be described. In
The sampling image 36ϕ2 of the second phase is generated from the captured image at the next frame timing, the phase of the sampling image 36ϕ2 being shifted from the sampling image 36ϕ1 by one pixel. The recognition processing is performed on the basis of the feature amount extracted from the sampling image 36ϕ2.
At this time, the recognition processing is performed on the basis of a feature amount obtained by integrating the feature amount extracted from the sampling image 36ϕ2 and the feature amount extracted from the sampling image 36ϕ1 used in the immediately previous recognition processing. As a result, in addition to the recognition results 62 and 63 based on the sampling image 36ϕ1, a recognition result 64 in which a person positioned farther than the person recognized as the recognition results 62 and 63 is recognized is obtained.
Also for the sampling image 36ϕ3 at the further next frame timing, the feature amount based on the sampling image 36ϕ3 is integrated with the feature amount used in the immediately previous recognition processing, and a recognition result 65 in which a person positioned farther than the person recognized as the above-described recognition result 64 is recognized is obtained. Similarly, also for the sampling image 36ϕ4 at the still further next frame timing, the feature amount based on the sampling image 36ϕ4 is integrated with the feature amount used in the immediately previous recognition processing, and a recognition result 66 in which a person positioned farther than the person recognized as the above-described recognition result 65 is recognized is obtained.
As described above, in the first embodiment, the respective feature amounts based on the sampling images 36ϕ1 to 36ϕ4 of a plurality of subsampled frames are sequentially integrated for each frame, and the recognition processing is performed on the basis of the integrated feature amount. Therefore, for example, in a case where the images of the respective frames from which the sampling images 36ϕ1 to 36ϕ4 are generated have temporal continuity, it can be considered that the feature amount obtained by integrating the feature amounts extracted from the sampling images 36ϕ1 to 36ϕ4 corresponds to the feature amount extracted from one captured image for which the subsampling is not performed.
Therefore, with the recognition processing according to the first embodiment, it is possible to recognize a distant target object by fully utilizing the resolution of the camera (imaging section 1200). In addition, since a recognition result can be obtained by performing the recognition processing for each frame, a large target object can be recognized on an image in a short time. A small target object on an image is recognized with an N²-frame latency, for example, but since such a target object is predicted to be far away, a slight latency in the recognition result can be allowed.
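For reference, the following is a minimal sketch, in Python, of the phase shift subsampling and the frame-by-frame feature integration described above; the divided-region size of 2 pixels × 2 pixels, the stand-in feature extractor, and the integration by averaging are illustrative assumptions and are not taken from the present disclosure.

```python
import numpy as np

N = 2  # assumed size of the divided region (N x N); one pixel is sampled per region per frame


def subsample(frame: np.ndarray, phase: int) -> np.ndarray:
    """Select one sampling pixel per N x N divided region.

    The phase index (0 .. N*N - 1) shifts the sampled pixel position by one
    pixel per frame, so that one cycle of N*N frames covers every pixel
    position of the divided region.
    """
    dy, dx = divmod(phase % (N * N), N)
    return frame[dy::N, dx::N]


def extract_feature(sampling_image: np.ndarray) -> np.ndarray:
    # Stand-in for the CNN feature extraction of the recognition section.
    return sampling_image.astype(np.float32) / 255.0


def integrate(accumulated, feature: np.ndarray) -> np.ndarray:
    # Stand-in for the feature integration; a running average is one simple
    # way to combine the current feature amount with the previous ones.
    return feature if accumulated is None else 0.5 * (accumulated + feature)


# Four consecutive frames of an 8 x 8 capture, one recognition per frame.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 256, size=(8, 8), dtype=np.uint8) for _ in range(N * N)]

accumulated = None
for phase, frame in enumerate(frames):
    feature = extract_feature(subsample(frame, phase))
    accumulated = integrate(accumulated, feature)
    # A recognition result would be produced from `accumulated` here,
    # i.e. once per frame, so the update interval stays one frame.
    print(f"phase {phase}: integrated feature mean {accumulated.mean():.3f}")
```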
Next, control of the recognition processing according to the existing technology will be described and compared with control of the reading and recognition processing according to the first embodiment described with reference to
In the example of
In addition, since the recognition processing for the image data 32a that is not reduced in size requires a lot of time, an update interval of the recognition result becomes long, and the frame rate for the recognition result decreases. Furthermore, in a case where the recognition processing is performed on the basis of the image data 32a that is not reduced in size, for example, it is necessary to secure a large memory capacity for the image data 32a. Furthermore, in
In the example of
On the other hand, since the recognition processing is performed on the basis of the size-reduced image obtained by thinning the image data 32a, the resolution of the image data 32a cannot be utilized, and it is difficult to recognize a distant target object. Similar processing applies to a recognition result #b, a recognition result #c, and a recognition result #d of the image data 32b, 32c, and 32d of the frames #2, #3, and #4, respectively.
On the other hand, in the recognition processing according to the first embodiment described with reference to
Therefore, it is possible to recognize a distant target object by fully utilizing the resolution of the camera (imaging section 1200). In addition, since a recognition result can be obtained by performing the recognition processing for each frame, a large target object can be recognized on an image in a short time. A small target object on the image, which can be recognized by fully using the resolution of the captured image, is recognized with an N²-frame latency, for example, but since such a target object is predicted to be far away, a slight latency in the recognition result can be allowed.
Next, a latency of the recognition processing according to the existing technology and a latency of the recognition processing according to the first embodiment will be described.
In
Note that the existing technology in Section (c) is here taken to be an example in which the recognition processing is performed using the resolution of the captured image as it is, without thinning or the like, in consideration of recognition of a distant target object. Furthermore, the frame rate of the captured image is, for example, 20 [fps] or more.
It is assumed that the captured images 3101, 3102, ..., 3109, and the like depicted in Section (a) of
In Section (a) of
In the example of
For the target object 43 included in the captured image 3101, in the recognition processing using the phase shift subsampling depicted in Section (b), as illustrated in the image 3111, a recognition result 70 in which the target object 43 is recognized is obtained at time t11 delayed by 0.05 [s] from time t1 at which the captured image 3101 is acquired. In the recognition processing using the subsampling, the recognition result is updated at intervals of 0.05 [s] even thereafter.
On the other hand, in the recognition processing using the resolution of the captured image depicted in Section (c), for the target object 43, the recognition result 70 based on the captured image 3101 is obtained at time t20 delayed by 0.5 [s] from time t1 as shown in the image 3121. In the recognition processing using the resolution of the captured image, the recognition result is updated next at time t21 after 0.5 [s] from time t20.
Next, the captured image 3105 in which the target object 44 that is a person appears from behind the vehicle 45 will be considered. In this case, in the recognition processing using the phase shift subsampling depicted in Section (b), a recognition result 71 in which the target object 44 is recognized is obtained at time t12 after 0.05 [s] from time t5 at which the captured image 3105 is acquired as shown in the image 3112. Furthermore, in the recognition processing using the phase shift subsampling, the recognition result is updated every 0.05 [s], and the target object 44 moving toward the host vehicle is obtained as a recognition result 72 at time t13 after 0.5 [s] from time t12 as shown in the image 3113.
On the other hand, in the recognition processing using the resolution of the captured image depicted in Section (c), the target object 44 is recognized at time t22 after 0.5 [s] from time t5 at which the captured image 3105 is acquired, and the recognition result 71 corresponding to the target object 44 is obtained. That is, in the recognition processing using the resolution of the captured image, the target object 44 is recognized with a latency of 0.45 [s] as compared with the recognition processing using the phase shift subsampling.
Furthermore, in the recognition processing using the resolution of the captured image depicted in Section (c), the recognition result is updated only once at time t21 between time t20 and time t22, and it is extremely difficult to confirm the state of the target object 44 with high simultaneity.
A relationship between the recognition processing and braking of the vehicle will be schematically described with reference to
In the recognition processing using the resolution of the captured image, a recognition result based on the captured image 3103 at time t3, 0.5 [s] earlier due to the latency, is obtained at time t5 (= time t22). Since the captured image 3103 includes the target object 43 but does not include the target object 44, only the recognition result 70 for the target object 43 is obtained as shown in the image 3122 in the upper part of
At time t7, the target object 44 approaches the host vehicle so as to be positioned at a short distance B. Here, it is assumed that the distance B is a distance at which there is an extremely high possibility that the host vehicle comes into contact with or collides with the target object 44 even if braking of the host vehicle, such as decelerating or applying the brakes, is performed. For example, in a case where the speed of the host vehicle is 30 [km/h], the host vehicle moves by about 4.2 [m] within 0.5 [s]. In a case where the moving speed of the target object 44 is ignored, the distance B is a distance shorter than the distance A by about 4.2 [m].
In the recognition processing using the resolution of the captured image, a recognition result using the captured image 3105 at time t5 before 0.5 [s] corresponding to the latency is obtained at time t7. That is, the recognition result 71 for the target object 44 at the distance A is obtained at time t7 as shown in the image 3123 on the right side in the lower part of
Similarly to the above, in a case where the speed of the host vehicle is 30 [km/h], the host vehicle moves by about 40 [cm] within 0.05 [s]. Therefore, in a case where the moving speed of the target object 44 is ignored, the distance A′ is a distance shorter than the distance A by about 40 [cm]. In this case, it is possible to avoid a situation in which the host vehicle comes into contact with or collides with the target object 44 by braking of the host vehicle, such as decelerating or applying the brakes.
As described above, with the recognition processing according to the first embodiment (the recognition processing using the phase shift subsampling), it is possible to recognize a target object more quickly, and for example, it is possible to more reliably perform an avoidance operation by braking of the vehicle.
Next, an example of improving the recognition processing of the existing technology by the recognition processing according to the first embodiment will be described with reference to
Note that the “¼ size reduction” corresponds to “1/N × 1/N size reduction” described with reference to
Furthermore, in
The “CNN resolution” is a resolution at the time of extracting the feature amount using the CNN in the recognition section 220. In the example of
On the other hand, in the “phase shift subsampling”, since three out of every four pixels are thinned out in each of the row and column directions, the “CNN resolution” is set to the resolution of 480 pixels × 270 pixels, which is ¼ of the “camera resolution” in each of the row and column directions. Here, in the first embodiment, all the pixel positions of the divided region 35 having a size of 8 pixels × 8 pixels are selected as the pixel positions of the sampling pixels in one cycle. Therefore, the CNN resolution in a case where the subsampling of one cycle is completed corresponds to 1920 pixels × 1080 pixels, that is, 480 pixels × 270 pixels × 4 × 4.
The “longest recognition distance” is the longest distance from the imaging section 1200 to a target object at which the target object can be recognized. In the example of
The “latency” indicates a latency of the recognition result for the captured image. The “latency” for the “¼ size reduction” is 50 [ms], and the “latency” for the “no size reduction” is 800 [ms], which is 16 times (= 4 × 4) that for the “¼ size reduction”. On the other hand, the “latency” for the “phase shift subsampling” is 50 [ms] for one subsampling, and is 800 [ms], equivalent to that for the “no size reduction”, in a case where the subsampling for one cycle is completed. Note that, in the “phase shift subsampling”, the symbol “@ (at mark)” is followed by the corresponding longest recognition distance. That is, for one subsampling, the latency is as short as 50 [ms], but the longest recognition distance is also as short as 20 [m]. Furthermore, in a case where the subsampling for one cycle is completed, the longest recognition distance increases to 80 [m].
The “frame rate” indicates a recognition result update cycle. In the example of
The “idle running distance” is a distance by which the host vehicle travels until braking, such as applying the brakes to actually stop the host vehicle, is performed after a target object for which the host vehicle needs to be stopped appears, for example. More specifically, the “idle running distance” is a distance by which the host vehicle travels from a time point at which the target object appears to a time point at which the appearing target object is recognized by the recognition processing and braking of the host vehicle is started according to the recognition result. Here, the idle running distance is obtained on the basis of the traveling speed of the host vehicle set to 50 [km/h]. In order to obtain the “idle running distance”, it is necessary to consider the value of the “latency” described above.
The “idle running distance” is 0.7 [m] for the “¼ size reduction” and is 11 [m] for the “no size reduction”. As described above, the “latency” for the “no size reduction” is 16 times that for the “¼ size reduction”. Therefore, the “idle running distance” is also 11 [m] that is approximately 16 times 0.7 [m]. Further, in the “¼ size reduction”, while the idle running distance is short, the longest recognition distance is 20 [m], and it is difficult to recognize a target object positioned at a distance exceeding 20 [m]. Further, in the “no size reduction”, a target object positioned at a distance of up to 80 [m] can be recognized, while the idle running distance is 11 [m] which is long. In a case of the “no size reduction”, the idle running distance is 11 [m] even for a target object positioned at a short distance of, for example, up to 20 [m], and there is a possibility that it becomes difficult to avoid contact with or collision with the target object.
On the other hand, in the “phase shift subsampling”, a target object positioned at a distance of up to 20 [m] can be recognized with a latency of 50 [ms] in the first one subsampling for one cycle. Therefore, the “idle running distance” for a target object positioned at a distance of up to 20 [m] is 0.7 [m], which is equivalent to that for the “¼ size reduction”. In addition, the “idle running distance” for a target object positioned at a distance of more than 20 [m] and up to 80 [m] is 11 [m], which is equivalent to that for the “no size reduction”.
The “longest recognition distance (pedestrian)” indicates the longest recognition distance for a pedestrian or the like in a case where braking of the vehicle is needed when the pedestrian or the like is recognized. The “longest recognition distance (pedestrian)” is a value obtained by subtracting the “idle running distance” from the “longest recognition distance” described above, and is approximately 19 [m] for the “¼ size reduction”, and is approximately 69 [m] for each of the “no size reduction” and the “phase shift subsampling”.
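Written out, these figures follow from the traveling speed of 50 [km/h] and the latencies listed above (the rounding matches the values in the preceding paragraphs):

```latex
\[
\begin{aligned}
d_{\text{idle}} &= v \times t_{\text{latency}}\\
\text{1/4 size reduction (one subsampling):}\quad
  & \tfrac{50}{3.6}\,\mathrm{m/s} \times 0.05\,\mathrm{s} \approx 0.7\,\mathrm{m}\\
\text{no size reduction (one cycle):}\quad
  & \tfrac{50}{3.6}\,\mathrm{m/s} \times 0.8\,\mathrm{s} \approx 11\,\mathrm{m}\\
d_{\text{pedestrian}} &= d_{\text{longest}} - d_{\text{idle}}:\quad
  20 - 0.7 \approx 19\,\mathrm{m}, \qquad 80 - 11 = 69\,\mathrm{m}
\end{aligned}
\]
```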
As described above, in the “phase shift subsampling”, which is the recognition processing method according to the first embodiment, a dramatic improvement over the recognition processing method of the existing technology can be seen in that it is possible to recognize a target object positioned at a long distance while keeping the idle running distance at a short distance small. Therefore, in the recognition processing according to the first embodiment, it is possible to recognize a target object positioned at a long distance, the idle running distance at a short distance is as short as 0.7 [m], and it is easy to avoid contact with or collision with a target object positioned at a short distance.
Next, an effective application of the recognition processing according to the first embodiment will be described with reference to
The state 401 schematically shows a state in which the vehicle 411b protrudes from a lane 420 and enters the traveling lane of the host vehicle 410 after a predetermined time elapses in the state 400. In this state 401, the information processing device 1b mounted on the host vehicle 410 can recognize the vehicle 411b positioned at a long distance as a target object. Therefore, the information processing device 1b can recognize the vehicle 411b protruding from the lane 420, control the host vehicle 410 according to the recognition result, and cause the host vehicle 410 to perform an avoidance operation of avoiding the vehicle 411b.
The state 402 schematically shows a case where the person 412a suddenly appears from behind the vehicle 411a after a predetermined time elapses in the state 400. In this state 402, the information processing device 1b mounted on the host vehicle 410 can recognize the person 412a positioned at a short distance as a target object with a short latency. Therefore, the information processing device 1b can recognize the person 412a and start an operation of controlling the host vehicle 410 according to the recognition result with a short idle running distance, and can avoid contact or collision with the person 412a.
The state 404 schematically shows a state in which the person 412c is recognized by the information processing device 1b of the host vehicle 410. The information processing device 1b can recognize a distant target object. Therefore, it is possible to control the host vehicle 410 so as to perform gentle deceleration in a case where the person 412c who is positioned at a long distance in the traveling direction of the host vehicle 410 is recognized.
The state 405 schematically shows a state in which the person 412b suddenly appears from behind the shielding object 413 after a predetermined time elapses in the state 403. In this state 405, the information processing device 1b mounted on the host vehicle 410 can recognize the person 412b positioned at a short distance as a target object with a short latency. Therefore, the information processing device 1b can recognize the person 412b and start an operation of controlling the host vehicle 410 according to the recognition result with a short idle running distance, and can avoid contact or collision with the person 412b.
Next, modified examples of the first embodiment will be described.
First, a first modified example of the first embodiment will be described. The first modified example of the first embodiment is an example in which a spatial interval of sampling by subsampling is changed according to a recognition target object, a situation in which the recognition processing is to be performed, or the like.
As in the example of
On the other hand, as in
For example, in a case where the information processing device 1b according to the first modified example of the first embodiment is used for in-vehicle use, it is important to recognize a target object positioned at a longer distance while the vehicle on which the information processing device 1b is mounted is traveling at a high speed. Therefore, the preprocessing section 210 decreases the sampling interval as depicted in
For example, the information processing device 1b can acquire position information indicating a current position, and determine whether or not the current position is an urban area on the basis of the acquired position information and map information corresponding to the current position. The information processing device 1b can be configured to estimate the current position by using, for example, simultaneous localization and mapping (SLAM) or global navigation satellite system (GNSS). In addition, the information processing device 1b acquires information indicating a traveling speed from the vehicle via the interface 1204. In the information processing device 1b, for example, the preprocessing section 210 can set the sampling interval on the basis of the current position information and the traveling speed information. The sampling interval can be dynamically set on the basis of these pieces of information.
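A minimal sketch of such dynamic setting of the sampling interval is shown below; the speed threshold, the urban-area test, and the concrete interval values are illustrative assumptions and are not values specified by the present disclosure.

```python
from dataclasses import dataclass


@dataclass
class SamplingConfig:
    interval: int  # sampling interval in pixels in each of the row and column directions


def is_urban_area(position, map_data) -> bool:
    # Stand-in for a lookup of the map information corresponding to the
    # current position estimated by SLAM or GNSS.
    return map_data.get(position, "rural") == "urban"


def select_sampling_interval(position, map_data, speed_kmh: float) -> SamplingConfig:
    """Set the sampling interval from the current position and traveling speed.

    A smaller interval (denser sampling) favors recognition of distant target
    objects during high-speed traveling; a larger interval reduces the load of
    the recognition section when only short-distance recognition is needed.
    """
    HIGH_SPEED_KMH = 60.0  # assumed threshold
    if speed_kmh >= HIGH_SPEED_KMH and not is_urban_area(position, map_data):
        return SamplingConfig(interval=2)  # dense sampling for distant target objects
    return SamplingConfig(interval=4)      # coarse sampling when nearby objects suffice


# Usage example with hypothetical position and map values.
map_data = {(35.6, 139.7): "urban"}
print(select_sampling_interval((35.6, 139.7), map_data, speed_kmh=30.0))
print(select_sampling_interval((36.0, 140.0), map_data, speed_kmh=80.0))
```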
Note that the present disclosure is not limited thereto, and it is also conceivable to adjust the speed of the recognition processing and the recognizable distance by a method such as controlling a clock frequency supplied to the preprocessing section 210 and the recognition section 220 or narrowing a recognition region to be subjected to the recognition processing in the captured image. For example, during high-speed traveling in an urban area, a long-distance recognition result and a short-distance recognition result are required. In such a case, control such as increasing the clock frequency and narrowing the recognition region is performed.
As described above, in the first modified example of the first embodiment, a recognition result suitable for the recognition target object or for the situation in which the recognition processing is performed can be obtained by dynamically setting the sampling interval of the subsampling. In addition, in a situation where recognition of a distant target object is not required, the sampling interval can be increased, so that the load of the recognition section 220 is reduced and power consumption can be suppressed.
Next, a second modified example of the first embodiment will be described. The second modified example of the first embodiment is an example in which an external device is controlled according to a latency of the recognition processing. As an example, in a case where the information processing device 1b according to the second modified example of the first embodiment is used for in-vehicle use, the speed (vehicle speed) of the vehicle on which the information processing device 1b is mounted is controlled according to the latency of the recognition processing. For example, a case where sampling pixels are selected as depicted in
For example, in a school zone, there are many children who are small-sized target objects on the image. Therefore, for example, in a case where the current position is determined to be a school zone on the basis of the position information, the information processing device 1b according to the second modified example of the first embodiment selects the subsampling by the sampling pixels in
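As a rough sketch of this second modified example, the vehicle speed can be bounded so that the idle running distance implied by the recognition latency stays within an allowed distance; the frame period, the cycle length, and the allowed distance below are illustrative assumptions.

```python
def recognition_latency_s(frame_period_s: float, cycle_frames: int, small_targets_expected: bool) -> float:
    # Small target objects on the image (e.g., children in a school zone) are
    # assumed to require the full subsampling cycle before they are recognized.
    return frame_period_s * (cycle_frames if small_targets_expected else 1)


def max_speed_kmh(latency_s: float, allowed_idle_distance_m: float) -> float:
    """Upper speed limit such that speed x latency stays within the allowed idle running distance."""
    return allowed_idle_distance_m / latency_s * 3.6


# Usage example: 50 ms per frame, a cycle of 16 phases, allowed idle distance of 5 m (assumed).
latency = recognition_latency_s(0.05, 16, small_targets_expected=True)
print(f"latency {latency:.2f} s -> speed limit {max_speed_kmh(latency, 5.0):.1f} km/h")
```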
Next, a third modified example of the first embodiment will be described. The third modified example of the first embodiment is an example in which the recognition processing using the subsampling according to the first embodiment (hereinafter, referred to as the recognition processing using the subsampling) and the normal recognition processing in which the subsampling is not performed are switched. Here, the normal recognition processing refers to recognition processing using all pixels of a captured image without performing the subsampling and reducing the size of the captured image.
For example, the normal recognition processing can be performed at regular time intervals unlike the recognition processing using the subsampling. As a result, for example, it is possible to verify the recognition processing using the subsampling. Furthermore, for example, in an emergency, it is possible to perform switching from the recognition processing using the subsampling to the normal recognition processing. As a result, the stability of the recognition processing can be improved.
Here, in a case where the recognition processing using the subsampling is switched to the normal recognition processing, for example, the latency is increased, and thus the immediacy of the recognition result is deteriorated. Therefore, in a case of performing switching to the normal recognition processing, it is preferable to increase a frequency of a clock to be supplied to the preprocessing section 210 and the recognition section 220.
Furthermore, it is possible to perform switching between the recognition processing using the subsampling and the normal recognition processing according to the reliability of the recognition result obtained by the recognition processing. For example, in a case where the recognition processing using the subsampling is performed, the recognition section 220 acquires the reliability of the recognition result obtained by the recognition processing. In a case where the reliability is less than a predetermined value, for example, the recognition section 220 instructs the preprocessing section 210 to perform switching from the recognition processing using the subsampling to the normal recognition processing. In response to this instruction, the preprocessing section 210 stops the subsampling for the captured image and passes all the pixels of the captured image to the recognition section 220. The recognition section 220 performs the normal recognition processing on the basis of all the pixels of the captured image passed from the preprocessing section 210.
In the normal recognition processing switched from the recognition processing using the subsampling in this manner, the recognition section 220 acquires the reliability of the recognition result obtained by the recognition processing. In a case where the reliability is, for example, equal to or more than the predetermined value, the recognition section 220 instructs the preprocessing section 210 to perform switching from the normal recognition processing to the recognition processing using the subsampling. In response to this instruction, the preprocessing section 210 performs the subsampling for the captured image and passes the selected sampling pixels to the recognition section 220. The recognition section 220 performs the recognition processing using the subsampling on the basis of the sampling pixels passed from the preprocessing section 210.
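A minimal sketch of this reliability-based switching is shown below; the recognizer interface, the reliability threshold, and the dummy values are illustrative assumptions.

```python
class RecognitionPipeline:
    """Switches between recognition using the subsampling and normal (all-pixel)
    recognition according to the reliability of the latest recognition result."""

    def __init__(self, recognizer, threshold: float = 0.6):
        self.recognizer = recognizer  # assumed to expose recognize(pixels) -> (result, reliability)
        self.threshold = threshold    # assumed reliability threshold
        self.use_subsampling = True

    def process(self, captured_image, subsample):
        pixels = subsample(captured_image) if self.use_subsampling else captured_image
        result, reliability = self.recognizer.recognize(pixels)
        if self.use_subsampling and reliability < self.threshold:
            # Low reliability: fall back to normal recognition for subsequent frames.
            # A higher clock frequency may be needed to keep the latency acceptable.
            self.use_subsampling = False
        elif not self.use_subsampling and reliability >= self.threshold:
            # Reliability recovered: return to the subsampling to reduce load and power.
            self.use_subsampling = True
        return result


# Usage example with a dummy recognizer whose reliability depends on the pixel count.
class DummyRecognizer:
    def recognize(self, pixels):
        return "person", 0.4 if len(pixels) < 128 else 0.9


pipeline = RecognitionPipeline(DummyRecognizer())
frame = list(range(512))
print(pipeline.process(frame, subsample=lambda img: img[::4]), pipeline.use_subsampling)
```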
As described above, in the third modified example of the first embodiment, the recognition result can be obtained more stably by performing switching between the recognition processing using the subsampling and the normal recognition processing according to the reliability of the recognition result obtained by the recognition processing. Furthermore, in a case where the reliability of the recognition result is high, the recognition processing using the subsampling is performed, so that the load of the recognition section 220 is reduced, and the power consumption can be suppressed.
Next, a fourth modified example of the first embodiment will be described. In the fourth modified example of the first embodiment, a pixel 300 at a pixel position arbitrarily selected in a captured image is used as a sampling pixel by the subsampling.
Here, the plurality of arbitrarily selected pixel positions includes, for example, a plurality of discrete and aperiodic pixel positions. For example, the preprocessing section 210 can select a plurality of pixel positions by using a pseudo random number. Furthermore, the selected pixel positions are preferably different for each frame, but some pixel positions may overlap between frames.
The recognition section 220 selects a pixel 300 at a pixel position included in each of the patterns R#m_1, R#m_2, ..., and R#m_n as a sampling pixel from each of the original images 320Rn1, 320Rn2, ..., and 320Rnn, and performs the recognition processing on the basis of the selected sampling pixel.
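A minimal sketch of generating such discrete, aperiodic patterns with a pseudo random number and selecting the sampling pixels per frame is shown below; the pattern size and the use of the frame index as a seed are illustrative assumptions.

```python
import numpy as np


def random_sampling_pattern(height: int, width: int, num_samples: int, frame_index: int) -> np.ndarray:
    """Return (row, col) pixel positions of the sampling pixels for one frame.

    A different seed per frame yields a pattern that differs between frames,
    although some positions may overlap between frames by chance.
    """
    rng = np.random.default_rng(seed=frame_index)  # pseudo random number source
    flat = rng.choice(height * width, size=num_samples, replace=False)
    return np.stack(np.unravel_index(flat, (height, width)), axis=1)


def sample_pixels(frame: np.ndarray, pattern: np.ndarray) -> np.ndarray:
    return frame[pattern[:, 0], pattern[:, 1]]


# Usage example: 16 sampling pixels per 16 x 16 frame, a different pattern per frame.
frame = np.arange(16 * 16, dtype=np.uint8).reshape(16, 16)
for n in range(3):
    pattern = random_sampling_pattern(16, 16, num_samples=16, frame_index=n)
    print(n, sample_pixels(frame, pattern)[:4])
```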
In the fourth modified example of the first embodiment, as described above, the recognition processing is performed on the basis of the sampling pixels arbitrarily selected from each of the original images 320Rn1, 320Rn2, ..., and 320Rnn according to the patterns R#m_1, R#m_2, ..., and R#m_n, each including a plurality of pixel positions different for each frame. Therefore, for example, sampling artifacts can be reduced as compared with a case where pixels 300 at pixel positions selected periodically, for example, every other pixel or every fourth pixel in each of the row direction and the column direction, are used as the sampling pixels.
For example, with the recognition processing according to the fourth modified example of the first embodiment, it is possible to suppress occurrence of erroneous recognition or unrecognition of a temporal cycle pattern such as flicker. Furthermore, with the recognition processing, it is also possible to suppress erroneous recognition or unrecognition of a spatial cycle pattern (fence, mesh-like structure, or the like).
Note that, in the above description, for example, in each of the original images 320Rn1, 320Rn2, ..., and 320Rnn, the sampling pixels are selected according to the pixel positions arbitrarily set for the entire image, but this is not limited to this example. For example, a sampling pixel may be selected according to a pixel position arbitrarily set in the divided region 35 obtained by dividing the original image 320.
Next, a fifth modified example of the first embodiment will be described. The fifth modified example of the first embodiment is an example in which a configuration of a pixel position of a sampling pixel for performing the recognition processing is changed according to a recognition result.
The recognition section 220 sets a region of interest for the captured image on the basis of the recognition results for the original images 320Φ1, 320Φ2, 320Φ3, and 320Φ4. As an example, in a case where a recognition result in which a target object is recognized with low reliability is obtained, the recognition section 220 sets a region of a predetermined range including the target object in the captured image as the region of interest. The preprocessing section 210 sets a pixel position of a sampling pixel in the region of interest. In the example of
At this time, the preprocessing section 210 can set all the pixel positions in the region of interest as the pixel positions of the sampling pixels without performing thinning. Alternatively, the preprocessing section 210 may set, for the region of interest, the pixel positions of the sampling pixels at a sampling interval smaller than a sampling interval of the sampling pixels set in the original images 320Φ1 to 320Φ4. Furthermore, the preprocessing section 210 may set, for the region of interest, the pixel positions of the sampling pixels at a sampling interval equivalent to the sampling interval of the sampling pixels set in the original images 320Φ1 to 320Φ4.
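A minimal sketch of setting the sampling pixel positions more densely inside a region of interest derived from a low-reliability recognition result is shown below; the margin around the recognized target object and the interval values are illustrative assumptions.

```python
def roi_sampling_positions(image_shape, box, base_interval=4, roi_interval=1, margin=8):
    """Coarse sampling over the whole image plus denser sampling inside a region
    of interest around a target object recognized with low reliability.

    box is (top, left, bottom, right) of the recognized target object;
    roi_interval=1 selects every pixel position inside the region of interest.
    """
    height, width = image_shape
    top, left, bottom, right = box
    top, left = max(top - margin, 0), max(left - margin, 0)
    bottom, right = min(bottom + margin, height), min(right + margin, width)

    coarse = {(r, c) for r in range(0, height, base_interval) for c in range(0, width, base_interval)}
    dense = {(r, c) for r in range(top, bottom, roi_interval) for c in range(left, right, roi_interval)}
    return sorted(coarse | dense)


# Usage example on a 32 x 32 frame with an assumed low-reliability detection box.
positions = roi_sampling_positions((32, 32), box=(10, 12, 16, 20))
print(len(positions), positions[:5])
```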
As described above, in the fifth modified example of the first embodiment, the pixel positions of the sampling pixels are set for the region of interest set for the captured image on the basis of the recognition results for the original images 320Φ1 to 320Φ4. Therefore, the load of the recognition section 220 is reduced, and power consumption can be suppressed. Furthermore, by setting the pixel positions of the sampling pixels at a smaller sampling interval for the region of interest, a more accurate recognition result can be acquired at a higher speed.
Next, a sixth modified example of the first embodiment will be described. The sixth modified example of the first embodiment is an example in which the exposure performed by the imaging section 1200 is controlled for each phase of subsampling of one cycle.
Here, the preprocessing section 210 sequentially sets, for each of the original images 320ExpΦ1, 320ExpΦ2, 320ExpΦ3, and 320ExpΦ4, an exposure time shorter than that for the immediately previous original image. As described above, in the subsampling for one cycle, a dynamic range for luminance can be widened by setting different exposure times for the original images 320ExpΦ1, 320ExpΦ2, 320ExpΦ3, and 320ExpΦ4.
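A minimal sketch of assigning a progressively shorter exposure time to each phase of one subsampling cycle is shown below; the base exposure time and the reduction ratio are illustrative assumptions.

```python
def exposure_times_for_cycle(base_exposure_s: float, num_phases: int, ratio: float = 0.5):
    """Exposure time per phase of one subsampling cycle.

    Each phase is exposed for a shorter time than the immediately previous one,
    so that the sampling images of one cycle together cover a wider luminance
    dynamic range when their feature amounts are integrated.
    """
    return [base_exposure_s * (ratio ** phase) for phase in range(num_phases)]


# Usage example: four phases, starting from an assumed 16 ms exposure.
print(exposure_times_for_cycle(0.016, 4))  # [0.016, 0.008, 0.004, 0.002]
```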
Next, a seventh modified example of the first embodiment will be described. The seventh modified example of the first embodiment is an example in which an analog gain for a pixel signal by the imaging section 1200 is controlled for each phase of subsampling of one cycle. For example, the preprocessing section 210 sets, in the imaging section 1200, an analog gain different for each phase of subsampling when reading, from the pixel array section 1001, pixel signals of the respective original images 320 to be subjected to subsampling with different phases.
In the imaging section 1200, the control section 1100 passes information indicating the set analog gain to the AD conversion section 1003. The AD conversion section 1003 controls a gain of a pixel signal input to each AD converter 1007 included in the AD conversion section 1003 via the vertical signal line VSL according to the information indicating the analog gain.
As described above, in the subsampling for one cycle, the dynamic range for luminance can be widened by setting different analog gains for the original images 320 to be subjected to subsampling with different phases.
Note that the sixth modified example of the first embodiment described above and the seventh modified example of the first embodiment are different in whether the luminance of the original image 320 is controlled by using the exposure time or the analog gain. Here, in a case where the luminance is controlled by using the exposure time, when the exposure time is increased, the original image 320 with a high luminance can be acquired, and noise of the acquired original image 320 can be suppressed. On the other hand, the increase in exposure time causes an increase in blurred portion in the original image 320.
On the other hand, in a case where the luminance is controlled by using the analog gain, the blurred portion in the original image 320 does not change even when a bright original image 320 is acquired by increasing the analog gain. Meanwhile, when the analog gain is increased, the high-luminance original image 320 can be acquired, but the noise increases.
Therefore, it is preferable that the sixth modified example of the first embodiment and the seventh modified example of the first embodiment are used depending on the purpose. For example, in a case where the recognition processing for a dynamic scene is performed, the high-luminance original image 320 is obtained by increasing the analog gain, thereby suppressing blurring. On the other hand, in a case where the recognition processing for a static scene is performed, the high-luminance original image 320 is obtained by increasing the exposure time, thereby suppressing generation of noise.
Next, a second embodiment of the present disclosure will be described. The second embodiment of the present disclosure is an example in which the sensor section 10b including the pixel array section 1001, the recognition section 220, and a component corresponding to the preprocessing section 210 are integrally incorporated in a CIS having a multilayer structure.
Note that, in
The reading control section 230 supplies a control signal that specifies a pixel circuit 1000 from which a pixel signal is to be read to the pixel array section 1001. For example, the reading control section 230 can specify a line from which a pixel signal is to be read in the pixel array section 1001. Alternatively, the reading control section 230 can also specify a pixel circuit 1000 from which a pixel signal is to be read in the pixel array section 1001 in units of the pixel circuits 1000. At this time, the reading control section 230 can specify a pixel circuit 1000 corresponding to a pixel position of a sampling pixel by the phase shift subsampling described in the first embodiment in the pixel array section 1001.
The pixel array section 1001 converts the pixel signal read from the specified pixel circuit 1000 into digital pixel data, and passes the pixel data to the reading control section 230. The reading control section 230 passes the pixel data for one frame passed from the pixel array section 1001 to the recognition section 220 as image data. The image data is a sampling image obtained by the phase shift subsampling. The recognition section 220 performs the recognition processing on the passed image data.
In the second embodiment, the information processing device 1c can include a multilayer CIS having a two-layer structure in which semiconductor chips are stacked in two layers, which has been described with reference to
As another example, the information processing device 1c can include a multilayer CIS having a three-layer structure in which semiconductor chips are stacked in three layers described with reference to
Next, a more specific configuration example for implementing the recognition processing according to the second embodiment will be described.
Furthermore, in the following description, as described with reference to Section (b) of
In the second embodiment, the reading control section 230 selectively reads a line including sampling pixels in imaging processing of each of the frames #1 to #4. For example, in the frame #1, sampling pixels are selected with the upper-left pixel of the divided region 35 as a base point, and in the frame #2, sampling pixels are selected with a pixel adjacent to the upper-left pixel of the divided region 35 as a base point. In other words, in a case where a line at the upper end of the frame is the first line, odd-numbered lines are selectively read in the frames #1 and #2, and even-numbered lines are selectively read in the frames #3 and #4.
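A minimal sketch of this line selection is shown below; it assumes a 2 × 2 divided region, so that each frame reads only the lines that contain the sampling pixels of its phase.

```python
def lines_to_read(num_lines: int, frame_index: int) -> list:
    """Lines (0-based) to read for the phase assigned to a given frame.

    With a 2 x 2 divided region, frames #1 and #2 sample pixels based in the
    first line of each region (odd-numbered lines when counting from 1), and
    frames #3 and #4 sample pixels based in the second line (even-numbered lines).
    """
    phase = frame_index % 4
    row_offset = 0 if phase in (0, 1) else 1  # row inside the divided region
    return list(range(row_offset, num_lines, 2))


# Usage example for an 8-line frame: frames #1/#2 read lines 0, 2, 4, 6 and frames #3/#4 read lines 1, 3, 5, 7.
for f in range(4):
    print(f"frame #{f + 1}:", lines_to_read(8, f))
```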
For example, in the frame #1, the reading control section 230 selects pixel data of sampling pixels from pixel data of each read line, and generates a sampling image 36Φ1 from the selected pixel data (Step S10a). The reading control section 230 passes the generated sampling image 36Φ1 to the recognition section 220. The recognition section 220 performs the recognition processing on the basis of the sampling image 36Φ1 passed from the reading control section 230 (Step S11, Step S12, and Step S13), and outputs a recognition result Φ1.
Thereafter, for the frames #2, #3, and #4, similarly, the reading control section 230 generates sampling images 36Φ2, 36Φ3, and 36Φ4 by using sampling pixels selected from pixel data of each read line (Step S10b, Step S10c, and Step S10d). The recognition section 220 performs the recognition processing on the basis of the sampling images 36Φ2, 36Φ3, and 36Φ4 generated by the reading control section 230 (Step S11, Step S12, and Step S13), and outputs a recognition result Φ2, a recognition result Φ3, and a recognition result Φ4.
In addition, similarly to the first embodiment, the recognition section 220 sequentially integrates feature amounts extracted from the sampling images 36Φ1 to 36Φ4 by the recognition processing. The recognition section 220 outputs the recognition results Φ1 to Φ4 based on the sampling images 36Φ1 to 36Φ4 by using the integrated feature amount. That is, the information amount of each of the recognition results Φ1 to Φ4 obtained by the recognition processing based on each of the frames #1 to #4 increases every time the recognition processing is repeated, which indicates that the obtained recognition result becomes more detailed every time the recognition processing is performed.
As described above, in the second embodiment, the subsampling processing is performed in the sensor section 10c. Therefore, it is not necessary to perform reading from all the pixel circuits 1000 included in the pixel array section 1001. Therefore, it is possible to further shorten the latency of the recognition processing as compared with the first embodiment described above. In addition, since pixel circuits 1000 of a line including the sampling pixels are selectively read from all the pixel circuits 1000, the amount of pixel signals read from the pixel array section 1001 can be reduced, and the bus width can be reduced.
Furthermore, in the second embodiment, the pixel circuits 1000 are read line by line with line thinning in the pixel array section 1001. Therefore, distortion of a captured image due to rolling shutter can be reduced. Furthermore, it is possible to reduce power consumption at the time of imaging in the pixel array section 1001. Furthermore, since some lines are thinned out by the subsampling, it is also possible, for example, to perform imaging with an imaging condition such as exposure changed for the lines to be read by the subsampling.
Next, an application example of the recognition processing of the second embodiment will be described.
A first application example is application to recognition processing for a high-resolution captured image such as a 4K resolution image.
In the second embodiment, since at least a part of the subsampling processing is performed inside the sensor section 10c when reading a pixel signal from the pixel array section 1001, the amount of data handled in one frame is small. Furthermore, the recognition section 220 performs the recognition processing for each frame on the basis of each sampling image by the subsampling processing. Therefore, it is possible to obtain a recognition result with high simultaneity with respect to the captured image. Furthermore, since the recognition section 220 sequentially integrates the feature amounts between frames, it is possible to obtain a more accurate recognition result by effectively utilizing the 4K resolution image.
A second application example is application to recognition processing for a sampling image whose resolution is reduced by the subsampling. Here, as the second application example, a user interface (UI) and improvement of user experience (UX) by the UI will be described.
According to the existing technology, as described with reference to
Note that the second application example can be applied not only to the second embodiment but also to the above-described first embodiment and each modified example thereof.
A third application example is an application example for data transfer and reduction of power consumption.
Note that, here, in the information processing device 1c according to the second embodiment, as described with reference to Section (b) of
In the example of the information processing device 1c′ depicted in Section (a) of
On the other hand, in the information processing device 1c according to the second embodiment depicted in Section (b) of
As described above, in the information processing device 1c according to the second embodiment, the amount of pixel data transferred from the sensor section 10c to the recognition section 220 can be reduced, the bus width can be reduced, the processing amount of the recognition section 220 per frame is reduced, and lower power consumption can be achieved, as compared with the information processing device 1c′ that uses a captured image for the recognition processing without reducing the size of the captured image.
On the other hand, in a case where power equivalent to that of the information processing device 1c′ that uses a captured image without reducing the size of the captured image as depicted in Section (a) of
Next, a third embodiment of the present disclosure will be described. The third embodiment is an example in which the sensor section 10c and the recognition section 220 are separated in the information processing device 1c according to the second embodiment described above.
Here, the sensor section 10d is formed by, for example, a multilayer CIS having a two-layer structure in which semiconductor chips are stacked in two layers, which has been described with reference to
The sensor section 10d outputs image data of a sampling image from the reading control section 230, and supplies the image data to the recognition processing section 20d included in hardware different from the sensor section 10d. The recognition processing section 20d inputs the image data supplied from the sensor section 10d to the recognition section 220. The recognition section 220 performs the recognition processing on the basis of the input image data, and outputs a recognition result to the outside.
As another example, the sensor section 10d can be formed by a multilayer CIS having a three-layer structure in which semiconductor chips are stacked in three layers described with reference to
In this manner, as the recognition processing section 20d (recognition section 220) is implemented by hardware separated from the sensor section 10d, the configuration of the recognition section 220, for example, the recognition model and the like, can be easily changed.
Furthermore, since the recognition processing is performed on the basis of a sampling image obtained by the subsampling in the sensor section 10d, the load of the recognition processing can be reduced as compared with a case where the recognition processing is performed using the image data 32 of the captured image as it is. Therefore, for example, a CPU, a DSP, or a GPU having low processing capability can be used in the recognition processing section 20d, and the cost of the information processing device 1d can be reduced.
Next, a fourth embodiment of the present disclosure will be described. In the first to third embodiments described above, one subsampling is performed for one piece of image data 32, but this is not limited to this example. The fourth embodiment is an example in which the subsampling is performed a plurality of times for one piece of image data 32.
Note that all of the information processing device 1b according to the first embodiment and each modified example thereof described above, the information processing device 1c according to the second embodiment, and the information processing device 1d according to the third embodiment are applicable to the fourth embodiment. Hereinafter, a description will be given on the assumption that the information processing device 1b depicted in
Furthermore, sampling of the pixels 300 is performed in the order of coordinates [1,1], [1,0], [0,1], and [0,0] with the lower-right pixel position [1,1] as a base point in each divided region 35. In addition, sampling images including sampling pixels acquired by the subsampling with the respective coordinates [1,1], [1,0], [0,1], and [0,0] as base points are referred to as a sampling image of a phase [1,1], a sampling image of a phase [1,0], a sampling image of a phase [0,1], a sampling image of a phase [0,0], and the like, respectively.
In Section (a) of
The preprocessing section 210 performs the subsampling on one piece of image data 32a while periodically shifting the position (Steps S10a to S10d). For example, during one frame period from acquisition of the image data 32a to acquisition of the next image data, the preprocessing section 210 sequentially performs the subsampling with the coordinates [1,1], [1,0], [0,1], and [0,0] as base points, and acquires the sampling image of the phase [1,1], the sampling image of the phase [1,0], the sampling image of the phase [0,1], and the sampling image of the phase [0,0].
The recognition section 220 performs feature amount extraction on each of the sampling image of the phase [1,1], the sampling image of the phase [1,0], the sampling image of the phase [0,1], and the sampling image of the phase [0,0] (Step S11), to extract feature amounts 50a, 50b, 50c, and 50d.
The recognition section 220 performs processing of integrating the feature amounts 50a, 50b, 50c, and 50d extracted from the image data 32a (Step S12), and performs the recognition processing on the basis of a feature amount obtained by integrating the feature amounts 50a, 50b, 50c, and 50d (Step S13) .
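A minimal sketch of performing the subsampling a plurality of times on one piece of image data and integrating the resulting feature amounts, as in this fourth embodiment, is shown below; the 2 pixels × 2 pixels divided region and the simple averaging used as stand-ins for the feature extraction and integration are illustrative assumptions.

```python
import numpy as np

# Base points in the 2 x 2 divided region, in the order described above.
PHASES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def subsample(image: np.ndarray, phase) -> np.ndarray:
    dy, dx = phase
    return image[dy::2, dx::2]


def extract_feature(sampling_image: np.ndarray) -> np.ndarray:
    # Stand-in for the CNN feature extraction (Step S11).
    return sampling_image.astype(np.float32) / 255.0


# One piece of image data is subsampled once per phase within a single frame
# period (Steps S10a to S10d), the feature amounts are integrated (Step S12),
# and the recognition processing would then run on the integrated result (Step S13).
image = np.random.default_rng(0).integers(0, 256, size=(8, 8), dtype=np.uint8)
features = [extract_feature(subsample(image, p)) for p in PHASES]
integrated = np.mean(features, axis=0)
print(integrated.shape)  # (4, 4)
```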
As described above, the feature amount extraction processing is performed on a sampling image of each phase obtained by performing the subsampling while shifting the phase for one piece of image data 32a, as a result of which the recognition processing based on the integrated feature amount can be performed at a higher speed.
Here, in the above description, the subsampling for all the phases [1,1], [1,0], [0,1], and [0,0] is performed in the divided region 35 including 2 pixels × 2 pixels, and the recognition processing for the entire one piece of image data 32a is performed. However, the present disclosure is not limited to this example, and the subsampling may be selectively performed for a specific phase among the phases [1,1], [1,0], [0,1], and [0,0].
For example, the subsampling is performed only on the phases [1,1] and [0,0] positioned diagonally among the respective phases [1,1], [1,0], [0,1], and [0,0], and the feature amounts of the acquired sampling images of the phases [1,1] and [0,0] are extracted to perform the recognition processing. As a result, the processing amounts of the feature extraction and the recognition processing can be reduced, and power consumption in the recognition processing section 20b can be suppressed.
Furthermore, for example, a result of performing the recognition processing on the basis of the feature amount obtained by performing the subsampling on one (for example, the phase [1,1]) of the phases [1,1], [1,0], [0,1], and [0,0] can be output as a promptly reported result. In this case, after the promptly reported result is output, the subsampling for other phases (for example, [1,0], [0,1] and [0,0]) is performed, the recognition processing is performed on the basis of a feature amount obtained by integrating the feature amounts of [1,1], [1,0], [0,1], and [0,0], and a recognition result is output.
Further, in this case, the subsequent processing (subsampling with other phases, feature extraction, and the like) can be omitted as long as a sufficient recognition result can be obtained from the promptly reported result. In this case, processing for the next image data can be started immediately after the output of the promptly reported result, and the frame rate can be further increased.
Next, application examples of the information processing devices 1b, 1c, and 1d according to the first embodiment and each modified example thereof, the second embodiment, the third embodiment, and the fourth embodiment according to the present disclosure will be described as a fifth embodiment.
For example, the information processing device 1a described above can be used in various cases where light such as visible light, infrared light, ultraviolet light, and X-rays is sensed and the recognition processing is performed on the basis of a sensing result as follows.
The technology (present technology) according to the present disclosure can be applied to various products. For example, the technology according to an embodiment of the present disclosure may be implemented as a device mounted in any one of mobile bodies such as a vehicle, an electric vehicle, a hybrid electric vehicle, a motorcycle, a bicycle, a personal mobility device, a plane, a drone, a ship, a robot, and the like.
The vehicle control system 12000 includes a plurality of electronic control units connected to each other via a communication network 12001. In the example depicted in
The driving system control unit 12010 controls the operation of devices related to the driving system of the vehicle in accordance with various kinds of programs. For example, the driving system control unit 12010 functions as a control device for a driving force generating device for generating the driving force of the vehicle, such as an internal combustion engine, a driving motor, or the like, a driving force transmitting mechanism for transmitting the driving force to wheels, a steering mechanism for adjusting the steering angle of the vehicle, a braking device for generating the braking force of the vehicle, and the like.
The body system control unit 12020 controls the operation of various kinds of devices provided to a vehicle body in accordance with various kinds of programs. For example, the body system control unit 12020 functions as a control device for a keyless entry system, a smart key system, a power window device, or various kinds of lamps such as a headlamp, a backup lamp, a brake lamp, a turn signal, a fog lamp, or the like. In this case, radio waves transmitted from a mobile device as an alternative to a key or signals of various kinds of switches can be input to the body system control unit 12020. The body system control unit 12020 receives these input radio waves or signals, and controls a door lock device, the power window device, the lamps, or the like of the vehicle.
The outside-vehicle information detecting unit 12030 detects information about the outside of the vehicle including the vehicle control system 12000. For example, the outside-vehicle information detecting unit 12030 is connected with an imaging section 12031. The outside-vehicle information detecting unit 12030 makes the imaging section 12031 image an image of the outside of the vehicle, and receives the imaged image. On the basis of the received image, the outside-vehicle information detecting unit 12030 may perform processing of detecting an object such as a human, a vehicle, an obstacle, a sign, a character on a road surface, or the like, or processing of detecting a distance thereto.
The imaging section 12031 is an optical sensor that receives light, and which outputs an electric signal corresponding to a received light amount of the light. The imaging section 12031 can output the electric signal as an image, or can output the electric signal as information about a measured distance. In addition, the light received by the imaging section 12031 may be visible light, or may be invisible light such as infrared rays or the like.
The in-vehicle information detecting unit 12040 detects information about the inside of the vehicle. The in-vehicle information detecting unit 12040 is, for example, connected with a driver state detecting section 12041 that detects the state of a driver. The driver state detecting section 12041, for example, includes a camera that images the driver. On the basis of detection information input from the driver state detecting section 12041, the in-vehicle information detecting unit 12040 may calculate a degree of fatigue of the driver or a degree of concentration of the driver, or may determine whether the driver is dozing.
The microcomputer 12051 can calculate a control target value for the driving force generating device, the steering mechanism, or the braking device on the basis of the information about the inside or outside of the vehicle which information is obtained by the outside-vehicle information detecting unit 12030 or the in-vehicle information detecting unit 12040, and output a control command to the driving system control unit 12010. For example, the microcomputer 12051 can perform cooperative control intended to implement functions of an advanced driver assistance system (ADAS) which functions include collision avoidance or shock mitigation for the vehicle, following driving based on a following distance, vehicle speed maintaining driving, a warning of collision of the vehicle, a warning of deviation of the vehicle from a lane, or the like.
In addition, the microcomputer 12051 can perform cooperative control intended for automated driving, which makes the vehicle travel automatedly without depending on the operation of the driver, or the like, by controlling the driving force generating device, the steering mechanism, the braking device, or the like on the basis of the information about the outside or inside of the vehicle which information is obtained by the outside-vehicle information detecting unit 12030 or the in-vehicle information detecting unit 12040.
In addition, the microcomputer 12051 can output a control command to the body system control unit 12020 on the basis of the information about the outside of the vehicle which information is obtained by the outside-vehicle information detecting unit 12030. For example, the microcomputer 12051 can perform cooperative control intended to prevent a glare by controlling the headlamp so as to change from a high beam to a low beam, for example, in accordance with the position of a preceding vehicle or an oncoming vehicle detected by the outside-vehicle information detecting unit 12030.
The sound/image output section 12052 transmits an output signal of at least one of a sound and an image to an output device capable of visually or auditorily notifying information to an occupant of the vehicle or the outside of the vehicle. In the example of
In
The imaging sections 12101, 12102, 12103, 12104, and 12105 are, for example, disposed at positions on a front nose, sideview mirrors, a rear bumper, a back door of the vehicle 12100 as well as a position on an upper portion of a windshield within the interior of the vehicle, or the like. The imaging section 12101 provided to the front nose and the imaging section 12105 provided to the upper portion of the windshield within the interior of the vehicle obtain mainly an image of the front of the vehicle 12100. The imaging sections 12102 and 12103 provided to the sideview mirrors obtain mainly an image of the sides of the vehicle 12100. The imaging section 12104 provided to the rear bumper or the back door obtains mainly an image of the rear of the vehicle 12100. The image of the front of the vehicle 12100 acquired by the imaging sections 12101 and 12105 is used mainly to detect a preceding vehicle, a pedestrian, an obstacle, a signal, a traffic sign, a lane, or the like.
Incidentally,
At least one of the imaging sections 12101 to 12104 may have a function of obtaining distance information. For example, at least one of the imaging sections 12101 to 12104 may be a stereo camera constituted of a plurality of imaging elements, or may be an imaging element having pixels for phase difference detection.
For example, the microcomputer 12051 can determine a distance to each three-dimensional object within the imaging ranges 12111 to 12114 and a temporal change in the distance (relative speed with respect to the vehicle 12100) on the basis of the distance information obtained from the imaging sections 12101 to 12104, and thereby extract, as a preceding vehicle, a nearest three-dimensional object in particular that is present on a traveling path of the vehicle 12100 and which travels in substantially the same direction as the vehicle 12100 at a predetermined speed (for example, equal to or more than 0 km/hour). Further, the microcomputer 12051 can set a following distance to be maintained in front of a preceding vehicle in advance, and perform automatic brake control (including following stop control), automatic acceleration control (including following start control), or the like. It is thus possible to perform cooperative control intended for automated driving that makes the vehicle travel automatedly without depending on the operation of the driver or the like.
For example, the microcomputer 12051 can classify three-dimensional object data into data of two-wheeled vehicles, standard-sized vehicles, large-sized vehicles, pedestrians, utility poles, and other three-dimensional objects on the basis of the distance information obtained from the imaging sections 12101 to 12104, extract the classified data, and use it for automatic avoidance of obstacles. For example, the microcomputer 12051 sorts obstacles around the vehicle 12100 into obstacles that the driver of the vehicle 12100 can recognize visually and obstacles that are difficult for the driver of the vehicle 12100 to recognize visually. The microcomputer 12051 then determines a collision risk indicating a risk of collision with each obstacle. When the collision risk is equal to or higher than a set value and there is thus a possibility of collision, the microcomputer 12051 outputs a warning to the driver via the audio speaker 12061 or the display section 12062, and performs forced deceleration or avoidance steering via the driving system control unit 12010. The microcomputer 12051 can thereby assist in driving to avoid collision.
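As a hedged sketch of this avoidance assistance (the time-to-collision risk proxy, the threshold, and the action labels are assumptions, not the risk metric actually used by the embodiment):

from typing import Iterable, List, Tuple

def time_to_collision_s(distance_m: float, closing_speed_mps: float) -> float:
    # Seconds until contact at the current closing speed; infinity when the
    # obstacle is not closing in.
    if closing_speed_mps <= 0:
        return float("inf")
    return distance_m / closing_speed_mps

def collision_assist(obstacles: Iterable[Tuple[float, float]], ttc_limit_s: float = 2.0) -> List[str]:
    # For each (distance, closing speed) pair whose risk meets the set value,
    # warn via the audio speaker / display section and request forced
    # deceleration or avoidance steering via the driving system control unit.
    actions: List[str] = []
    for distance_m, closing_speed_mps in obstacles:
        if time_to_collision_s(distance_m, closing_speed_mps) <= ttc_limit_s:
            actions.append("warn_driver")
            actions.append("forced_deceleration_or_avoidance_steering")
    return actions

# Example: an obstacle 10 m ahead closing at 8 m/s (TTC = 1.25 s) triggers assistance.
print(collision_assist([(10.0, 8.0)]))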
At least one of the imaging sections 12101 to 12104 may be an infrared camera that detects infrared rays. The microcomputer 12051 can, for example, recognize a pedestrian by determining whether or not there is a pedestrian in the images captured by the imaging sections 12101 to 12104. Such pedestrian recognition is performed, for example, by a procedure of extracting characteristic points in the images captured by the imaging sections 12101 to 12104 as infrared cameras, and a procedure of determining whether or not an object is a pedestrian by performing pattern matching processing on a series of characteristic points representing the contour of the object. When the microcomputer 12051 determines that there is a pedestrian in the captured images and thus recognizes the pedestrian, the sound/image output section 12052 controls the display section 12062 so that a square contour line for emphasis is displayed superimposed on the recognized pedestrian. The sound/image output section 12052 may also control the display section 12062 so that an icon or the like representing the pedestrian is displayed at a desired position.
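A rough illustration of this two-step procedure (contour/characteristic-point extraction, then pattern matching against a stored pedestrian contour) is given below. OpenCV 4 is used purely as an example toolkit and the thresholds are assumptions; the disclosure does not specify the extractor or matcher actually used.

import cv2
import numpy as np

def looks_like_pedestrian(ir_image: np.ndarray,
                          pedestrian_contour_template: np.ndarray,
                          match_threshold: float = 0.3) -> bool:
    # `ir_image` is assumed to be an 8-bit single-channel infrared frame.
    # Step 1: binarize warm regions and extract candidate contours
    # (standing in for the "characteristic points" of the text).
    _, binary = cv2.threshold(ir_image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Step 2: pattern matching of each candidate contour against the stored
    # pedestrian template; a lower score means a closer shape match.
    for contour in contours:
        score = cv2.matchShapes(contour, pedestrian_contour_template,
                                cv2.CONTOURS_MATCH_I1, 0.0)
        if score < match_threshold:
            return True
    return False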
Hereinabove, an example of the vehicle control system to which the technology according to an embodiment of the present disclosure can be applied has been described. The technology according to an embodiment of the present disclosure can be applied to the imaging section 12031 and the outside-vehicle information detecting unit 12030 in the above-described configuration. Specifically, for example, the sensor section 10b of the information processing device 1b is applied to the imaging section 12031, and the recognition processing section 20b is applied to the outside-vehicle information detecting unit 12030. The recognition result output from the recognition processing section 20b is passed to the integrated control unit 12050 via, for example, the communication network 12001.
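A minimal sketch of this mapping, with entirely hypothetical class and method names, might wire the components as follows: the recognition processing section hosted in the outside-vehicle information detecting unit consumes frames from the imaging section and forwards its result toward the integrated control unit 12050 over the communication network 12001.

class OutsideVehicleInformationDetectingUnit:
    # Hosts the recognition processing section 20b; all names here are illustrative.
    def __init__(self, recognition_section, network_send):
        self.recognition_section = recognition_section  # plays the role of 20b
        self.network_send = network_send                # transmit over the communication network 12001

    def on_frame(self, image_data):
        # image_data arrives from the imaging section 12031 (sensor section 10b);
        # the recognition result is passed to the integrated control unit 12050.
        result = self.recognition_section.recognize(image_data)
        self.network_send("integrated_control_unit_12050", result)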
As described above, by applying the technology according to an embodiment of the present disclosure to the imaging section 12031 and the outside-vehicle information detecting unit 12030, it is possible to recognize both a target object positioned at a short distance and a target object positioned at a long distance, and to recognize a target object positioned at a short distance with high simultaneity, so that driving can be supported more reliably.
Note that the effects described in the present specification are merely illustrative and not limitative, and the present technology may have other effects.
Note that the present technology can also have the following configuration.
(1) An information processing device including:
(2) The information processing device according to (1), in which
(3) The information processing device according to (1) or (2), in which
(4) The information processing device according to (1) or (2), in which
(5) The information processing device according to (1) or (2), in which
(6) The information processing device according to any one of (1) to (5), in which
(7) The information processing device according to any one of (1) to (6), in which
(8) The information processing device according to any one of (1) to (7), further including an accumulation section that accumulates the feature amount calculated by the calculation section, in which
(9) The information processing device according to (8), in which
(10) The information processing device according to (8), in which
(11) The information processing device according to any one of (8) to (10), in which
(12) The information processing device according to (11), in which
(13) The information processing device according to any one of (8) to (12), in which
(14) The information processing device according to any one of (1) to (13), in which
(15) The information processing device according to any one of (1) to (14), in which
(16) The information processing device according to any one of (1) to (15), in which
(17) An information processing method
(18) An information processing program for causing a computer to perform:
1a, 1b, 1c, 1d Information processing device
10a, 10b, 10c, 10d Sensor section
20a, 20b, 20d Recognition processing section
30a, 30b Captured image
32, 32a, 32a′, 32b, 32c, 32d Image data
35, 35′ Divided region
36, 36Φ1, 36Φ1′, 36Φ2, 36Φ3, 36Φ4, 36Φx Sampling image
40, 60, 61, 62, 63, 64, 65, 66, 70, 71 Recognition result
41, 42, 43, 44 Object
50a, 50a′, 50b, 50c, 50d Feature amount
210 Preprocessing section
211 Reading section
212 Use region acquisition section
220 Recognition section
221 Feature amount calculation section
222 Feature amount accumulation control section
223 Feature amount accumulation section
224 Use region determination section
225 Recognition processing execution section
230 Reading control section
300 Pixel
3101, 3102, 3103, 3104, 3105, 3106, 3107, 3108, 3109 Captured image
320, 320ExpΦ1, 320ExpΦ2, 320ExpΦ3, 320ExpΦ4, 320Rn1, 320Rn2, 320Rnn, 320Φ1, 320Φ2, 320Φ3, 320Φ4 Original image
321, 321a, 321b, 321c, 321d Size-reduced image
1000 Pixel circuit
1001 Pixel array section
1200 Imaging section
1202 Memory
1203 DSP
1205 CPU
Number | Date | Country | Kind
---|---|---|---
2020-064086 | Mar 2020 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2021/011645 | Mar 22, 2021 | WO |