The present technology relates to an information processing apparatus, an information processing method, and a program that are applied to recognize an object in a captured image.
There is a technology used to detect a predetermined object region from an image.
Patent Literature 1 indicated below discloses an obstacle detector that detects an obstacle situated in the surroundings of a moving vehicle on the basis of a difference image based on a difference between a reference frame image and a previous frame image from among frame images of the surroundings of the vehicle, the reference frame image being acquired at a reference point in time, the previous frame image being acquired at a point in time prior to the reference point in time.
Patent Literature 2 indicated below discloses an object detector that detects a motion vector of each portion of a target image using the target image and at least one reference image from among a plurality of captured images, calculates a difference image based on a difference between two images from among the plurality of captured images, and detects an object region in which there exists an object, on the basis of the motion vector and the difference image.
Patent Literature 1: Japanese Patent Application Laid-open No. 2018-97777
Patent Literature 2: Japanese Patent Application Laid-open No. 2015-138319
However, in each of the technologies respectively disclosed in Patent Literatures 1 and 2, an object is detected on the basis of a difference between the entireties of images, and this results in an increase in a quantity of computations. Further, it is often the case that processing is performed on an image similar to a previous image, and this results in performing redundant processing.
In view of the circumstances described above, it is an object of the present technology to provide an information processing apparatus, an information processing method, and a program that make it possible to eliminate redundant processing performed with respect to captured images sequentially acquired during movement, and to reduce a quantity of computations.
In order to achieve the object described above, an information processing apparatus according to an embodiment of the present technology includes an input device and a controller. A captured image that is captured by a camera is input to the input device, the captured image including distance information for each pixel. The controller generates a transformed captured image obtained by transforming pairs of coordinates for respective pixels of the captured image on the basis of an amount of movement of the camera or a mobile body on which the camera is mounted. Further, the controller associates a pair of coordinates for a pixel of the transformed captured image with a pair of coordinates for a pixel of a post-movement captured image captured at a position of the camera after the movement, and the controller identifies a non-associated pixel that is included in the post-movement captured image and is not associated with the pixel of the transformed captured image.
Accordingly, the information processing apparatus identifies the pixels of the post-movement captured image that are not associated with any pixel of the captured image, so no new processing needs to be performed on an associated pixel. This makes it possible to eliminate redundant processing on captured images sequentially acquired during movement, and to reduce a quantity of computations.
The controller may perform recognition processing of recognizing an attribute of the non-associated pixel in the post-movement captured image. For an associated pixel in the post-movement captured image, or for a region including the associated pixel, the controller may project the result of the recognition processing already performed on the corresponding pixel of the captured image, or on the region including that corresponding pixel.
Accordingly, with respect to an associated pixel of a captured image after movement, the information processing apparatus can project, onto the captured image after the movement, a result of recognition processing performed with respect to a captured image before the movement. This makes it possible to omit recognition processing performed on the associated pixel, and to reduce a quantity of computations.
The controller may generate a map obtained by associating the pair of coordinates for the pixel of the post-movement captured image with the pair of coordinates for the pixel of the captured image in order to perform the projection.
Accordingly, the information processing apparatus can easily project, onto a captured image after movement, a result of recognition performed with respect to a captured image before the movement, by using the generated map.
The controller may transform the captured image into three-dimensional point cloud data based on the distance information for each pixel, may generate movement point-cloud data obtained by performing transformation with respect to the three-dimensional point cloud data on the basis of the amount of the movement, and may project the movement point-cloud data onto an image plane to generate the transformed captured image.
Accordingly, the information processing apparatus transforms a captured image into three-dimensional point cloud data on the basis of distance information, and transforms the three-dimensional point cloud data into a plane image after movement. Consequently, the information processing apparatus can accurately identify a corresponding pixel.
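The transformation described above can be sketched in Python as follows. This is a minimal illustration under assumptions not stated in the source: a pinhole camera with hypothetical intrinsics fx, fy, cx, cy, depth measured along the optical axis, and the amount of movement given as a rotation R and a translation t that map points from the camera frame before the movement to the camera frame after it.

```python
# Minimal sketch: warp a depth image from time T-1 into the camera frame at time T.
# Assumptions (not from the source): pinhole intrinsics fx, fy, cx, cy; depth z
# measured along the optical axis; R (3x3) and t (3-vector) map points from the
# camera frame at T-1 to the camera frame at T.

def backproject(u, v, z, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth z to a 3D point in the camera frame."""
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


def apply_rt(p, R, t):
    """Rigid transform p' = R p + t."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(p, fx, fy, cx, cy):
    """Project a 3D point onto the image plane; None if it is behind the camera."""
    x, y, z = p
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy, z)


def transform_depth_image(depth, R, t, fx, fy, cx, cy):
    """Return {(u, v) at T-1: (u', v', z') at T} for every pixel with valid depth.

    (u', v') are sub-pixel coordinates; rounding and occlusion handling are
    left to a later step.
    """
    out = {}
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue  # no distance information for this pixel
            p = apply_rt(backproject(u, v, z, fx, fy, cx, cy), R, t)
            q = project(p, fx, fy, cx, cy)
            if q is not None:
                out[(u, v)] = q
    return out
```

With the identity rotation and t = (0, 0, -1), which corresponds to the camera advancing one unit along its optical axis, a point at depth 2 reappears at depth 1 and moves away from the image center.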
The controller may set a priority of performing the recognition processing according to a position of the non-associated pixel in the post-movement captured image.
Accordingly, the information processing apparatus sets the frequency of performing the recognition processing according to the position of a region, such as setting the frequency of performing the recognition processing higher for a region of a center portion in a captured image than for a region of an end portion in the captured image. This makes it possible to reduce a quantity of computations.
The controller may set the priority of performing the recognition processing for each non-associated pixel according to the position of the non-associated pixel in the post-movement captured image, and according to a movement speed of the mobile body.
Accordingly, the information processing apparatus can cope with a change in which region is important as the movement speed changes. For example, during high-speed movement, the frequency of performing the recognition processing is set higher for a region of a center portion in an image than for a region of an end portion in the image, whereas during low-speed movement, it is set lower for the region of the center portion than for the region of the end portion.
The controller may set a priority of performing the recognition processing for each non-associated pixel according to the distance information of the non-associated pixel.
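One way to express a position- and speed-dependent priority is sketched below. It is hypothetical: the source does not specify a formula, so the 30 km/h threshold, the use of the horizontal column only, and the linear ramp from center to edge are all assumptions for illustration.

```python
# Hypothetical sketch: priority by horizontal position and vehicle speed.
# The speed threshold and the linear ramp are illustrative assumptions.

def position_speed_priority(u, width, speed_kmh, speed_threshold_kmh=30.0):
    """Return a priority in [0, 1] for a non-associated pixel in column u.

    At or above the threshold speed the image center is favored;
    below it the image edges are favored.
    """
    center = (width - 1) / 2.0
    # Normalized distance from the image center: 0.0 at the center, 1.0 at either edge.
    d = abs(u - center) / center if center > 0 else 0.0
    if speed_kmh >= speed_threshold_kmh:
        return 1.0 - d
    return d
```

Dropping the speed branch and keeping only `1.0 - d` gives the purely position-based priority in which the center portion is updated more often than the end portions.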
Accordingly, the information processing apparatus sets the frequency of performing the recognition processing according to the distance, such as setting the frequency of performing the recognition processing higher for a region close to a camera than for a region situated away from the camera. This makes it possible to reduce a quantity of computations.
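A distance-based priority can be sketched in the same way. The 50 m horizon and the linear falloff are hypothetical choices; the source only states that closer regions are recognized more frequently.

```python
# Hypothetical sketch: priority by distance. Closer pixels get a higher priority.
# The z_max horizon and the linear falloff are assumptions.

def distance_priority(z, z_max=50.0):
    """Return a priority in [0, 1]: 1.0 at the camera, 0.0 at or beyond z_max.

    z is the distance of the non-associated pixel taken from the
    post-movement captured image, in the same unit as z_max.
    """
    return min(1.0, max(0.0, 1.0 - z / z_max))
```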
An information processing method according to another embodiment of the present technology includes
acquiring a captured image captured by a camera, the captured image including distance information for each pixel;
generating a transformed captured image obtained by transforming pairs of coordinates for respective pixels of the captured image on the basis of an amount of movement of the camera or a mobile body on which the camera is mounted;
associating a pair of coordinates for a pixel of the transformed captured image with a pair of coordinates for a pixel of a post-movement captured image captured at a position of the camera after the movement; and
identifying a non-associated pixel that is included in the post-movement captured image and is not associated with the pixel of the transformed captured image.
A program according to another embodiment of the present technology causes an information processing apparatus to perform a process including
acquiring a captured image captured by a camera, the captured image including distance information for each pixel;
generating a transformed captured image obtained by transforming pairs of coordinates for respective pixels of the captured image on the basis of an amount of movement of the camera or a mobile body on which the camera is mounted;
associating a pair of coordinates for a pixel of the transformed captured image with a pair of coordinates for a pixel of a post-movement captured image captured at a position of the camera after the movement; and
identifying a non-associated pixel that is included in the post-movement captured image and is not associated with the pixel of the transformed captured image.
As described above, the present technology makes it possible to eliminate redundant processing performed with respect to captured images sequentially acquired during movement, and to reduce a quantity of computations. However, the present technology is not limited to this effect.
Embodiments of the present technology will now be described below with reference to the drawings.
[Configuration of Vehicle Control System]
Each of the control units includes: a microcomputer that performs arithmetic processing according to various kinds of programs; a storage section that stores the programs executed by the microcomputer, parameters used for various kinds of operations, or the like; and a driving circuit that drives various kinds of control target devices. Each of the control units further includes: a network interface (I/F) for performing communication with other control units via the communication network 7010; and a communication I/F for performing communication with a device, a sensor, or the like inside or outside the vehicle by wired or wireless communication. A functional configuration of the integrated control unit 7600 illustrated in
The driving system control unit 7100 controls the operation of devices related to the driving system of the vehicle in accordance with various kinds of programs. For example, the driving system control unit 7100 functions as a control device for a driving force generating device for generating the driving force of the vehicle, such as an internal combustion engine, a driving motor, or the like, a driving force transmitting mechanism for transmitting the driving force to wheels, a steering mechanism for adjusting the steering angle of the vehicle, a braking device for generating the braking force of the vehicle, and the like. The driving system control unit 7100 may have a function as a control device of an antilock brake system (ABS), electronic stability control (ESC), or the like.
The driving system control unit 7100 is connected with a vehicle state detecting section 7110. The vehicle state detecting section 7110, for example, includes at least one of a gyro sensor that detects the angular velocity of axial rotational movement of a vehicle body, an acceleration sensor that detects the acceleration of the vehicle, and sensors for detecting an amount of operation of an accelerator pedal, an amount of operation of a brake pedal, the steering angle of a steering wheel, an engine speed or the rotational speed of wheels, and the like. The driving system control unit 7100 performs arithmetic processing using a signal input from the vehicle state detecting section 7110, and controls the internal combustion engine, the driving motor, an electric power steering device, the braking device, and the like.
The body system control unit 7200 controls the operation of various kinds of devices provided to the vehicle body in accordance with various kinds of programs. For example, the body system control unit 7200 functions as a control device for a keyless entry system, a smart key system, a power window device, or various kinds of lamps such as a headlamp, a backup lamp, a brake lamp, a turn signal, a fog lamp, or the like. In this case, radio waves transmitted from a mobile device as an alternative to a key or signals of various kinds of switches can be input to the body system control unit 7200. The body system control unit 7200 receives these input radio waves or signals, and controls a door lock device, the power window device, the lamps, or the like of the vehicle.
The battery control unit 7300 controls a secondary battery 7310, which is a power supply source for the driving motor, in accordance with various kinds of programs. For example, the battery control unit 7300 is supplied with information about a battery temperature, a battery output voltage, an amount of charge remaining in the battery, or the like from a battery device including the secondary battery 7310. The battery control unit 7300 performs arithmetic processing using these signals, and performs control for regulating the temperature of the secondary battery 7310 or controls a cooling device provided to the battery device or the like.
The outside-vehicle information detecting unit 7400 detects information about the outside of the vehicle including the vehicle control system 7000. For example, the outside-vehicle information detecting unit 7400 is connected with at least one of an imaging section 7410 and an outside-vehicle information detecting section 7420. The imaging section 7410 includes at least one of a time-of-flight (ToF) camera, a stereo camera, a monocular camera, an infrared camera, and other cameras. The outside-vehicle information detecting section 7420, for example, includes at least one of an environmental sensor for detecting current atmospheric conditions or weather conditions and a peripheral information detecting sensor for detecting another vehicle, an obstacle, a pedestrian, or the like on the periphery of the vehicle including the vehicle control system 7000.
The environmental sensor, for example, may be at least one of a rain drop sensor detecting rain, a fog sensor detecting a fog, a sunshine sensor detecting a degree of sunshine, and a snow sensor detecting a snowfall. The peripheral information detecting sensor may be at least one of an ultrasonic sensor, a radar device, and a LIDAR device (Light detection and Ranging device, or Laser imaging detection and ranging device). Each of the imaging section 7410 and the outside-vehicle information detecting section 7420 may be provided as an independent sensor or device, or may be provided as a device in which a plurality of sensors or devices are integrated.
Incidentally,
Outside-vehicle information detecting sections 7920, 7922, 7924, 7926, 7928, and 7930 provided to the front, rear, sides, and corners of the vehicle 7900 and the upper portion of the windshield within the interior of the vehicle may be, for example, an ultrasonic sensor or a radar device. The outside-vehicle information detecting sections 7920, 7926, and 7930 provided to the front nose of the vehicle 7900, the rear bumper, the back door of the vehicle 7900, and the upper portion of the windshield within the interior of the vehicle may be a LIDAR device, for example. These outside-vehicle information detecting sections 7920 to 7930 are used mainly to detect a preceding vehicle, a pedestrian, an obstacle, or the like.
Returning to
In addition, on the basis of the received image data, the outside-vehicle information detecting unit 7400 may perform image recognition processing of recognizing a human, a vehicle, an obstacle, a sign, a character on a road surface, or the like, or processing of detecting a distance thereto. The outside-vehicle information detecting unit 7400 may subject the received image data to processing such as distortion correction, alignment, or the like, and combine the image data imaged by a plurality of different imaging sections 7410 to generate a bird's-eye image or a panoramic image. The outside-vehicle information detecting unit 7400 may perform viewpoint conversion processing using the image data imaged by the imaging section 7410 including the different imaging parts.
The in-vehicle information detecting unit 7500 detects information about the inside of the vehicle. The in-vehicle information detecting unit 7500 is, for example, connected with a driver state detecting section 7510 that detects the state of a driver. The driver state detecting section 7510 may include a camera that images the driver, a biosensor that detects biological information of the driver, a microphone that collects sound within the interior of the vehicle, or the like. The biosensor is, for example, disposed in a seat surface, the steering wheel, or the like, and detects biological information of an occupant sitting in a seat or the driver holding the steering wheel. On the basis of detection information input from the driver state detecting section 7510, the in-vehicle information detecting unit 7500 may calculate a degree of fatigue of the driver or a degree of concentration of the driver, or may determine whether the driver is dozing. The in-vehicle information detecting unit 7500 may subject an audio signal obtained by the collection of the sound to processing such as noise canceling processing or the like.
The integrated control unit 7600 controls general operation within the vehicle control system 7000 in accordance with various kinds of programs. The integrated control unit 7600 is connected with an input section 7800. The input section 7800 is implemented by a device capable of input operation by an occupant, such as, for example, a touch panel, a button, a microphone, a switch, a lever, or the like. The integrated control unit 7600 may be supplied with data obtained by voice recognition of voice input through the microphone. The input section 7800 may, for example, be a remote control device using infrared rays or other radio waves, or an external connecting device such as a mobile telephone, a personal digital assistant (PDA), or the like that supports operation of the vehicle control system 7000. The input section 7800 may be, for example, a camera. In that case, an occupant can input information by gesture. Alternatively, data obtained by detecting the movement of a wearable device worn by an occupant may be input. Further, the input section 7800 may, for example, include an input control circuit or the like that generates an input signal on the basis of information input by an occupant or the like using the above-described input section 7800, and outputs the generated input signal to the integrated control unit 7600. An occupant or the like inputs various kinds of data or gives an instruction for processing operation to the vehicle control system 7000 by operating the input section 7800.
The storage section 7690 may include a read only memory (ROM) that stores various kinds of programs executed by the microcomputer and a random access memory (RAM) that stores various kinds of parameters, operation results, sensor values, or the like. In addition, the storage section 7690 may be implemented by a magnetic storage device such as a hard disc drive (HDD) or the like, a semiconductor storage device, an optical storage device, a magneto-optical storage device, or the like.
The general-purpose communication I/F 7620 is a widely used communication I/F that mediates communication with various apparatuses present in an external environment 7750. The general-purpose communication I/F 7620 may implement a cellular communication protocol such as global system for mobile communications (GSM (registered trademark)), worldwide interoperability for microwave access (WiMAX (registered trademark)), long term evolution (LTE (registered trademark)), LTE-advanced (LTE-A), or the like, or another wireless communication protocol such as wireless LAN (also referred to as wireless fidelity (Wi-Fi (registered trademark))), Bluetooth (registered trademark), or the like. The general-purpose communication I/F 7620 may, for example, connect to an apparatus (for example, an application server or a control server) present on an external network (for example, the Internet, a cloud network, or a company-specific network) via a base station or an access point. In addition, the general-purpose communication I/F 7620 may connect to a terminal present in the vicinity of the vehicle (which terminal is, for example, a terminal of the driver, a pedestrian, or a store, or a machine type communication (MTC) terminal) using a peer to peer (P2P) technology, for example.
The dedicated communication I/F 7630 is a communication I/F that supports a communication protocol developed for use in vehicles. The dedicated communication I/F 7630 may implement a standard protocol such as, for example, wireless access in vehicle environment (WAVE), which is a combination of Institute of Electrical and Electronics Engineers (IEEE) 802.11p as a lower layer and IEEE 1609 as a higher layer, dedicated short range communications (DSRC), or a cellular communication protocol. The dedicated communication I/F 7630 typically carries out V2X communication, a concept including one or more of communication between a vehicle and a vehicle (Vehicle to Vehicle), communication between a road and a vehicle (Vehicle to Infrastructure), communication between a vehicle and a home (Vehicle to Home), and communication between a pedestrian and a vehicle (Vehicle to Pedestrian).
The positioning section 7640, for example, performs positioning by receiving a global navigation satellite system (GNSS) signal from a GNSS satellite (for example, a GPS signal from a global positioning system (GPS) satellite), and generates positional information including the latitude, longitude, and altitude of the vehicle. Incidentally, the positioning section 7640 may identify a current position by exchanging signals with a wireless access point, or may obtain the positional information from a terminal such as a mobile telephone, a personal handyphone system (PHS), or a smart phone that has a positioning function.
The beacon receiving section 7650, for example, receives a radio wave or an electromagnetic wave transmitted from a radio station installed on a road or the like, and thereby obtains information about the current position, congestion, a closed road, a necessary time, or the like. Incidentally, the function of the beacon receiving section 7650 may be included in the dedicated communication I/F 7630 described above.
The in-vehicle device I/F 7660 is a communication interface that mediates connection between the microcomputer 7610 and various in-vehicle devices 7760 present within the vehicle. The in-vehicle device I/F 7660 may establish wireless connection using a wireless communication protocol such as wireless LAN, Bluetooth (registered trademark), near field communication (NFC), or wireless universal serial bus (WUSB). In addition, the in-vehicle device I/F 7660 may establish wired connection by universal serial bus (USB), high-definition multimedia interface (HDMI (registered trademark)), mobile high-definition link (MHL), or the like via a connection terminal (and a cable if necessary) not depicted in the figures. The in-vehicle devices 7760 may, for example, include at least one of a mobile device and a wearable device possessed by an occupant and an information device carried into or attached to the vehicle. The in-vehicle devices 7760 may also include a navigation device that searches for a path to an arbitrary destination. The in-vehicle device I/F 7660 exchanges control signals or data signals with these in-vehicle devices 7760.
The vehicle-mounted network I/F 7680 is an interface that mediates communication between the microcomputer 7610 and the communication network 7010. The vehicle-mounted network I/F 7680 transmits and receives signals or the like in conformity with a predetermined protocol supported by the communication network 7010.
The microcomputer 7610 of the integrated control unit 7600 controls the vehicle control system 7000 in accordance with various kinds of programs on the basis of information obtained via at least one of the general-purpose communication I/F 7620, the dedicated communication I/F 7630, the positioning section 7640, the beacon receiving section 7650, the in-vehicle device I/F 7660, and the vehicle-mounted network I/F 7680. For example, the microcomputer 7610 may calculate a control target value for the driving force generating device, the steering mechanism, or the braking device on the basis of the obtained information about the inside and outside of the vehicle, and output a control command to the driving system control unit 7100. For example, the microcomputer 7610 may perform cooperative control intended to implement functions of an advanced driver assistance system (ADAS), whose functions include collision avoidance or shock mitigation for the vehicle, following driving based on a following distance, vehicle speed maintaining driving, a warning of collision of the vehicle, a warning of deviation of the vehicle from a lane, or the like. In addition, the microcomputer 7610 may perform cooperative control intended for automatic driving, which makes the vehicle travel autonomously without depending on the operation of the driver, or the like, by controlling the driving force generating device, the steering mechanism, the braking device, or the like on the basis of the obtained information about the surroundings of the vehicle.
The microcomputer 7610 may generate three-dimensional distance information between the vehicle and an object such as a surrounding structure, a person, or the like, and generate local map information including information about the surroundings of the current position of the vehicle, on the basis of information obtained via at least one of the general-purpose communication I/F 7620, the dedicated communication I/F 7630, the positioning section 7640, the beacon receiving section 7650, the in-vehicle device I/F 7660, and the vehicle-mounted network I/F 7680. In addition, the microcomputer 7610 may predict danger such as collision of the vehicle, approaching of a pedestrian or the like, an entry to a closed road, or the like on the basis of the obtained information, and generate a warning signal. The warning signal may, for example, be a signal for producing a warning sound or lighting a warning lamp.
The sound/image output section 7670 transmits an output signal of at least one of a sound and an image to an output device capable of visually or auditorily notifying information to an occupant of the vehicle or the outside of the vehicle. In the example of
Incidentally, at least two control units connected to each other via the communication network 7010 in the example depicted in
Further, in the present embodiment, the integrated control unit 7600 is capable of performing semantic segmentation used to recognize an attribute such as a road surface, a sidewalk, a pedestrian, and a building for each pixel of an image captured by the imaging section 7410.
[Configurations of Functional Blocks of Vehicle Control System]
In the present embodiment, with respect to captured images sequentially acquired from the imaging section 7410, the integrated control unit 7600 (the microcomputer 7610) is capable of performing semantic segmentation applied to recognize an attribute (such as a road surface, a sidewalk, a pedestrian, and a building) for each pixel of the captured image. The attribute is recognized for each subject region included in a captured image by the semantic segmentation being performed.
On the basis of the attribute, the integrated control unit 7600 can set the frequency of performing the recognition processing (the frequency of update) and a region that is a target for the recognition processing. Note that, in the processing, semantic segmentation is performed with respect to the entirety of the first captured image from among a series of captured images, and the frequency of update is set for each region in subsequent captured images.
As illustrated in
On the basis of positional information regarding a position of a vehicle at a time (T−1) and positional information regarding the position of the vehicle at a time (T) that are generated by the positioning section 7640 (the imaging section 7410), the relative movement estimator 11 generates data (Rt) of an amount of relative movement of the vehicle, and outputs the generated data to the projection map generator 12.
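The relative-movement estimate can be illustrated with a planar simplification. This is a hypothetical sketch: the source does not give the format of the positional information, so poses are assumed here to be (x, y, yaw) in a common world frame, and the result is a 2D rotation plus translation rather than the full data (Rt).

```python
import math

# Hypothetical sketch: relative movement from two planar poses (x, y, yaw) in a
# world frame. The pose format is an assumption; the source only says that
# positional information at T-1 and T is used.

def relative_motion(pose_prev, pose_curr):
    """Return (dyaw, tx, ty) such that p_curr = R(dyaw) p_prev + (tx, ty).

    p_prev and p_curr are the same static point expressed in the vehicle frame
    at T-1 and at T respectively.
    """
    x1, y1, yaw1 = pose_prev
    x2, y2, yaw2 = pose_curr
    dyaw = yaw1 - yaw2
    # Rotate the world-frame displacement (prev - curr) into the frame at T.
    dx, dy = x1 - x2, y1 - y2
    c, s = math.cos(-yaw2), math.sin(-yaw2)
    return dyaw, c * dx - s * dy, s * dx + c * dy


def apply_motion(motion, p):
    """Map a 2D point from the vehicle frame at T-1 into the frame at T."""
    dyaw, tx, ty = motion
    c, s = math.cos(dyaw), math.sin(dyaw)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)
```

For example, if the vehicle advances 1 m along its heading, a stationary object 5 m ahead at the time (T−1) is 4 m ahead at the time (T).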
On the basis of data (z) of a distance between the vehicle and a subject at the time (T−1) for each pair of captured-image coordinates, the distance being detected by the outside-vehicle information detecting unit 7400, and on the basis of the relative-movement-amount data (Rt) received from the relative movement estimator 11, the projection map generator 12 generates projection map data, and outputs the generated data to the semantic-segmentation projection section 13 and to the unobserved region setting section 14.
Specifically, with respect to the distance data (z) for each pair of captured-image coordinates, the projection map generator 12 transforms, into three-dimensional point cloud data, a set of all of the pieces of distance data (z) for the respective pairs of captured-image coordinates (depth image data), and performs a coordinate transformation on the point cloud data using the relative-movement-amount data (Rt). Then, the projection map generator 12 generates depth image data obtained by projecting, onto a captured-image plane, the point cloud data obtained after the coordinate transformation. On the basis of the distance data (z) and image coordinates at the time (T−1) in the depth image data, the projection map generator 12 generates projection map data that indicates a position of a projection source and is used to project, onto a captured image at the time (T), a value indicating a result of an image recognition (semantic segmentation) performed with respect to each pixel of a captured image at the time (T−1).
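The final step, turning warped coordinates into projection map data, might look like the sketch below. The input format, rounding to the nearest pixel, and the z-buffer rule (the nearest source point wins when several land on the same pixel) are assumptions; the source does not say how collisions or occlusions are resolved.

```python
import math

def build_projection_map(transformed, width, height):
    """Build projection map data from warped pixel coordinates.

    transformed: {(u, v) at T-1: (u', v', z')} with sub-pixel (u', v') at T.
    Returns {(u', v') at T: (u, v) at T-1}, i.e. the projection source for
    each pixel of the image at T. Points landing outside the image are dropped;
    when several points land on one pixel, the nearest (smallest z') is kept,
    as in a z-buffer (an assumption, not stated in the source).
    """
    proj_map = {}
    zbuf = {}
    for src, (uf, vf, z) in transformed.items():
        u2 = math.floor(uf + 0.5)  # round half up to the nearest pixel
        v2 = math.floor(vf + 0.5)
        if not (0 <= u2 < width and 0 <= v2 < height):
            continue  # falls outside the captured image at T
        key = (u2, v2)
        if key not in zbuf or z < zbuf[key]:
            zbuf[key] = z
            proj_map[key] = src
    return proj_map
```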
On the basis of the projection map data received from the projection map generator 12 and the semantic segmentation result at the time (T−1), the semantic-segmentation projection section 13 generates projection semantic-segmentation data obtained by projecting the semantic segmentation result onto a captured image at the time (T), and outputs the generated data to the semantic-segmentation integration section 18.
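Given projection map data in the hypothetical dictionary format {pixel at T: source pixel at T−1}, projecting the semantic segmentation result reduces to a lookup per mapped pixel, as in this sketch. Labels are plain strings here purely for illustration.

```python
def project_labels(proj_map, labels_prev, width, height, unobserved=None):
    """Carry per-pixel labels from T-1 to T through a projection map.

    proj_map: {(u', v') at T: (u, v) at T-1}.
    labels_prev: per-pixel labels at T-1, indexed [v][u].
    Pixels of the image at T with no entry in proj_map receive `unobserved`.
    """
    out = [[unobserved] * width for _ in range(height)]
    for (u2, v2), (u1, v1) in proj_map.items():
        out[v2][u2] = labels_prev[v1][u1]
    return out
```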
On the basis of the projection map data received from the projection map generator 12, the unobserved region setting section 14 detects a region, in the captured image at the time (T), onto which the semantic segmentation result at the time (T−1) is not projected, that is, an unobserved region in which a position of a projection source in the projection map data is not indicated, and outputs data indicating the unobserved region to the update priority map generator 16.
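Detecting the unobserved region amounts to listing the pixels at the time (T) that have no projection source. The sketch below assumes the same hypothetical map format; summarizing the pixels as a bounding box is an extra, illustrative step, not something the source specifies.

```python
def unobserved_pixels(proj_map, width, height):
    """Pixels of the image at T onto which no pixel at T-1 was projected."""
    return [(u, v) for v in range(height) for u in range(width) if (u, v) not in proj_map]


def bounding_box(pixels):
    """Axis-aligned box (u_min, v_min, u_max, v_max) around the pixels, or None."""
    if not pixels:
        return None
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return (min(us), min(vs), max(us), max(vs))
```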
Regarding a plurality of regions included in a captured image, the region-attribute-relationship determination section 15 determines a relationship between attributes recognized by the semantic segmentation being performed. For example, the region-attribute-relationship determination section 15 determines that there is a pedestrian or a bicycle on a sidewalk or a road surface when a region of a sidewalk or a road surface and a region of a pedestrian or a bicycle overlap.
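A minimal version of this relationship check is sketched below. Because a per-pixel segmentation assigns only one label to each pixel, regions are assumed here to be represented by axis-aligned bounding boxes (u0, v0, u1, v1), with inclusive bounds, so that "overlap" is meaningful; the label sets are likewise hypothetical.

```python
# Hypothetical label sets; the source names pedestrians, bicycles, sidewalks, and road surfaces.
MOVERS = {"pedestrian", "bicycle"}
GROUNDS = {"sidewalk", "road"}


def boxes_overlap(a, b):
    """True if two inclusive boxes (u0, v0, u1, v1) share at least one pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def relationships(regions):
    """regions: list of (label, box). Returns (index_of_mover, ground_label) pairs."""
    rel = []
    for i, (label_i, box_i) in enumerate(regions):
        if label_i not in MOVERS:
            continue
        for label_j, box_j in regions:
            if label_j in GROUNDS and boxes_overlap(box_i, box_j):
                rel.append((i, label_j))
    return rel
```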
On the basis of the unobserved region detected by the unobserved region setting section 14 and the relationship between attributes of regions that is determined by the region-attribute-relationship determination section 15, the update priority map generator 16 generates an update priority map in which the priority of update of semantic segmentation (the frequency of update) is set for each region of a captured image.
For example, the update priority map generator 16 gives a high update priority to an unobserved region, gives a low update priority to a region of a pedestrian on a sidewalk, and gives a high update priority to a region of a pedestrian on a road surface.
On the basis of the generated update priority map, the region semantic-segmentation section 17 performs semantic segmentation with respect to each region of the captured image at the time (T), and outputs a result of the semantic segmentation to the semantic-segmentation integration section 18.
The semantic-segmentation integration section 18 integrates the projection semantic-segmentation data at the time (T) that is received from the semantic-segmentation projection section 13 and region semantic-segmentation data at the time (T) that is received from the region semantic-segmentation section 17, and outputs data of a result of semantic segmentation with respect to the entirety of the captured image at the time (T).
The semantic-segmentation result data can be used, for example, for cooperative control intended to implement ADAS functions or to achieve automated driving.
These functional blocks (a computer program) may be implemented in the outside-vehicle information detecting unit 7400 instead of the integrated control unit 7600. In this case, the integrated control unit 7600 performs the cooperative control for ADAS or automated driving on the basis of the semantic-segmentation result data output by the outside-vehicle information detecting unit 7400.
[Operation of Vehicle Control System]
Next, an operation of the vehicle control system having the configuration described above is described. This operation is performed by hardware such as the microcomputer 7600, the vehicle-mounted network I/F 7680, and the dedicated communication I/F 7630 of the integrated control unit 7600, and software (the respective functional blocks illustrated in
As illustrated in the figure, first, the relative movement estimator 11 acquires positional information regarding a position of a vehicle at a time (T−1) and positional information regarding the position of the vehicle at a time (T) (Step 101), and estimates a distance of a relative movement of the vehicle (the imaging section) from the time (T−1) to the time (T) (Step 102).
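The text does not specify how the relative movement (Rt) is derived from the two positions. As a minimal sketch, assuming the positional information is a planar pose (x, y, yaw) in a world frame, with the vehicle frame taken as x forward and y to the left, the rotation and translation that carry a point expressed in the vehicle frame at (T−1) into the vehicle frame at (T) could be computed as below. The function names are hypothetical, and lifting this planar result into the camera's three-dimensional frame (camera axes and mounting offset) is left out.

```python
import math


def relative_motion(pose_prev, pose_curr):
    """Return (R, t) mapping a point in the vehicle frame at T-1 into the
    vehicle frame at T. Poses are hypothetical planar (x, y, yaw) tuples."""
    x0, y0, yaw0 = pose_prev
    x1, y1, yaw1 = pose_curr
    # Rotation part: R = R1^T R0, i.e. a rotation by (yaw0 - yaw1).
    d = yaw0 - yaw1
    R = [[math.cos(d), -math.sin(d)],
         [math.sin(d), math.cos(d)]]
    # Translation part: t = R1^T (p0 - p1).
    dx, dy = x0 - x1, y0 - y1
    c1, s1 = math.cos(yaw1), math.sin(yaw1)
    t = [c1 * dx + s1 * dy, -s1 * dx + c1 * dy]
    return R, t


def transform_point(R, t, p):
    """Apply p' = R p + t to a planar point p = (x, y)."""
    return [R[0][0] * p[0] + R[0][1] * p[1] + t[0],
            R[1][0] * p[0] + R[1][1] * p[1] + t[1]]
```

For example, if the vehicle advances 2 m straight ahead, `relative_motion((0, 0, 0), (2, 0, 0))` yields an identity rotation and t = [-2, 0], so a point 10 m ahead at (T−1) appears 8 m ahead at (T).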
Subsequently, the projection map generator 12 acquires data of a distance between the vehicle and a subject in a captured image at the time (T−1) (Step 103), and generates projection map data on the basis of the distance data and data of the relative-movement distance (Step 104).
Subsequently, on the basis of the projection map data, the unobserved region setting section 14 calculates an unobserved region in the captured image at the time (T), obtained by comparing the captured image at the time (T) with the captured image at the time (T−1) (Step 105), and the update priority map generator 16 generates an update priority map in which a high update priority is given to the unobserved region (Step 106).
Subsequently, on the basis of the projection map data, the semantic-segmentation projection section 13 projects, onto the captured image at the time (T), a semantic segmentation result at the time (T−1) (Step 107).
Assume that a vehicle traveling through the point indicated in (A1) of the figure at a time T=0 moves to the point indicated in (A2) of the figure at a time T=1, and that, for every pixel of the input frame (B0) at the time T=0, the corresponding pixel of the input frame at the time T=1 has been determined from the positional information and the distance information, as illustrated in the figure.
In this case, the semantic segmentation result (B1) for the input frame at the time T=0 is projected onto the entire region of the input frame at the time T=1, as illustrated in (B2) of the figure. This reduces redundant semantic segmentation of the input frame at the time T=1, reduces the quantity of computations, and improves the recognition accuracy (stability).
As described above, depending on the composition of the image captured by the camera, either the entire semantic segmentation result can be projected onto the next frame, or an unobserved region, onto which part of the semantic segmentation result is not projected, occurs in the next frame.
Here, processing of the projection map generation and processing of the unobserved region setting are described in detail.
As illustrated in
First, the point-cloud transformation section 121 acquires depth image data D (a captured image including distance information for each pixel) from the outside-vehicle information detecting unit 7400. The depth image data stores therein distance data (z) for each pair of image coordinates (u,v).
Subsequently, the point-cloud transformation section 121 transforms all of the pixels of the depth image D into three-dimensional point cloud data P based on distance information for each pair of coordinates of the pixel ((A) of
Subsequently, with respect to all of the point clouds included in the point cloud data P, the coordinate transformation section 122 performs a coordinate transformation on each point cloud data P on the basis of relative-movement-amount data (Rt) acquired from the relative movement estimator 11, the relative-movement-amount data (Rt) being data of an amount of relative movement of a camera ((B) of
Subsequently, with respect to all of the point clouds included in the point cloud data P′ obtained by the coordinate transformation, the plane projection section 123 projects the point cloud data P′ onto an image plane ((C) of
Next, with respect to all of the pixels of the depth image D′ after the coordinate transformation, the map generator 124 associates a pair of coordinates for a pixel in a frame (after movement) next to a transformation-source frame with a pair of coordinates for a pixel in the transformation-source frame (before the movement) to generate projection map data M ((D) of
The projection map data M stores a pair of image coordinates (u,v) of the transformation-source frame for each pair of image coordinates (u,v) of the frame after the movement. In other words, the projection map data M indicates, for each pair of coordinates in the frame after the movement, which pair of coordinates in the frame before the movement supplies the semantic segmentation result that is projected onto it.
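The four steps (A) to (D) can be sketched as a forward warp. This is a minimal illustration, not the described implementation, and it assumes: a pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`; a depth image given as a list of rows in which 0 or `None` marks a missing distance; and (R, t) mapping camera coordinates at (T−1) into camera coordinates at (T). When several source pixels land on the same target pixel, the sketch keeps the nearest one (a z-buffer rule that the text does not state). The function name is hypothetical.

```python
def build_projection_map(depth, R, t, fx, fy, cx, cy):
    """Return (M, D_prime): M[v'][u'] holds the source (u, v) or None, and
    D_prime holds the depth of each target pixel after the movement."""
    h, w = len(depth), len(depth[0])
    proj_map = [[None] * w for _ in range(h)]
    zbuf = [[float("inf")] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            z = depth[v][u]
            if not z or z <= 0:
                continue  # no valid distance for this pixel
            # (A) back-project pixel (u, v) with distance z to a 3-D point P
            p = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # (B) coordinate transformation P' = R P + t
            q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            if q[2] <= 0:
                continue  # point lies behind the camera after the movement
            # (C) project P' onto the image plane (nearest pixel)
            u2 = round(fx * q[0] / q[2] + cx)
            v2 = round(fy * q[1] / q[2] + cy)
            if not (0 <= u2 < w and 0 <= v2 < h):
                continue  # falls outside the post-movement frame
            # (D) record the source coordinates, keeping the nearest point
            if q[2] < zbuf[v2][u2]:
                zbuf[v2][u2] = q[2]
                proj_map[v2][u2] = (u, v)
    return proj_map, zbuf
```

With an identity rotation and t = [1, 0, 0], every point shifts one pixel to the right in a 3 × 3 image at z = 10 with fx = 10, so the left column of the map is left empty; those empty entries are exactly the non-associated pixels used in the next step.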
As illustrated in
with respect to all of the pairs of coordinates for respective pixels in the projection map data M, the non-associated pixel extracting section 141 performs processing of associating the pair of coordinates with a pair of coordinates for a pixel in a next frame (T) to extract, as an unobserved region R, a non-associated pixel that is included in the next frame (T) and is not associated with a pixel in the projection map data M (or a region including the non-associated pixel) (Step 301).
Consequently, for a pixel in the next frame (T) that is associated with a pixel in the projection map data M by the association processing (or a region including the associated pixel), the semantic-segmentation projection section 13 projects the semantic segmentation result for the original frame (T−1) onto that pixel (or the region including it).
On the other hand, for an unobserved region R in the next frame (T) that is not associated with any pixel in the projection map data M by the association processing, an update priority map is generated, and the region semantic-segmentation section 17 newly performs semantic segmentation to recognize the attribute of each pixel of the unobserved region R.
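Given a projection map such as M above, the projection of labels and the extraction of the unobserved region R reduce to a single pass over the map. The sketch below assumes labels are stored as a list of rows of class names; the function names are hypothetical. The `integrate` helper illustrates one plausible reading of the integration by the semantic-segmentation integration section 18, in which a fresh region result overrides the projected label wherever one exists; the text does not fix this rule.

```python
def project_labels(proj_map, prev_labels):
    """Project (T-1) labels into frame T and flag non-associated pixels."""
    h, w = len(proj_map), len(proj_map[0])
    projected = [[None] * w for _ in range(h)]
    unobserved = [[False] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            src = proj_map[v][u]
            if src is None:
                unobserved[v][u] = True  # part of the unobserved region R
            else:
                su, sv = src
                projected[v][u] = prev_labels[sv][su]
    return projected, unobserved


def integrate(projected, region_result):
    """Prefer newly computed region labels; fall back to projected ones."""
    return [[r if r is not None else p for p, r in zip(prow, rrow)]
            for prow, rrow in zip(projected, region_result)]
```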
Returning to
Subsequently, the update priority map generator 16 generates an update priority map on the basis of the determined relationship between attributes of regions (Step 109).
When a semantic segmentation result at a time (T−1) illustrated in (A) of the figure is projected as a semantic segmentation result at a time (T) illustrated in (B) of the figure, the region-attribute-relationship determination section 15 determines that a region of a pedestrian and a region of a sidewalk overlap on the left in a captured image, and also determines that a region of a pedestrian and a road surface overlap on the right in the captured image.
In this case, a pedestrian and a bicycle on a sidewalk are not expected to be in a very dangerous state. Thus, the update priority map generator 16 gives a low update priority to regions of a pedestrian and a bicycle on a sidewalk, as illustrated in (C) of the figure.
On the other hand, a pedestrian and a bicycle on a road surface are expected to be in a dangerous state. Thus, the update priority map generator 16 gives a high update priority to regions of a pedestrian and a bicycle on a road surface. Note that, in an update priority map illustrated in (C) of the figure and in subsequent figures, a darker gray indicates a higher update priority.
Moreover, the update priority map generator 16 may give a high update priority to the boundary between a region of a sidewalk or a road surface and any other region, since such a boundary may be an out-of-sight location from which another object may suddenly run out.
Further, the update priority map generator 16 is not limited to generating an update priority map on the basis of a relationship between attributes of two regions, and may generate an update priority map on the basis of a relationship between attributes of three or more regions.
For example, the update priority map generator 16 may give a high update priority to regions of a pedestrian and a bicycle that are situated around a region of an automobile on a road surface. The reason is that there is a possibility that the automobile will change its movement in order to avoid the pedestrian and the bicycle.
Further, the update priority map generator 16 may give a high update priority to a region in which pedestrians and bicycles on a road surface are close to each other. The reason is that the pedestrians and bicycles may change their movements in order to avoid one another.
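The two-region case (a pedestrian or bicycle on a sidewalk versus on a road surface) can be sketched on a label grid. How "overlap" is tested is not specified in the text, so the sketch assumes a simple heuristic: for each pedestrian or bicycle pixel, walk down its column to the first non-object pixel and treat that label as the surface the object stands on. The priority values, label names, and function name are hypothetical; the boundary and three-region rules described above are not covered.

```python
MOVERS = {"pedestrian", "bicycle"}
# Hypothetical levels: the text only distinguishes "low" and "high".
LOW, HIGH = 0.2, 1.0


def relationship_priority(labels):
    """Priority map from the surface under each pedestrian/bicycle pixel."""
    h, w = len(labels), len(labels[0])
    prio = [[0.0] * w for _ in range(h)]
    for u in range(w):
        for v in range(h):
            if labels[v][u] not in MOVERS:
                continue
            # Walk down the column to the surface below the object.
            below = v
            while below < h and labels[below][u] in MOVERS:
                below += 1
            surface = labels[below][u] if below < h else None
            if surface == "road":
                prio[v][u] = HIGH  # on the road surface: potentially dangerous
            elif surface == "sidewalk":
                prio[v][u] = LOW   # on the sidewalk: less urgent
    return prio
```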
Returning to
The update priority map generator 16 integrates the two update priority maps to generate an integrated update priority map, as illustrated in (D) of the figure. Because the degrees of priority in the two maps are combined in the integration, a region in which the regions set in the two maps overlap is given a high priority.
Here, in the update priority map based on an unobserved region, the update priority map generator 16 may set, before the integration, a region slightly larger than a detected unobserved region, in order to improve the detection accuracy.
Further, in the update priority map based on a relationship between attributes of regions, the update priority map generator 16 may set, before the integration, a region larger than a region in which, for example, a pedestrian is detected, in order to cope with movement of the pedestrian.
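A minimal sketch of the integration, including the optional enlargement of each region before integration, follows. It assumes that enlarging a region is a maximum filter over a square neighbourhood and that "combining" degrees of priority means adding them; both choices, the margins, and the function names are hypothetical.

```python
def grow(prio, r):
    """Max filter: enlarge every non-zero region by r pixels."""
    h, w = len(prio), len(prio[0])
    return [[max(prio[y][x]
                  for y in range(max(0, v - r), min(h, v + r + 1))
                  for x in range(max(0, u - r), min(w, u + r + 1)))
             for u in range(w)]
            for v in range(h)]


def integrate_maps(unobserved, relation_prio,
                   unobs_margin=1, rel_margin=1, unobs_level=1.0):
    """Sum an enlarged unobserved-region map and an enlarged relationship map."""
    unobs_prio = [[unobs_level if m else 0.0 for m in row] for row in unobserved]
    a = grow(unobs_prio, unobs_margin)   # slightly larger than the detected region
    b = grow(relation_prio, rel_margin)  # margin for, e.g., pedestrian movement
    # Overlapping regions accumulate priority from both maps.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]
```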
Returning to
For example, when an update priority map illustrated in (A) of the figure is obtained, the region semantic-segmentation section 17 sets a rectangle circumscribed about a high-priority region, as illustrated in (B) of the figure, and performs semantic segmentation with respect to a region of the circumscribed rectangle.
As illustrated in (C) of the figure, the region semantic-segmentation section 17 performs semantic segmentation with respect to all of the regions of the set circumscribed rectangles when the region semantic-segmentation section 17 has determined, in consideration of computational resources, that no delay will occur even if processing is performed with respect to all of the circumscribed rectangles.
On the other hand, as illustrated in (D) and (E) of the figure, a region of a low update priority may be excluded from semantic-segmentation targets when it has been determined, in consideration of computational resources, that a delay will occur if processing is performed with respect to all of the circumscribed rectangles.
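The rectangle setting and the resource-dependent selection can be sketched as follows, assuming that a "high-priority region" is a 4-connected set of pixels at or above a hypothetical `threshold`, and that available computational resources are approximated by a hypothetical pixel budget. Rectangles are chosen greedily in descending order of peak priority, so when everything fits all of them are processed, and otherwise the low-priority ones are dropped. The function names are illustrative only.

```python
from collections import deque


def priority_boxes(prio, threshold):
    """Circumscribed rectangles (u0, v0, u1, v1) of high-priority regions."""
    h, w = len(prio), len(prio[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for v in range(h):
        for u in range(w):
            if seen[v][u] or prio[v][u] < threshold:
                continue
            # Breadth-first search over one 4-connected region.
            queue = deque([(v, u)])
            seen[v][u] = True
            v0 = v1 = v
            u0 = u1 = u
            peak = prio[v][u]
            while queue:
                y, x = queue.popleft()
                v0, v1 = min(v0, y), max(v1, y)
                u0, u1 = min(u0, x), max(u1, x)
                peak = max(peak, prio[y][x])
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and prio[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((peak, (u0, v0, u1, v1)))
    return boxes


def select_boxes(boxes, pixel_budget):
    """Greedily keep the highest-priority rectangles that fit the budget."""
    chosen, used = [], 0
    for _, (u0, v0, u1, v1) in sorted(boxes, key=lambda b: b[0], reverse=True):
        area = (u1 - u0 + 1) * (v1 - v0 + 1)
        if used + area <= pixel_budget:
            chosen.append((u0, v0, u1, v1))
            used += area
    return chosen
```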
Returning to
As described above, according to the present embodiment, the integrated control unit 7600 of the vehicle control system 7000 does not perform recognition processing uniformly on every acquired captured image (frame); instead, it sets the frequency of semantic segmentation processing on the basis of the attributes of regions in the image. This makes it possible to eliminate redundant processing and reduce the quantity of computations.
The present technology is not limited to the embodiments described above, and various modifications may be made thereto without departing from the scope of the present technology.
In the embodiments described above, the region-attribute-relationship determination section 15 and the update priority map generator 16 set the update priority on the basis of a relationship between attributes of regions, but the update priority may also be set on the basis of the attribute of each region itself. For example, a low update priority may be given to a region of a traffic signal or a sign. In consideration of movement speed, a region of a bicycle may be given a higher update priority than a region of a pedestrian, and a region of an automobile may be given a higher update priority than a region of a bicycle.
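Attribute-based priority amounts to a lookup from class label to priority. In the sketch below only the ordering (signal and sign lowest, then pedestrian, bicycle, automobile) comes from the text; the numeric values, the default for other classes, the label strings, and the function name are hypothetical.

```python
# Hypothetical values; only the ordering follows the description.
ATTRIBUTE_PRIORITY = {
    "signal": 0.1,
    "sign": 0.1,
    "pedestrian": 0.4,
    "bicycle": 0.6,
    "automobile": 0.8,
}


def attribute_priority(labels, table=ATTRIBUTE_PRIORITY, default=0.3):
    """Per-pixel update priority from each pixel's own attribute."""
    return [[table.get(label, default) for label in row] for row in labels]
```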
Further, the update priority map generator 16 integrates an update priority map based on an unobserved region and an update priority map based on a relationship between attributes of regions to generate an update priority map used to perform semantic segmentation. In addition to the two update priority maps, or instead of one of the two update priority maps, the update priority map generator 16 may integrate an update priority map generated using another parameter.
The update priority map generator 16 may set the update priority according to the position of a region in a captured image.
For example, as illustrated in
Moreover, for example, the update priority map generator 16 may give a higher update priority to an upper portion of an image, compared to a lower portion of the image.
Further, the update priority map generator 16 may set the update priority according to the movement (traveling) speed of a vehicle and according to the position of a region in a captured image.
The case illustrated in, for example,
On the other hand, when the vehicle is moving at a low speed (traveling at or below a threshold speed of, for example, 30 km/h), the update priority map generator 16 gives a low update priority to the region of the center portion of the image and a high update priority to the regions of the end portions of the image, as illustrated in (C) of the figure. The reason is that, in this case, it is generally more important for the driver to look around the surroundings than to look ahead.
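The low-speed rule can be sketched as a per-column priority profile. The 30 km/h threshold is the example given above; the width of the "end portions" (`end_fraction`) and the two priority levels are hypothetical. Because the rule for higher speeds is not reproduced in this section, the sketch returns `None` above the threshold rather than guessing it.

```python
def low_speed_column_priority(width, speed_kmh, threshold_kmh=30.0,
                              end_fraction=0.25, low=0.2, high=1.0):
    """Column-wise priority at low speed: ends high, centre low.
    Returns None above the threshold (that rule is not modelled here)."""
    if speed_kmh > threshold_kmh:
        return None
    ends = int(width * end_fraction)
    # Columns within `ends` of either image edge count as end portions.
    return [high if (u < ends or u >= width - ends) else low
            for u in range(width)]
```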
Further, the update priority map generator 16 may set the update priority according to a distance (z) between a subject and a vehicle in a captured image.
For example, as illustrated in
When at least one of the update priority maps of
In the embodiments described above, the region semantic-segmentation section 17 does not perform semantic segmentation on the entirety of a captured image, but only on the regions set by the update priority map generator 16. However, the region semantic-segmentation section 17 may periodically perform semantic segmentation on all of the regions of a captured image (all-regions processing). This periodically compensates for errors caused by the partial recognition processing performed for each region.
Further, while periodically performing the all-regions processing, the region semantic-segmentation section 17 may permit a delay when performing semantic segmentation on the limited regions selected according to the update priority, as illustrated in (C) of the figure. Although this introduces a delay, all of the regions necessary for recognition can be processed during limited-region semantic segmentation, without any of them being omitted because of limited computational resources.
Here, various kinds of triggers for performing the all-regions processing are conceivable.
The region semantic-segmentation section 17 may perform the all-regions processing when the proportion of the area of one or more unobserved regions (regions onto which nothing is projected using the projection map) is equal to or greater than a predetermined proportion. When the unobserved area is large, the quantity of computations differs little between the all-regions processing and semantic segmentation of limited regions. Performing the all-regions processing in this case therefore improves the recognition accuracy while suppressing an increase in the quantity of computations.
The region semantic-segmentation section 17 may perform the all-regions processing when the steering angle of the vehicle detected by the vehicle state detecting section 7110 is equal to or greater than a predetermined angle. When a large steering angle is detected, the scenery being captured is expected to change greatly and the unobserved regions to increase. Performing the all-regions processing in such a case eliminates the computations that would otherwise be needed specifically to detect unobserved regions, and improves the recognition accuracy.
The region semantic-segmentation section 17 may perform the all-regions processing when the vehicle passes through a predetermined point. GPS information and map information acquired by the positioning section 7640 are used as the positional information for this determination.
For example, the region semantic-segmentation section 17 may perform the all-regions processing when it detects that the vehicle is traveling up or down a hill whose inclination is equal to or greater than a predetermined value. On a steep uphill or downhill, the scenery being captured is expected to change greatly and the unobserved regions to increase. Performing the all-regions processing in such a case eliminates the computations that would otherwise be needed specifically to detect unobserved regions, and improves the recognition accuracy.
Further, the region semantic-segmentation section 17 may perform the all-regions processing when a vehicle enters a tunnel or exits a tunnel, since there will also be a great change in image-capturing-target scenery in this case.
Furthermore, the region semantic-segmentation section 17 may perform the all-regions processing when the proportion of the captured image occupied by regions whose attribute-recognition result from semantic segmentation has low reliability, or by regions whose attribute is not recognized by semantic segmentation, is equal to or greater than a predetermined proportion (for example, 50%).
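The triggers listed above can be collected into one decision function. In this sketch every field name, the period, and all thresholds are hypothetical except the 50% example for unreliable or unrecognized regions; the "predetermined point" trigger is represented only by its hill and tunnel examples. The function returns the list of triggers that fired, so an empty list means limited-region processing can continue.

```python
from dataclasses import dataclass


@dataclass
class FrameContext:
    unobserved_ratio: float   # share of pixels with no projection source
    steering_deg: float       # from the vehicle state detecting section
    slope_percent: float      # road inclination at the current position
    tunnel_transition: bool   # entering or exiting a tunnel
    unreliable_ratio: float   # share of low-reliability or unrecognized pixels
    frame_index: int


def full_frame_reasons(ctx, period=30, max_unobserved=0.4,
                       max_steering_deg=15.0, max_slope_percent=8.0,
                       max_unreliable=0.5):
    """Return the triggers for all-regions processing that fired."""
    reasons = []
    if period > 0 and ctx.frame_index % period == 0:
        reasons.append("periodic")
    if ctx.unobserved_ratio >= max_unobserved:
        reasons.append("unobserved")
    if abs(ctx.steering_deg) >= max_steering_deg:
        reasons.append("steering")
    if abs(ctx.slope_percent) >= max_slope_percent:  # uphill or downhill
        reasons.append("slope")
    if ctx.tunnel_transition:
        reasons.append("tunnel")
    if ctx.unreliable_ratio >= max_unreliable:
        reasons.append("unreliable")
    return reasons
```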
In the embodiments described above, the region semantic-segmentation section 17 sets a rectangle circumscribed about a high-priority region, as illustrated in
In other words, when a convolution operation is performed on an input image multiple times to obtain a final semantic segmentation result (processing performed by following arrows in an upper portion), it is sufficient if an operation is performed only on a necessary region by following the reverse of the convolution operation (processing performed by following arrows in a lower portion), in order to calculate a region necessary for the final result, as illustrated in (A) of
Thus, when an update priority map illustrated in (B) of the figure is obtained, the region semantic-segmentation section 17 may perform a backward calculation to obtain a region that is necessary to obtain, as a final result, a high-priority region indicated by the update priority map, may set a semantic-segmentation-target region, as illustrated in (C) of the figure, and may perform semantic segmentation with respect to the set region.
In this case, the region semantic-segmentation section 17 may also exclude a low-priority region from semantic segmentation targets when it has been determined, in consideration of computational resources, that a delay will occur.
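The backward calculation can be illustrated for a plain stack of convolution or pooling layers, each described by a (kernel, stride, padding) triple: an output index o depends on input indices o·stride − padding through o·stride − padding + kernel − 1, so an output range can be walked back layer by layer. This is a sketch under that assumption only; upsampling or transposed-convolution layers, common in segmentation decoders, are not handled, and the function names are hypothetical. Clipping only at the end can only enlarge the result, so the returned region stays a safe superset.

```python
def required_input_range(lo, hi, layers, input_size):
    """Inclusive input index range needed for output indices lo..hi.
    layers: (kernel, stride, padding) per layer, from input to output."""
    for kernel, stride, padding in reversed(layers):
        lo = lo * stride - padding
        hi = hi * stride - padding + kernel - 1
    return max(lo, 0), min(hi, input_size - 1)


def required_input_box(box, layers, height, width):
    """Apply the range computation to both axes of a (u0, v0, u1, v1) box."""
    u0, v0, u1, v1 = box
    u0, u1 = required_input_range(u0, u1, layers, width)
    v0, v1 = required_input_range(v0, v1, layers, height)
    return u0, v0, u1, v1
```

For two 3 × 3, stride-1, padding-1 layers, a single output pixel at index 5 needs input indices 3 to 7, which matches the familiar 5-pixel receptive field.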
The embodiments described above have dealt with an example in which the mobile body on which the integrated control unit 7600 serving as an information processing apparatus is mounted is a vehicle (an automobile). However, the mobile body on which an information processing apparatus capable of information processing similar to that performed by the integrated control unit 7600 is mounted is not limited to a vehicle. For example, the information processing apparatus may be mounted on any kind of mobile body, such as a motorcycle, a bicycle, a personal mobility device, an airplane, a drone, a ship, a robot, construction machinery, or agricultural machinery (a tractor). In this case, the relationships between attributes described above (such as a pedestrian, a vehicle, a road surface, and a sidewalk) are recognized differently depending on the mobile body.
Further, a target on which the information processing apparatus described above is mounted is not limited to a mobile body. For example, the present technology is also applicable to an image captured by a surveillance camera. In this case, the processing associated with the movement of a vehicle described in the above embodiments is not performed; however, the image-capturing target of the surveillance camera may change through panning, tilting, and zooming. The present technology is therefore applicable not only to generating an update priority map based on the attributes of regions but also to generating an update priority map based on an unobserved region.
[Others]
The present technology may also take the following configurations.
(1) An information processing apparatus, including:
an input device to which a captured image captured by a camera is input, the captured image including distance information for each pixel; and
a controller that
the controller
the controller
the controller sets a priority of performing the recognition processing according to a position of the non-associated pixel in the post-movement captured image.
(6) The information processing apparatus according to (5), in which
the controller sets the priority of performing the recognition processing for each non-associated pixel according to the position of the non-associated pixel in the post-movement captured image, and according to a movement speed of the mobile body.
(7) The information processing apparatus according to any one of (2) to (6), in which
the controller sets a priority of performing the recognition processing for each non-associated pixel according to the distance information of the non-associated pixel.
(8) An information processing method, including:
acquiring a captured image captured by a camera, the captured image including distance information for each pixel;
generating a transformed captured image obtained by transforming pairs of coordinates for respective pixels of the captured image on the basis of an amount of movement of the camera or a mobile body on which the camera is mounted;
associating a pair of coordinates for a pixel of the transformed captured image with a pair of coordinates for a pixel of a post-movement captured image captured at a position of the camera after the movement; and
identifying a non-associated pixel that is included in the post-movement captured image and is not associated with the pixel of the transformed captured image.
(9) A program that causes an information processing apparatus to perform a process including:
acquiring a captured image captured by a camera, the captured image including distance information for each pixel;
generating a transformed captured image obtained by transforming pairs of coordinates for respective pixels of the captured image on the basis of an amount of movement of the camera or a mobile body on which the camera is mounted;
associating a pair of coordinates for a pixel of the transformed captured image with a pair of coordinates for a pixel of a post-movement captured image captured at a position of the camera after the movement; and
identifying a non-associated pixel that is included in the post-movement captured image and is not associated with the pixel of the transformed captured image.
Number | Date | Country | Kind |
---|---|---|---|
2019-062942 | Mar 2019 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2020/011153 | 3/13/2020 | WO | 00 |