The present disclosure relates to an information processing device, an information processing method, and an information processing program.
There is a conventional technique of generating information such as an image at a time point ahead (in the future) based on information obtained until the processing time point. For example, in order to reduce a temporal difference between an actual event and a displayed image, there is provided a technique of generating first predicted image information at a second time point, which is later than a first time point, by using first image information at the first time point (for example, Patent Literature 1).
Patent Literature 1: JP 2020-048185 A
However, there is room for improvement in the conventional technology. For example, the conventional technology directly uses the first image information itself at the first time point to generate the first predicted image information at the second time point, leaving room for improvement in terms of the information used for generation. Efficient generation of information is therefore desired.
In view of this, the present disclosure proposes an information processing device, an information processing method, and an information processing program capable of performing efficient generation of information.
According to the present disclosure, an information processing device includes an acquisition means that acquires first information which is part of information among information regarding a target region; and a generation means that generates, based on the first information acquired by the acquisition means, second information corresponding to a second time which is a time later than a first time which is a time corresponding to the first information.
Embodiments of the present disclosure will be described below in detail with reference to the drawings. Note that the information processing device, the information processing method, and the information processing program according to the present application are not limited by the embodiments. Moreover, in each of the following embodiments, the same parts are denoted by the same reference symbols, and a repetitive description thereof will be omitted.
The present disclosure will be described in the following order.
First, prior to the description of each embodiment, points common to each embodiment will be described with reference to
Hereinafter, an outline of information processing of the present disclosure will be described with reference to
For example, an imaging delay (latency) occurs in the second stage PS2. In the example of
In this manner, a delay (latency) occurs during the period from the imaging system to the display system, leading to a time lag from reality. As a result, for example, in a case where the display system is a display, a temporal deviation occurs between the real world and the timing at which information is perceived by human eyes. For example, in a case where frame data is processed at a rate of 60 frames per second (fps), a delay of at least one frame occurs. In use cases requiring quick action, such as sports and motion detection, this delay (lag) can have a large influence. Furthermore, even in other use cases, the delay is assumed to have a great influence on usability or comfort for the user.
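The magnitude of such a one-frame delay can be illustrated with a short calculation. The following is a purely illustrative sketch, not part of the disclosure; the function name and values are hypothetical:

```python
def min_frame_delay_ms(fps: float, frames_of_delay: int = 1) -> float:
    """Minimum delay in milliseconds for a pipeline that lags by a
    given number of frames at the stated frame rate."""
    return frames_of_delay * 1000.0 / fps

# At 60 fps, a one-frame lag already corresponds to about 16.7 ms,
# which is perceptible in use cases requiring quick action.
delay = min_frame_delay_ms(60.0)
```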
In view of these, information processing of the present disclosure as depicted in
In the example depicted in
In this manner, the information processing of the present disclosure generates the image FD1 corresponding to time point t21 at time point t12 which is before time point t21 in the third stage PS13, making it possible to output the image FD1 at time point t21 in the fourth stage PS14 of the display system output. That is, the information processing of the present disclosure makes it possible to realize processing (system) in which data seems to be generated at the same timing as in the real world when observed through the display system output, and express the real-time world.
Furthermore, the real-time world as described above is realized by predicting future data using past data. For example, it is realized by acquiring information with higher density in the time direction than frame data of conventional technologies. Therefore, an imaging system with such a function may be used in the information processing of the present disclosure. This makes it possible to generate a new added value by utilizing inter-frame information that cannot be obtained with frame data in conventional technologies. For example, an example of such an imaging system includes an Event-based Vision Sensor (EVS). EVS is also referred to as a Dynamic Vision Sensor (DVS).
Here, EVS will be described. For example, EVS images a predetermined imaging range through a lens. EVS is an asynchronous image sensor in which a plurality of pixels having photoelectric conversion elements is two-dimensionally arranged and a detection circuit that detects an address event in real time is provided for each pixel. An address event is an event that occurs for each address assigned to each of the plurality of pixels arranged in a two-dimensional pattern, and corresponds to an event in which a value of a current based on a charge generated in the photoelectric conversion element or a change amount of the value exceeds a certain threshold.
The EVS detects the presence or absence of occurrence of an address event for each pixel. When having detected occurrence of an address event, the EVS reads a pixel signal from the pixel in which the address event has occurred as pixel data.
EVS executes the pixel signal read operation only for the pixels in which the occurrence of an address event has been detected. Accordingly, EVS is capable of performing reading at a much higher speed than a synchronous image sensor, in which the read operation is executed at a predetermined frame rate for all pixels, and with a small data volume read out per frame.
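The per-pixel address-event detection described above can be sketched as follows. This is a simplified, hypothetical illustration (not part of the disclosure): an event is registered only for addresses whose value change exceeds a threshold, so only those pixels are read out.

```python
def detect_address_events(prev, curr, threshold):
    """Return (x, y, polarity) tuples for pixels whose value change
    exceeds the threshold; only these addresses are read out."""
    events = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            diff = c - p
            if abs(diff) > threshold:
                events.append((x, y, 1 if diff > 0 else -1))
    return events

prev = [[10, 10], [10, 10]]
curr = [[10, 25], [2, 10]]
# Only two of the four pixels changed beyond the threshold,
# so only two events are produced.
events = detect_address_events(prev, curr, threshold=5)
```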
Therefore, an information processing device 100 (refer to
Furthermore, as depicted in
Here, the background and circumstances of performing the information processing of the present disclosure as described above will be described. Technologies related to virtual reality (VR) and augmented reality (AR) have been developed, together with the expansion of markets for remotely controlled robot operation and for games using AR glasses. The currently developing technologies include the study of mixed reality (MR), in which interactive elements are added to the AR space, leading to a possibility that the border between reality and the virtual space will disappear in the near future. One of the important techniques for realizing a borderless state with no boundary between the virtual space and the real space would be real-time processing in a system or the like.
On the other hand, as depicted in
In view of this, information processing of the present disclosure predicts future data with respect to stream data on the time axis, generating a new image having a higher temporal resolution than the frame synchronization signal from the original data, and outputting real-time data. The output means in the information processing of the present disclosure is not limited to a display, and may be an actuator or the like. For example, the information processing of the present disclosure makes it possible to accommodate user reactions (the reaction speed of a person, or the delay in an actuator of a robot). Furthermore, the information processing of the present disclosure makes it possible to perform real-time rendering on a moving subject in MR.
Hereinafter, each embodiment will be described based on the premise described above. The following exemplary embodiments are based on the information processing (the concept thereof) described in
In the example of
Subsequently, the input means 101 passes the current information to the future information generation means 102. Using the current information, the future information generation means 102 generates information for displaying an image of the second time later than the first time as the future information. In
When generating the future information, the future information generation means 102 uses information provided by the past information generation means 103 (also referred to as past information). The past information generation means 103 provides, as the past information, information that has already been processed, such as Encode data generated in the past, to the future information generation means 102. By recursively using the already processed past information, the future information generation means 102 can generate future information as continuous information with high time resolution.
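The recursive reuse of past information can be sketched as follows. This is a toy illustration under stated assumptions (a hypothetical linear blend stands in for the actual generation means; the class and parameter names are not part of the disclosure):

```python
class FutureInfoGenerator:
    """Toy sketch: blends the current information with recursively
    reused past encoded data to form the next prediction."""

    def __init__(self, blend=0.5):
        self.blend = blend
        self.past_encode = 0.0  # past information, updated each step

    def generate(self, current_info):
        # Predict a future value from current input and past encode.
        future = self.blend * current_info + (1 - self.blend) * self.past_encode
        # Feed the result back as past information for the next step.
        self.past_encode = future
        return future

gen = FutureInfoGenerator(blend=0.5)
out1 = gen.generate(10.0)  # first step uses the initial past encode 0.0
out2 = gen.generate(10.0)  # second step recursively reuses out1
```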
Subsequently, the future information generation means 102 passes the future information to the signal output means 104. The signal output means 104 converts the future information into a format according to the output mode. For example, the signal output means 104 converts the format of data according to an application, such as image data or a signal for driving an actuator of a robot. In
Next, a configuration of the information processing device 100 which is an example of an information processing device that executes information processing according to the first embodiment will be described.
Note that any aspect can be adopted as the hardware for implementing the information processing device 100. For example, the information processing device 100 may be implemented as a module in any form, such as a chip. In a case where the information processing device 100 is realized as a chip, the chip may be stacked with three or more layers. For example, in a case where components such as the input means 101, the future information generation means 102, and the signal output means 104 are mounted on a chip, the chip may be stacked with three or more layers. Any configuration, such as a through-silicon via (TSV), a microbump, or a Cu—Cu connection, can be adopted for the stack.
As depicted in
The input means 101 is implemented by adopting a configuration corresponding to target information. For example, in a case where target information is an image, the input means 101 is implemented by adopting a configuration related to a device such as a sensor that detects an image. The sensor used as the input means 101 is not limited to an image sensor, and may be a sensor that detects acceleration, angular velocity, sound, or vibration.
For example, the input means 101 may be a time of flight (TOF) sensor, an image sensor, or the like, or may be a combination thereof. The input means 101 may be implemented by using EVS. In addition, the image sensor used as the input means 101 may use a complete charge transfer method, a pulse width modulation (PWM) pixel, or an organic or inorganic photoelectric conversion film. The pixel of the image sensor may adopt any configuration. For example, the pixel may have a configuration in which one photodiode (PD) is connected to an FD (charge detector), or two or more PDs may be connected to the FD. In addition, in an example of a PWM pixel, a slope signal is input to a variable resistance (VR), and the pixel is reset by K_RST and selected by RSEL, making it possible to generate a PWM signal according to the charge amount generated by the PD. The photoelectric conversion film of the PD may be organic or inorganic.
Furthermore, the pixel transistor (Tr) of the image sensor may be a transistor such as a Metal-Oxide-Semiconductor Field Effect Transistor (MOSFET), an Indium-Gallium-Zinc Oxide Field Effect Transistor (IGZO FET), a Tunnel Field Effect Transistor (TFET), or an organic Field Effect Transistor (organic FET). In addition, the chip on which the pixel array is arranged is not limited to the back-illuminated type, and may be a front-illuminated type. In addition, the information output from the input means 101 is not limited to two-dimensional space data, and may be three-dimensional space data. Note that information indicating time (time information) is assigned to the information output by the input means 101.
The analog-to-digital converter (ADC) may be a single-slope ADC, a double integration type, a successive approximation register (SAR) type, a delta-sigma converter, or the like. The resolution may be, for example, 1 bit to 12 bits. Alternatively, it is allowable to use a method, such as EVS, which detects a difference between pixel signals. For example, the ADC may be an EVS-type column ADC.
The computation means 110 is implemented by a computation device. The computation means 110 may include a digital signal processor (DSP) as a computation device, and the DSP may be included in the same chip, may be stacked, or may be provided outside the chip. The computation means 110 may include, as a computation device, a computing in memory (CiM) type device constituted with a memory array. Note that a configuration example of the CiM is depicted in
Note that the above is merely an example, and any configuration of the computation means 110 can be adopted as long as the desired processing can be implemented. For example, the computation means 110 may be implemented by an integrated circuit such as an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA). The computation means 110 may also be implemented by various processors, such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Micro Processing Unit (MPU), executing a program (for example, the information processing program according to the present disclosure) stored in the information processing device 100, using the storage section as a work area. The computation means 110 also functions as a control means.
As depicted in
The acquisition means 111 acquires various types of information. The acquisition means 111 acquires information from the input means 101. The acquisition means 111 acquires first information, which is part of information regarding a target region. The acquisition means 111 acquires the first information, which is part of the information regarding the target region detected by the sensor. The acquisition means 111 acquires the first information, which is information regarding a portion necessary for processing by the future information generation means 102 among the information regarding the target region. The acquisition means 111 acquires the first information, which is information regarding a portion including a change in the first time among the information regarding the target region.
The acquisition means 111 acquires the first information, which is sparse with respect to the information regarding the target region. The acquisition means 111 acquires the first information sparsely from the target region. The acquisition means 111 acquires the first information corresponding to a plurality of portions discretely located in the target region. The acquisition means 111 acquires the first information corresponding to a portion of a predetermined ratio or less in the target region. For example, the acquisition means 111 acquires the first information corresponding to a half (50%) or less of the target region. The predetermined ratio may be appropriately set. For example, the acquisition means 111 acquires the first information corresponding to a portion of a predetermined ratio or less according to the application. In a case where the information processing device 100 is installed on an automobile and images and recognizes a traffic light, the acquisition means 111 acquires the first information corresponding to 30% or less of the image.
The acquisition means 111 acquires the first information corresponding to a part of the information regarding the target region which is an image. The acquisition means 111 acquires first information corresponding to a pixel having a change in the image.
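The sparse acquisition described above, including the predetermined-ratio condition (for example, 30% or less in the traffic-light example), can be sketched as follows. This is a hypothetical illustration, not part of the disclosure; the function name, threshold, and ratio are assumptions:

```python
def acquire_sparse(frame_prev, frame_curr, threshold, max_ratio):
    """Select only pixels having a change (the 'first information')
    and report whether the selection stays at or under the
    configured ratio of the target region."""
    changed = []
    total = 0
    for y, (rp, rc) in enumerate(zip(frame_prev, frame_curr)):
        for x, (p, c) in enumerate(zip(rp, rc)):
            total += 1
            if abs(c - p) > threshold:
                changed.append((x, y, c))
    ratio = len(changed) / total
    return changed, ratio <= max_ratio

# One of four pixels changed: 25% of the region, under a 30% cap.
changed, within = acquire_sparse([[0, 0], [0, 0]],
                                 [[0, 9], [0, 0]],
                                 threshold=5, max_ratio=0.3)
```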
The future information generation means 102 executes processing of generating future information. Based on the first information acquired by the acquisition means 111, the future information generation means 102 generates second information corresponding to the second time, which is later than the first time being the time corresponding to the first information. Based on the first information, the future information generation means 102 generates the second information predicted to be detected by the sensor in the second time.
The future information generation means 102 generates second information which is encoded data corresponding to the second time. The future information generation means 102 generates the second information based on the first information and the third information corresponding to the third time including time information earlier than the first time.
The future information generation means 102 generates the second information by using the third information generated before the generation of the second information. The future information generation means 102 generates the second information corresponding to the image at the second time based on the first information.
The inference time for generating the future information by the future information generation means 102 is to be equal to or shorter than the delay of the input means 101 and the delay of the signal output means 104 (also referred to as the "reference time"). That is, the future information generation means 102 generates the future information within a time equal to or less than the delay time of the input means 101 and the delay time of the signal output means 104. The future information generation means 102 may use a prediction model that takes information regarding pixels having a change as an input and outputs information indicating pixels estimated to have a change thereafter. The future information generation means 102 may use a prediction model that takes information regarding pixels having a change at the first time as an input and outputs information indicating pixels estimated to have a change at the second time. For example, the future information generation means 102 may generate the future information using a prediction model trained so that the time from input to output is less than the reference time.
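The timing budget above can be sketched as a simple check. This is a hypothetical illustration under one reading of the constraint, in which the reference time is taken as the combined delay of the input means and the signal output means; the function name and values are assumptions, not part of the disclosure:

```python
def within_reference_time(inference_ms, input_delay_ms, output_delay_ms):
    """Check that prediction finishes within the 'reference time',
    taken here as the combined delay of the input means and the
    signal output means."""
    reference_time_ms = input_delay_ms + output_delay_ms
    return inference_ms <= reference_time_ms

# Inference of 10 ms fits within an 8 ms input delay plus 5 ms
# output delay; 20 ms does not.
ok = within_reference_time(10.0, 8.0, 5.0)
too_slow = within_reference_time(20.0, 8.0, 5.0)
```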
The past information generation means 103 executes processing of generating past information. The past information generation means 103 generates past information based on the various types of information acquired by the acquisition means 111.
The signal output means 104 executes various types of processing related to output using the information generated by the future information generation means 102. The signal output means 104 outputs an image using the information generated by the future information generation means 102. In this case, the signal output means 104 is a display, AR glasses, a retinal scan display, or the like. For example, the signal output means 104 has a function of a display device. The signal output means 104 is not limited to the above, and may be any output device according to an output mode, such as an actuator of a robot. That is, the signal output means 104 can adopt any configuration as long as it has a function of converting data received from the future information generation means 102 or the like into some form and outputting the converted data.
Next, a procedure of various types of information processing will be described with reference to
As depicted in
Next, a second embodiment will be described with reference to
Hereinafter, a configuration corresponding to a configuration in the first embodiment will be denoted with "A" added to the end of the reference numeral, and only points different from the first embodiment will be described. Configurations similar to those of the first embodiment are denoted by the same reference numerals as in the first embodiment, and description thereof will be omitted as appropriate. That is, in the second embodiment, a reference numeral "***" of the first embodiment is replaced with "***A" for a similar configuration. For example, a future information generation means 102A in the second embodiment has a configuration similar to that of the future information generation means 102 in the first embodiment, and description other than their differences will be appropriately omitted.
In
When generating the future information, the future information generation means 102A uses information (also referred to as past information) provided by a past information generation means 103A. The past information generation means 103A provides, as the past information, information that has already been processed, such as combined Encode data generated by the combining means 106 in the past, to the future information generation means 102A. By recursively using the already processed past information, the future information generation means 102A can generate future information as continuous information with high time resolution.
In this manner, in the information processing device 100A, the future information generation means 102A can detect a difference between the real world and the generated information by adding the current information when generating the future information. With this operation, the information processing device 100A can correct the parameters used for computation in real time at regular timing.
Subsequently, the future information generation means 102A passes the future information to the combining means 106. The combining means 106 executes combining processing of combining data output from the future information generation means 102A with data output from the current information generation means 105.
The combining means 106 uses the future information and the combining-oriented current information to generate, as the combined information, information for displaying an image of the second time, which is later than the first time. In
Subsequently, the combining means 106 passes the combined information to the signal output means 104. The signal output means 104 converts the combined information into a format according to the output mode. For example, the signal output means 104 displays an image corresponding to the second time by using information (event information) indicating a portion predicted to have a change in the predetermined region in the second time.
Next, a configuration of an information processing device according to a second embodiment will be described.
Similarly to the computation means 110, the computation means 110A is implemented by a device such as a computation device or a processor. As depicted in
The current information generation means 105 executes processing of generating current information. Using the first information, the current information generation means 105 generates fourth information to be used for combining with the second information. The current information generation means 105 generates fourth information, which is encoded data corresponding to the first time.
The combining means 106 executes processing of combining information. The combining means 106 combines the second information with other information to generate combined information used for output. The combining means 106 combines the second information with the fourth information to generate combined information.
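The combining processing can be sketched as follows. This is a toy illustration, not part of the disclosure: a hypothetical element-wise blend stands in for the actual combining of the second information (future Encode data) with the fourth information (current Encode data), and the weight parameter is an assumption:

```python
def combine_encode(future_encode, current_encode, weight=0.7):
    """Hypothetical element-wise blend of future Encode data with
    current Encode data to form the combined information."""
    return [weight * f + (1 - weight) * c
            for f, c in zip(future_encode, current_encode)]

# An equal-weight blend of two toy encode vectors.
combined = combine_encode([1.0, 0.0], [0.0, 1.0], weight=0.5)
```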
Next, a third embodiment will be described with reference to
Hereinafter, the configuration corresponding to the configuration in the second embodiment is denoted by “B” attached to the end of the reference numeral. In the third embodiment, the configuration corresponding to the second embodiment is denoted by “*** B” instead of the reference numeral “***” or “***A” in the second embodiment, and the description similar to the point described above is appropriately omitted. For example, a sensor 101B in the third embodiment is a specific example of the input means 101 in the second embodiment, and the input means 101 is implemented by the sensor 101B.
The past information generation means in an information processing device 100B includes a learning model 103B created using past information, in addition to a function of recursively using data converted into past information among the data generated by a computation device 102B corresponding to the future information generation means 102A. In this manner, the information processing device 100B uses the learning model 103B trained using past information. The learning model 103B is implemented as a computation parameter of the computation device 102B for generating future information. In the information processing device 100B, the computation device 102B, which is the future information generation means, generates the future information, which is the second information, by using the learning model 103B updated by using the past information, which is the third information. The learning model 103B may be any model used to generate future information. For example, the learning model 103B may be a prediction model that takes information regarding pixels having a change as an input and outputs information indicating pixels estimated to have a change thereafter.
The sensor 101B depicted in
In addition, the learning model 103B is an implementation example of the past information generation means 103 and 103A. The past information generation means 103 and 103A may be a multiply-accumulator, a learning model, or the like. In addition, a decoder 104B which is a display device or an actuator is an implementation example of the signal output means 104. Furthermore, a computation device 105B is an implementation example of the current information generation means 105. Furthermore, an Encode data combining device 106B corresponds to the combining means 106, and the Encode data combining device 106B may be implemented by any computation device, a combining circuit, or the like.
Next, a fourth embodiment will be described with reference to
An information processing device 100C updates a learning model 103C in real time with past information. The information processing device 100C updates the computation parameter of the learning model 103C using the past information. For example, every time the future information is generated, the information processing device 100C may feed back the generated future information as past information and update the computation parameter of the learning model 103C. For example, the information processing device 100C may update the computation parameter of the learning model 103C using the past information and ground truth information corresponding to information of the real world detected by the sensor at the time (time point) corresponding to the past information. In this case, the information processing device 100C updates the computation parameter of the learning model 103C such that the past information approaches the ground truth information. In this manner, the information processing device 100C can update the learning model in real time with the past information, leading to enhanced computation accuracy. The above description is merely an example. Not limited to every generation of the future information, the information processing device 100C may feed back the generated future information as past information at a predetermined timing and update the computation parameter of the learning model 103C. For example, in a case where the number of times of generation of the future information reaches a predetermined number, the information processing device 100C may feed back the future information generated so far as the past information and update the computation parameter of the learning model 103C.
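The real-time update of the computation parameter toward ground truth can be sketched as follows. This is a toy illustration under stated assumptions (a single-weight linear model and a gradient step stand in for the actual learning model and its update rule; names and values are hypothetical):

```python
class OnlineModel:
    """Toy stand-in for a learning model y = w * x whose parameter
    is updated in real time so its output approaches ground truth."""

    def __init__(self, w=0.5, lr=0.1):
        self.w = w    # computation parameter
        self.lr = lr  # update step size

    def predict(self, x):
        return self.w * x

    def update(self, x, ground_truth):
        # Gradient step on the squared error (predict(x) - truth)^2.
        error = self.predict(x) - ground_truth
        self.w -= self.lr * error * x
        return self.w

model = OnlineModel()
error_before = abs(model.predict(1.0) - 1.0)
model.update(1.0, 1.0)             # feed back one ground-truth sample
error_after = abs(model.predict(1.0) - 1.0)
```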
Next, a fifth embodiment will be described with reference to
In an information processing device 100D, computation result information (current Encode data) obtained by a computation device 105D being the current information generation means is directly passed to a computation device 102D being the future information generation means. The computation device 102D generates the future information using the current information and the current Encode data. With this operation, the information processing device 100D can increase the computation accuracy. The computation device 102D may generate future information by using a prediction model that uses the current information and the current Encode data as input and outputs information indicating a pixel estimated to have a change thereafter. Note that the above is merely an example, and the computation device 102D may generate the future information by any type of processing as long as the current information and the current Encode data are used.
Next, a sixth embodiment will be described with reference to
In an information processing device 100E, data (current information) from a sensor 101E, which is an input means, that is, the data before being subjected to computation by a computation circuit 102E being the future information generation means, is directly used for updating a learning model 103E. The information processing device 100E updates the computation parameter of the learning model 103E using the current information. For example, the information processing device 100E may update the computation parameter of the learning model 103E using the current information and the past information of the time (time point) corresponding to the current information. For example, the information processing device 100E may update the computation parameter of the learning model 103E using ground truth information based on the current information and using the past information. In this case, the information processing device 100E updates the computation parameter of the learning model 103E such that the past information approaches the ground truth information. In the information processing device 100E, the computation circuit 102E, which is the future information generation means, generates the future information, which is the second information, by using the learning model 103E updated by using the current information, which is the first information. In this manner, the information processing device 100E can update the learning model in real time with the current information, leading to enhanced computation accuracy. That is, the information processing device 100E can generate, in real time, the learning model 103E capable of further improving the computation accuracy.
It is possible to adopt any combination of the elements of the first to sixth embodiments. For example, any one of the elements, a plurality of elements, or all of the elements of the second to sixth embodiments described above may be added to the first embodiment.
Here, a specific example of each configuration of an information processing device 100E depicted in
The sensor 101E, which is an input means, can be implemented by adopting various sensors such as a CMOS image sensor (CIS), an EVS, a TOF, an inertial measurement unit (IMU), and a microphone. Note that the sensor includes a reading circuit and a control circuit.
Furthermore, the computation circuit 102E, which is the future information generation means, can be implemented by adopting a computation circuit such as a digital circuit, an analog circuit, or CiM. For example, CiM has a configuration as depicted in
For example, the CiM may be a multiply-accumulator that arranges variable resistance elements such as RRAM on a crossbar array so as to implement analog multiply-accumulation. For example, the CiM may be a multiply-accumulator that implements analog multiply-accumulation with multi-level memory implemented using phase-change memory (PCM) arranged on a crossbar array. For example, the CiM may use magnetic memory (MRAM or the like) instead of the RRAM or the PCM. For example, the CiM may be a device having a configuration in which memory cells, each having a FeFET and a high resistance element on its drain side, are arranged on a crossbar array.
For example, the CiM may be a multiply-accumulator having both digital and analog configurations, which employs a method in which multiplication is performed by an XNOR circuit using SRAM, the result is charged into a capacitor, and the multiplication results of the individual cells are accumulated. For example, the CiM may employ a method of accessing and reading a plurality of desired FETs using 3D NAND so as to implement the multiply-accumulation by accumulating the readout current. For example, the CiM may be implemented by a multiply-accumulator in which NOR Flash cells are arranged on an array. For example, the CiM may be implemented by a multiply-accumulator in which charge trapping transistors (CTTs) are arranged on an array. Note that the above is merely an example, and the CiM may be implemented by any configuration.
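As a purely functional illustration of the XNOR-based multiplication mentioned above (the circuit-level charging and accumulation are not modeled), when weights and activations are restricted to plus or minus one, multiplication reduces to XNOR on their bit encodings and accumulation reduces to counting matches:

```python
# Illustrative sketch: bits encode +1 as 1 and -1 as 0.
# XNOR(w, x) == 1 exactly when w * x == +1, so the dot product over {-1, +1}
# equals 2 * (number of matching bits) - (number of bits).

def xnor_mac(weight_bits, input_bits):
    n = len(weight_bits)
    matches = sum(1 for w, x in zip(weight_bits, input_bits) if w == x)
    return 2 * matches - n

w = [1, 0, 1, 1]   # encodes +1, -1, +1, +1
x = [1, 1, 0, 1]   # encodes +1, +1, -1, +1
dot = xnor_mac(w, x)   # (+1)(+1) + (-1)(+1) + (+1)(-1) + (+1)(+1) = 0
```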
Furthermore, the learning model 103E, which is the past information generation means, can be implemented by adopting a digital circuit, an analog circuit, or the like. For example, the learning model 103E may use a signal processing circuit or the like that performs addition, division, or multiplication for learning model update on a small scale without performing complicated computations.
In addition, a decoder 104E, which is a signal output means, can be implemented by adopting a display, a head-mounted display (HMD), or the like. Furthermore, a computation device 105E, which is the current information generation means, can be implemented by adopting a computation circuit such as a digital circuit, an analog circuit, or CiM. Note that the computation circuit includes a reading circuit and a control circuit.
In addition, an Encode data combining device 106E, which is a combining means, can be implemented by adopting a digital circuit, an analog circuit, or the like. Note that the Encode data combining device 106E may use a signal processing circuit or the like that performs addition, division, or multiplication of data on a small scale without performing complicated computations.
Furthermore, in
Next, a seventh embodiment will be described with reference to
The parallel signal input means 211 in the information processing device 200 corresponds to a retina of a human. In addition, the past information storage means 221 and 231 of the information processing device 200 correspond to the thalamus of a human. In addition, the information processing device 200 includes two parallel signal computation means 222 and 232. The parallel signal computation means 222 corresponds to the lateral geniculate nucleus, while the parallel signal computation means 232 corresponds to the neocortex. The human brain is said to have a computation device of a two-layer structure, and signals are output using past information, learning models 223 and 233, and input signals.
The input signal used for the computation is approximately 20 to 25% of all signals; for example, in the second computation layer 230 of the second layer, about 75% of the computation result of the first computation layer 220 of the first layer is used as an input. It is also possible to configure the information processing device 200 (system) in a form imitating such a computation system. In the example of
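The two-layer organization described above can be sketched as follows. The mixing ratios follow the approximate figures given in the text, while the identity-like layer "models" and all weights are illustrative assumptions:

```python
import numpy as np

# Hypothetical sketch: the first layer consumes the raw input signal, while the
# second layer forms its input mostly (about 75%) from the first layer's result
# and only partly (about 25%) from the input signal itself.

def layer(x, weight):
    return weight * x                       # stand-in for each layer's learning model

def two_layer_compute(signal, w1, w2):
    out1 = layer(signal, w1)                # first computation layer
    mixed = 0.75 * out1 + 0.25 * signal     # second layer's input mixes both sources
    return layer(mixed, w2)                 # second computation layer

y = two_layer_compute(np.array([1.0, 2.0]), w1=2.0, w2=1.0)
```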
Here, an example of a configuration of the information processing device 200 described above will be described with reference to
As depicted in
The output layer 240 functioning as output means of the information processing device 200 may use AR glasses or the like. Furthermore, the control layer 250 functioning as a control means in the information processing device 200 may use a control circuit or the like that controls coefficients of the individual child cells Wxx of the first computation layer 220, the second computation layer 230, and the like.
Examples of the application will be described below. Hereinafter, a case where the information processing device 100 is applied to an application will be described as an example; however, not limited to the information processing device 100, the information processing devices 100A to 100E may also be used.
First, an example of using the information processing device 100 for information processing such as control of a robot 10 depicted in
For example, the information processing device 100 can suppress the influence of a delay in information processing in operations such as control of the robot 10. In
Next, an example of using the information processing device 100 for information processing such as display using AR glasses 20 depicted in
For example, the information processing device 100 can suppress the influence of a delay in information processing such as display of the AR glasses 20. In
Note that the above is merely an example, and the information processing device 100 may be used in various applications. For example, the information processing device 100 may be applied to real-time remote activity via a robot. This enables real-time remote control of the remotely located robot.
Furthermore, the information processing device 100 may be applied to real-time information feedback such as motion analysis in sports. This also makes it possible to obtain real-time activity feedback.
In addition to the above applications, the technology according to the present disclosure (the present technology) can be applied to various products. For example, the technology according to the present disclosure may be applied to devices mounted on any of mobile body such as automobiles, electric vehicles, hybrid electric vehicles, motorcycles, bicycles, personal mobility, airplanes, drones, ships, and robots.
A vehicle control system 12000 includes a plurality of electronic control units connected to each other via a communication network 12001. In the example depicted in
The driving system control unit 12010 controls the operation of devices related to the driving system of the vehicle in accordance with various kinds of programs. For example, the driving system control unit 12010 functions as a control device for a driving force generating device for generating the driving force of the vehicle, such as an internal combustion engine, a driving motor, or the like, a driving force transmitting mechanism for transmitting the driving force to wheels, a steering mechanism for adjusting the steering angle of the vehicle, a braking device for generating the braking force of the vehicle, and the like.
The body system control unit 12020 controls the operation of various kinds of devices provided to a vehicle body in accordance with various kinds of programs. For example, the body system control unit 12020 functions as a control device for a keyless entry system, a smart key system, a power window device, or various kinds of lamps such as a headlamp, a backup lamp, a brake lamp, a turn signal, a fog lamp, or the like. In this case, radio waves transmitted from a mobile device as an alternative to a key or signals of various kinds of switches can be input to the body system control unit 12020. The body system control unit 12020 receives these input radio waves or signals, and controls a door lock device, the power window device, the lamps, or the like of the vehicle.
The outside-vehicle information detecting unit 12030 detects information about the outside of the vehicle including the vehicle control system 12000. For example, the outside-vehicle information detecting unit 12030 is connected with an imaging section 12031. The outside-vehicle information detecting unit 12030 causes the imaging section 12031 to capture an image of the outside of the vehicle, and receives the captured image. Based on the received image, the outside-vehicle information detecting unit 12030 may perform processing of detecting an object such as a human, a vehicle, an obstacle, a sign, or a character on a road surface, or processing of detecting a distance thereto.
The imaging section 12031 is an optical sensor that receives light and outputs an electric signal corresponding to the amount of received light. The imaging section 12031 can output the electric signal as an image, or can output the electric signal as information about a measured distance. In addition, the light received by the imaging section 12031 may be visible light, or may be invisible light such as infrared rays.
The in-vehicle information detecting unit 12040 detects information about the inside of the vehicle. The in-vehicle information detecting unit 12040 is, for example, connected with a driver state detecting section 12041 that detects the state of a driver. The driver state detecting section 12041, for example, includes a camera that images the driver. Based on detection information input from the driver state detecting section 12041, the in-vehicle information detecting unit 12040 may calculate a degree of fatigue of the driver or a degree of concentration of the driver, or may determine whether the driver is dozing.
The microcomputer 12051 can compute a control target value for the driving force generating device, the steering mechanism, or the braking device based on the information about the inside or outside of the vehicle obtained by the outside-vehicle information detecting unit 12030 or the in-vehicle information detecting unit 12040, and output a control command to the driving system control unit 12010. For example, the microcomputer 12051 can perform cooperative control intended to implement functions of an advanced driver assistance system (ADAS), including collision avoidance or shock mitigation for the vehicle, following driving based on a following distance, vehicle speed maintaining driving, a warning of collision of the vehicle, a warning of deviation of the vehicle from a lane, or the like.
In addition, the microcomputer 12051 can perform cooperative control intended for automated driving, which allows the vehicle to travel autonomously without depending on the operation of the driver, or the like, by controlling the driving force generating device, the steering mechanism, the braking device, or the like based on the information about the outside or inside of the vehicle which information is obtained by the outside-vehicle information detecting unit 12030 or the in-vehicle information detecting unit 12040.
In addition, the microcomputer 12051 can output a control command to the body system control unit 12020 based on the information about the outside of the vehicle obtained by the outside-vehicle information detecting unit 12030. For example, the microcomputer 12051 can perform cooperative control intended to prevent glare by controlling the headlamp so as to change from a high beam to a low beam in accordance with the position of a preceding vehicle or an oncoming vehicle detected by the outside-vehicle information detecting unit 12030.
The sound/image output section 12052 transmits an output signal of at least one of a sound and an image to an output device capable of visually or auditorily notifying information to an occupant of the vehicle or the outside of the vehicle. In the example of
In
The imaging sections 12101, 12102, 12103, 12104, and 12105 are, for example, disposed at positions on a front nose, sideview mirrors, a rear bumper, and a back door of the vehicle 12100 as well as a position on an upper portion of a windshield within the interior of the vehicle. The imaging section 12101 provided to the front nose and the imaging section 12105 provided to the upper portion of the windshield within the interior of the vehicle obtain mainly an image of the front of the vehicle 12100. The imaging sections 12102 and 12103 provided to the sideview mirrors obtain mainly an image of the sides of the vehicle 12100. The imaging section 12104 provided on the rear bumper or the back door mainly acquires an image behind the vehicle 12100. The images in front acquired by the imaging sections 12101 and 12105 are mainly used for detecting a preceding vehicle or a pedestrian, an obstacle, a traffic light, a traffic sign, a lane, or the like.
Incidentally,
At least one of the imaging sections 12101 to 12104 may have a function of obtaining distance information. For example, at least one of the imaging sections 12101 to 12104 may be a stereo camera constituted of a plurality of imaging elements, or may be an imaging element having pixels for phase difference detection.
For example, the microcomputer 12051 can determine a distance to each three-dimensional object within the imaging ranges 12111 to 12114 and a temporal change in the distance (relative speed with respect to the vehicle 12100) based on the distance information obtained from the imaging sections 12101 to 12104, and thereby extract, as a preceding vehicle, a nearest three-dimensional object in particular that is present on a traveling path of the vehicle 12100 and which travels in substantially the same direction as the vehicle 12100 at a predetermined speed (for example, equal to or more than 0 km/hour). Further, the microcomputer 12051 can set a following distance to be maintained in front of a preceding vehicle in advance, and perform automatic brake control (including following stop control), automatic acceleration control (including following start control), or the like. It is thus possible to perform cooperative control intended for automated driving that allows the vehicle to travel autonomously without depending on the operation of the driver or the like.
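The extraction of a preceding vehicle described above can be sketched as follows. The object fields, the heading tolerance, and the speed threshold are illustrative assumptions and are not the system's actual interface:

```python
# Hypothetical sketch: from ranged objects, pick the nearest object that is on
# the traveling path and moving in substantially the same direction at or above
# a threshold speed, and treat it as the preceding vehicle.

def extract_preceding_vehicle(objects, min_speed_kmh=0.0, heading_tol_deg=20.0):
    candidates = [o for o in objects
                  if o["on_path"]
                  and o["relative_heading_deg"] < heading_tol_deg
                  and o["speed_kmh"] >= min_speed_kmh]
    if not candidates:
        return None
    return min(candidates, key=lambda o: o["distance_m"])   # nearest on-path object

objects = [
    {"id": 1, "on_path": True,  "relative_heading_deg": 5,  "speed_kmh": 40, "distance_m": 30},
    {"id": 2, "on_path": True,  "relative_heading_deg": 3,  "speed_kmh": 50, "distance_m": 15},
    {"id": 3, "on_path": False, "relative_heading_deg": 90, "speed_kmh": 0,  "distance_m": 5},
]
lead = extract_preceding_vehicle(objects)   # the nearest on-path vehicle
```

Once the preceding vehicle is identified, the following-distance control described in the text would operate on its distance and relative speed.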
For example, the microcomputer 12051 can classify three-dimensional object data on three-dimensional objects into three-dimensional object data of a two-wheeled vehicle, a standard-sized vehicle, a large-sized vehicle, a pedestrian, a utility pole, and other three-dimensional objects based on the distance information obtained from the imaging sections 12101 to 12104, extract the classified three-dimensional object data, and use the extracted three-dimensional object data for automatic avoidance of an obstacle. For example, the microcomputer 12051 identifies obstacles around the vehicle 12100 as obstacles that the driver of the vehicle 12100 can recognize visually and obstacles that are difficult for the driver of the vehicle 12100 to recognize visually. Then, the microcomputer 12051 determines a collision risk indicating a risk of collision with each obstacle. In a situation in which the collision risk is equal to or higher than a set value and there is thus a possibility of collision, the microcomputer 12051 outputs a warning to the driver via the audio speaker 12061 or the display section 12062, and performs forced deceleration or avoidance steering via the driving system control unit 12010. The microcomputer 12051 can thereby assist in driving to avoid collision.
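As a simplified illustration of the collision-risk decision described above, a time-to-collision test can stand in for the risk metric; the metric, the threshold, and the action names are assumptions for explanation only:

```python
# Hypothetical sketch: when the risk of collision with an obstacle meets or
# exceeds a set value (here approximated by a short time-to-collision), output
# a warning and assistance action for that obstacle.

def assess_obstacles(obstacles, ttc_threshold_s=2.0):
    actions = []
    for ob in obstacles:
        if ob["closing_speed_mps"] <= 0:
            continue                                       # not approaching
        ttc = ob["distance_m"] / ob["closing_speed_mps"]   # time to collision in seconds
        if ttc < ttc_threshold_s:
            actions.append(("warn_and_assist", ob["id"]))  # warning plus forced deceleration
    return actions

obstacles = [
    {"id": "A", "distance_m": 50.0, "closing_speed_mps": 5.0},   # TTC 10 s, safe
    {"id": "B", "distance_m": 6.0,  "closing_speed_mps": 4.0},   # TTC 1.5 s, risky
]
acts = assess_obstacles(obstacles)
```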
At least one of the imaging sections 12101 to 12104 may be an infrared camera that detects infrared rays. The microcomputer 12051 can, for example, recognize a pedestrian by determining whether or not there is a pedestrian in the captured images of the imaging sections 12101 to 12104. Such recognition of a pedestrian is, for example, performed by a procedure of extracting characteristic points in the captured images of the imaging sections 12101 to 12104 as infrared cameras and a procedure of determining whether or not the object is a pedestrian by performing pattern matching processing on a series of characteristic points representing the contour of the object. When the microcomputer 12051 determines that there is a pedestrian in the captured images of the imaging sections 12101 to 12104 and thus recognizes the pedestrian, the sound/image output section 12052 controls the display section 12062 so that a square contour line for emphasis is displayed superimposed on the recognized pedestrian. The sound/image output section 12052 may also control the display section 12062 so that an icon or the like representing the pedestrian is displayed at a desired position.
Hereinabove, an example of the vehicle control system to which the technology according to the present disclosure is applicable has been described. The technology according to the present disclosure can be suitably applied to the integrated control unit 12050 among the configurations described above. Specifically, in a case where the information processing device 100 depicted in
The above application is merely an example, and each configuration of the information processing device 100 may be applied as various configurations of the vehicle control system 12000. For example, the computation means 110 may be applied to the outside-vehicle information detecting unit 12030 or the in-vehicle information detecting unit 12040. By applying the technology according to the present disclosure to the outside-vehicle information detecting unit 12030 or the in-vehicle information detecting unit 12040 in this manner, it is possible to suppress occurrence of delay, leading to achievement of detection with high real-time property. In this case, for example, the input means 101 may be applied to the imaging section 12031 or the driver state detecting section 12041. For example, the signal output means 104 may be applied to the driving system control unit 12010 or the body system control unit 12020. By applying the technology according to the present disclosure to the driving system control unit 12010 or the body system control unit 12020, it is possible to suppress occurrence of delay, leading to achievement of control with high real-time property. Furthermore, for example, the signal output means 104 may be applied to the audio speaker 12061, the display section 12062, or the instrument panel 12063. Furthermore, by applying the technology according to the present disclosure to the audio speaker 12061, the display section 12062, or the instrument panel 12063 in this manner, it is possible to suppress occurrence of delay, leading to achievement of output with high real-time property.
The process according to each of embodiments described above may be performed in various different forms (modifications) in addition to each of embodiments or modifications described above.
Furthermore, among the processes described in the above embodiments, all or a part of the processing described as being performed automatically can be performed manually, or the processing described as being performed manually can be performed automatically by a known method. In addition, the processing procedures, specific names, and information including various data and parameters depicted in the above specification or drawings can be changed in any manner unless otherwise specified. For example, the variety of information depicted in each of the drawings is not limited to the information depicted.
In addition, each component of each device is provided as a functional and conceptual illustration and thus does not necessarily need to be physically configured as depicted. That is, the specific form of distribution/integration of each device is not limited to those depicted in the drawings, and all or a part thereof may be functionally or physically distributed or integrated into arbitrary units according to various loads and use conditions.
Furthermore, the above-described embodiments and modifications can be appropriately combined within a range implementable without contradiction of processing.
Here, the information processing device 100 will be described as an example, but application may be performed to any of the above. For example, the information processing device 100 includes an input means 101, at least one or more future information generation means 102, and an output means that outputs arbitrary information. Regarding future information, the future information is generated from at least one or more past information generation means 103. Furthermore, for example, the information processing device 100 has a structure including at least one or more current information generation means and in which arbitrary information is output from the combining means by the output means. Regarding future information, the future information is generated from at least one or more past information generation means 103 and at least one or more current information generation means.
For example, the past information generation means 103 includes a combination of at least two or more generation means: past information generation from one or more future information generation means and one or more current information generation means, and past information generation stored in advance in the storage means. For example, regarding the past information stored in the storage means, the past information is updated using an information update means generated from one of the current information generation means and the future information generation means 102. For example, regarding the past information stored in the storage means, the past information is updated using an information update means generated from both of the current information generation means and the future information generation means 102.
For example, the amount of past information in the storage means decreases with time. The reduction coefficient of the past information amount has at least two types of time constants. For example, the past information of the storage means includes stored content with the information amount decreasing with time and stored content with the information not decreasing until the next update.
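The decay of the stored past information with at least two time constants, as described above, can be sketched as follows; the time constants and the split between the two components are arbitrary example values, not values specified by the embodiment:

```python
import math

# Hypothetical sketch: the stored past information amount is modeled as the sum
# of a fast-decaying component and a slow-decaying component, each with its own
# time constant (two types of reduction coefficients).

def decayed_amount(initial, elapsed_s, tau_fast=1.0, tau_slow=10.0, fast_fraction=0.5):
    fast = fast_fraction * initial * math.exp(-elapsed_s / tau_fast)
    slow = (1 - fast_fraction) * initial * math.exp(-elapsed_s / tau_slow)
    return fast + slow

a0 = decayed_amount(100.0, 0.0)   # nothing has decayed yet
a1 = decayed_amount(100.0, 5.0)   # fast component nearly gone, slow component mostly kept
```

A non-decaying stored content, as also mentioned in the text, would simply keep its value constant until the next update.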
For example, the input means 101 is spatially sparse and temporally dense. For example, the input means 101 is an EVS. For example, the time for information generation by the future information generation means 102 has a delay equal to or less than the delay of the input means 101 and the signal output means 104. For example, the computation means 110 computes only data that changes in time series. For example, the computation means 110 executes computation by using all the inter-frame information within the delay of the imaging system or the display system.
For example, the input means 101 includes an output signal from one or more low-resolution high-speed ADCs and an output signal from one or more high-resolution low-speed ADCs. For example, the input means 101 includes physical signal detection means created individually. For example, the input means 101 includes at least two or more detection means created in an identical detection device. For example, the information processing device 100 has a recursive structure from an information path to the past information generation means 103 to an information path to the future information generation means 102 in a read pipeline path from the input means 101 to the output means.
The effects described in the present specification are merely examples, and thus, there may be other effects, not limited to the exemplified effects.
As described above, the information processing device (for example, this corresponds to the information processing device 100 in the first embodiment) according to the present disclosure includes the acquisition means (for example, this corresponds to the acquisition means 111 in the first embodiment) and the generation means (for example, this corresponds to the future information generation means 102 in the first embodiment). The acquisition means acquires first information (for example, this corresponds to current information in the first embodiment) which is part of the information regarding the target region. Based on the first information acquired by the acquisition means, the generation means generates second information (for example, this corresponds to future information in the first embodiment) corresponding to the second time, which is later than the first time being the time corresponding to the first information.
In this manner, based on the first information that is part of the information regarding the target region, the information processing device according to the present disclosure generates the second information corresponding to the second time which is later than the first time which is the time corresponding to the first information, making it possible to efficiently generate information.
The acquisition means acquires first information, which is part of the information regarding the target region detected by the sensor. Based on the first information, the generation means generates the second information predicted to be detected by the sensor at the second time. In this manner, based on the first information detected by the sensor, the information processing device can generate the second information predicted to be detected by the sensor at the second time, making it possible to efficiently generate information regarding the information predicted to be detected by the sensor.
In addition, the acquisition means acquires the first information that is information of a portion necessary for processing by the generation means among the information regarding the target region. In this manner, the information processing device can efficiently generate information by using only information of a portion necessary for processing by the generation means.
The acquisition means acquires first information, which is information regarding a portion including a change in the first time among the information regarding the target region. In this manner, the information processing device can efficiently generate information by using only the information of the portion having a change during the first time.
In addition, the acquisition means acquires the first information, which is sparse with respect to the information regarding the target region. In this manner, the information processing device can efficiently generate information by using the first information which is sparse with respect to the information regarding the target region.
In addition, the acquisition means acquires the first information corresponding to a plurality of portions discretely located in the target region. In this manner, the information processing device can efficiently generate information by using the first information corresponding to the plurality of portions discretely located in the target region.
In addition, the acquisition means acquires the first information corresponding to a portion of a predetermined ratio or less in the target region. In this manner, the information processing device can efficiently generate information by using the first information corresponding to the portion of the predetermined ratio or less in the target region.
The generation means generates second information that is encoded data corresponding to the second time. In this manner, by generating the encoded data corresponding to the second time, the information processing device can efficiently generate the information used in the signal output means or the like.
The generation means generates the second information based on the first information and the third information (for example, this corresponds to the past information in the first embodiment) corresponding to the third time which is the time earlier than the first time. In this manner, the information processing device generates the second information based on the third information corresponding to the third time which is a time earlier than the first time so as to perform feedback of the information generated in the past, making it possible to perform information generation with high accuracy.
The generation means generates the second information by using the third information generated earlier than the generation of the second information. In this manner, by generating the second information using the third information generated earlier, the information processing device performs feedback of the information generated in the past, making it possible to perform information generation with higher accuracy.
The generation means generates the second information by using the learning model updated by using the third information. In this manner, the information processing device generates the second information by using the learning model updated using the third information, making it possible to perform information generation with higher accuracy.
In addition, the generation means generates the second information by using the learning model updated based on the first information. In this manner, the information processing device generates the second information by using the learning model updated based on the first information, making it possible to perform information generation with higher accuracy.
The generation means generates the second information by using the learning model updated by using the first information. In this manner, the information processing device generates the second information by using the learning model updated using the first information, making it possible to perform information generation with higher accuracy.
Furthermore, the information processing device includes a combining means (the combining means 106 in the second embodiment) that combines the second information with the other information to generate combined information to be used for output. In this manner, by generating the combined information to be used for output by combining the second information with the other information, the information processing device can appropriately generate the information to be used for output.
In addition, using the first information, the generation means (the current information generation means 105 in the second embodiment) generates the fourth information (for example, this corresponds to current Encode data in the second embodiment) used for combining with the second information. The combining means combines the second information and the fourth information to generate combined information. In this manner, the information processing device can appropriately generate information to be used for output by combining the fourth information generated using the first information with the second information.
The generation means generates fourth information that is encoded data corresponding to the first time. In this manner, the information processing device generates the combined information using the encoded data corresponding to the first time, making it possible to appropriately generate the information to be used for output.
The acquisition means acquires the first information corresponding to a part of the information regarding the target region which is an image. The generation means generates second information corresponding to the image at the second time based on the first information. In this manner, the information processing device can generate the second information corresponding to the image at the second time based on the first information corresponding to a part of the image, making it possible to efficiently generate information regarding the image.
The acquisition means acquires first information corresponding to a pixel having a change in the image. The generation means generates second information corresponding to the image at the second time based on the first information. In this manner, the information processing device can generate the second information corresponding to the image at the second time based on the first information corresponding to the pixel having a change in the image, making it possible to efficiently generate information by using only the information of the pixel having a change.
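As an illustration of generating an image from only the pixels having a change, the sparse first information can be applied on top of a stored past frame, as in an event-based sensor readout. The sketch below only illustrates the sparse-update idea; a real implementation would also extrapolate motion to the second time:

```python
import numpy as np

# Hypothetical sketch: only pixels that changed are acquired as first
# information, and the image corresponding to the second time is generated by
# applying those sparse updates on top of the stored past frame.

def generate_second_frame(past_frame, changed_pixels):
    frame = past_frame.copy()
    for (y, x), value in changed_pixels.items():   # sparse updates only
        frame[y, x] = value
    return frame

past = np.zeros((4, 4), dtype=int)                 # stored past frame
events = {(0, 0): 9, (2, 3): 5}                    # sparse first information
second = generate_second_frame(past, events)       # full frame for the second time
```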
The information processing device 100 or the like according to each of the above-described embodiments is implemented by a computer 1000 having a configuration as depicted in
The CPU 1100 operates based on a program stored in the ROM 1300 or the HDD 1400 so as to control each of the components. For example, the CPU 1100 loads the program stored in the ROM 1300 or the HDD 1400 into the RAM 1200 and executes processing corresponding to various programs.
The ROM 1300 stores a boot program such as a basic input output system (BIOS) executed by the CPU 1100 when the computer 1000 starts up, a program dependent on hardware of the computer 1000, or the like.
The HDD 1400 is a non-transitory computer-readable recording medium that records a program executed by the CPU 1100, data used by the program, or the like. Specifically, the HDD 1400 is a recording medium that records an information processing program according to the present disclosure, which is an example of program data 1450.
The communication interface 1500 is an interface for connecting the computer 1000 to an external network 1550 (for example, the Internet). For example, the CPU 1100 receives data from other devices or transmits data generated by the CPU 1100 to other devices via the communication interface 1500.
The input/output interface 1600 is an interface for connecting an input/output device 1650 with the computer 1000. For example, the CPU 1100 receives data from an input device such as a keyboard or a mouse via the input/output interface 1600. In addition, the CPU 1100 transmits data to an output device such as a display, a speaker, or a printer via the input/output interface 1600. Furthermore, the input/output interface 1600 may function as a media interface for reading a program or the like recorded on a predetermined recording medium (or simply a medium). Examples of the media include optical recording media such as a digital versatile disc (DVD) and a phase change rewritable disk (PD), magneto-optical recording media such as a magneto-optical disk (MO), a tape medium, a magnetic recording medium, and a semiconductor memory.
For example, when the computer 1000 functions as the information processing device 100 according to the embodiment, the CPU 1100 of the computer 1000 executes the information processing program loaded on the RAM 1200 so as to implement the functions of the computation means 110 and the like. Furthermore, the HDD 1400 stores the information processing program according to the present disclosure or data in the storage section. Although the CPU 1100 executes the program data 1450 read from the HDD 1400, as another example, the CPU 1100 may acquire these programs from another device via the external network 1550.
The information processing device according to the first to seventh embodiments described above is merely an example. The information processing device is not limited to the configurations depicted in the first to seventh embodiments, and any configuration can be adopted. For example, the information processing device is not limited to the recurrent (feedback) configuration, and may have a feed-forward type configuration. This point will be exemplified below.
First, an outline of a configuration and processing of an information processing device according to an eighth embodiment will be described with reference to
Hereinafter, the configuration of an information processing device 100F, information used for processing, and the like will be described with reference to
An image group IG1 in
In addition, event data E1 denoted as “Event” in
In
Note that the above is merely an example, and for example, the inputs may each come from a frame sensor and an event sensor that can perform a global shutter operation and are mounted adjacent to each other with the influence of parallax eliminated as much as possible. Also, any number of sensors may be used; for example, there may be two frame sensors and two event sensors.
An image event conversion I2E denoted as “Image2Event” in
Furthermore, the event image conversion E2I denoted as “Event2Image” in
The format conversion FC indicated as “Format Conv” in
A loss function L1 and a loss function L2 expressed as “Loss” in
The information processing device 100F trains the model by the above-described configuration and information. For example, the information processing device 100F trains the image event conversion I2E and the event image conversion E2I. First, the information processing device 100F trains the image event conversion I2E. For example, the information processing device 100F fixes the event image conversion E2I and trains the image event conversion I2E.
Then, the information processing device 100F fixes the image event conversion I2E and trains the event image conversion E2I. In
With this operation, the information processing device 100F generates the event image conversion E2I that generates a future image from the acquired latest frame image and the event information slightly later than the frame. As described above, the information processing device 100F uses a known future image as a ground truth at the time of training, and compares the generated image with the ground truth image in the training processing.
Hereinafter, processing at the time of inference in the information processing device 100F will be described with reference to
Furthermore, in the inference processing depicted in
For example, the information processing device 100F uses the image IMt corresponding to time point t and the output of the format conversion FC corresponding to time point t to generate future information (image) corresponding to the future time point t+nT. For example, the information processing device 100F inputs the image IMt corresponding to time point t and the output of the format conversion FC to which the event data E1 before time point t has been input to the event image conversion E2I, and causes the event image conversion E2I to output an image. With this operation, the information processing device 100F generates the image output by the event image conversion E2I as the future information (image). Note that the above-described processing is merely an example, and the information processing device 100F may generate the future information using various types of information.
Next, an outline of a configuration and processing of an information processing device according to a ninth embodiment will be described with reference to
Hereinafter, the configuration of an information processing device 100G, information used for processing, and the like will be described with reference to
Furthermore, an event flow conversion E2F expressed as “Event2Flow” in
Furthermore, image flow conversion I2F described as “Image2Flow” in
For example, the conversion processing into the flow by the image flow conversion I2F can be implemented by adopting any method (such as a standard method described below). For example, the conversion processing into the flow by the image flow conversion I2F may use a method related to an optical flow. For example, the conversion processing into the flow by the image flow conversion I2F may use a method related to FlowNet 2.0.
For example, FlowNet 2.0 is disclosed in the following document. Note that FlowNet 2.0 is merely an example, and any method such as the Lucas-Kanade method can be adopted as the conversion processing into the flow by the image flow conversion I2F without being limited to FlowNet 2.0.
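As a rough illustration of the Lucas-Kanade alternative mentioned above, a single translation for an image patch can be estimated with the classic least-squares formulation. This is a hypothetical stand-in for the conversion into the flow, not the disclosed implementation; the function name and the patch-level (rather than dense) formulation are assumptions.

```python
import numpy as np

def lucas_kanade_patch(prev, curr):
    """Estimate a single (dx, dy) translation between two patches using
    the Lucas-Kanade least-squares formulation: Ix*dx + Iy*dy = -It."""
    Ix = np.gradient(prev.astype(float), axis=1)  # spatial derivative in x
    Iy = np.gradient(prev.astype(float), axis=0)  # spatial derivative in y
    It = curr.astype(float) - prev.astype(float)  # temporal derivative
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # (N, 2) system matrix
    b = -It.ravel()
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # (dx, dy) in pixels
```

A dense learned method such as FlowNet 2.0 would replace this per-patch estimate in practice; the sketch only illustrates the brightness-constancy principle both methods rely on.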
Furthermore, flow image conversion F2I denoted as “Flow2Image” in
Furthermore, an image blender IB denoted as “ImageBlender” in
For example, the image blender IB uses the first image of the event image conversion E2I and the second image of the flow image conversion F2I as an input, and generates an image (also referred to as a "third image") corresponding to the future time point corresponding to the input. For example, the image blender IB may generate the third image by calculating a weighted average of the first image of the event image conversion E2I and the second image of the flow image conversion F2I. For example, the image blender IB may generate the third image by calculating a weighted average of the first image and the second image using a first weighting factor corresponding to the first image of the event image conversion E2I and a second weighting factor corresponding to the second image of the flow image conversion F2I.
Note that the above-described processing is merely an example, and the image blender IB may generate the third image by various methods. For example, the image blender IB may generate the third image by selecting an image to be used as future information from among the first image and the second image. For example, in a case where the second image is selected as the future information from among the first image and the second image, the image blender IB uses the second image as the third image. For example, the configuration corresponding to a region AR1 corresponds to the computation section and the data combining section described above. The configuration corresponding to the region AR1 can adopt any configuration such as one selected by the softmax function, but this point will be described below.
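The weighted blending described for the image blender IB can be sketched as a per-pixel soft selection between the two candidate images, with the weight maps normalized by a softmax. The function name and the per-pixel weight maps are illustrative assumptions; the disclosure leaves the blending configuration open.

```python
import numpy as np

def blend_images(first, second, w1, w2):
    """Blend the E2I candidate (first) and the F2I candidate (second)
    using per-pixel weights normalized by a softmax over the two maps."""
    weights = np.stack([w1, w2])                   # (2, H, W)
    weights = np.exp(weights - weights.max(axis=0))  # numerically stable softmax
    weights /= weights.sum(axis=0)
    return weights[0] * first + weights[1] * second
```

With equal weight maps this reduces to a plain average; a strongly dominant weight map reduces it to hard selection, covering both blending behaviors mentioned above.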
In addition, a loss function L3 expressed as “Loss” in
The information processing device 100G trains the model by the above-described configuration and information. For example, the information processing device 100G trains configurations corresponding to the image event conversion I2E and the region AR1. After training the image event conversion I2E, the information processing device 100G trains the configuration corresponding to the region AR1. For example, in a case where there is no test image for the image event conversion I2E, the information processing device 100G may perform self-supervised learning.
Here, at the time of training, it is desirable that the event information is framed to some extent before applying a light-weighted configuration to be described below. There are various methods of framing event information, but it is desirable to use technologies such as Voxel Grid and Event Spike Tensor capable of minimizing loss of information.
Voxel Grid is disclosed in the following document, for example. For example, Voxel Grid divides the number of events between frames, and adds up the divided events for each event in a frame shape.
Furthermore, Event Spike Tensor is disclosed in the following document, for example. For example, Event Spike Tensor divides the time between frames, and adds up the events input within the time in a frame shape for each polarity.
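The Voxel Grid style of framing described above can be sketched as follows: events between two frames are accumulated into time bins, signed by polarity. The function signature is an assumption, and the linear interpolation between adjacent time bins used in the cited technique is omitted for brevity.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width, t0, t1):
    """Accumulate (x, y, t, polarity) events over [t0, t1) into a
    time-binned voxel grid of shape (num_bins, height, width)."""
    grid = np.zeros((num_bins, height, width))
    for x, y, t, p in events:
        # map the timestamp to a bin index, clamping the final timestamp
        b = min(int((t - t0) / (t1 - t0) * num_bins), num_bins - 1)
        grid[b, y, x] += 1.0 if p > 0 else -1.0
    return grid
```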
In addition, object recognition images and training data using an event sensor as an input are overwhelmingly smaller in volume than in the case of a frame-based sensor, leading to a requirement for a mechanism capable of performing training even with a small number of images. Therefore, the information processing device 100G introduces a self-supervised learning mechanism.
For example, the information processing device 100G sets up a network that generates an event signal from a signal acquired on a frame basis, and generates an intermediate image so as to enable generation equivalent to an expression of Voxel Grid. In this case, the information processing device 100G defines the loss function L1 such that the generated signal corresponding to the event and the signal of the event after the actual format conversion converge, and trains the network of the image event conversion I2E.
Furthermore, a unit time T during the learning may be a relatively large value with respect to T in real time. For example, 16 frames during the learning may be regarded as an image of one frame at the time of execution, and may be defined as equivalent to a frame signal having an equal speed in real time. In this case, the frame image is a signal obtained by adding the signals of 16 frames. For example, when the learning is performed by regarding 16 frames as one frame, the difference itself generated in the time of one frame unit can be easily calculated as an event generated in 1/16 of one frame in Voxel Grid or Event Spike Tensor. This leads to a light-weighted network during the learning, making it possible to perform, for example, the learning with a lightweight error function (L1 norm or L2 norm). Furthermore, in some cases, it is not necessary to specify an error function during the learning.
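The 16-frames-as-one-frame trick described above amounts to summing groups of consecutive sub-frames into a single frame signal. The helper name and the (N, H, W) array layout are assumptions made for illustration.

```python
import numpy as np

def aggregate_frames(frames, factor=16):
    """Sum every `factor` consecutive sub-frames into one frame.
    `frames` has shape (N, H, W) with N divisible by `factor`."""
    n, h, w = frames.shape
    return frames.reshape(n // factor, factor, h, w).sum(axis=1)
```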
Then, the information processing device 100G generates the event flow conversion E2F corresponding to the optical flow in order to generate a future image. For example, an optical flow is used to estimate a movement destination when a pixel value having a certain luminance moves to another pixel. The information processing device 100G generates an optical flow for future prediction from an event and learns a teacher (data) from a frame difference.
In a case where the number of images is small in the optical flow, the information processing device 100G trains the network by self-supervised learning. For example, the information processing device 100G generates a signal of an optical flow from a difference between frame signals (for example, FlowNet 2.0 or the like). The information processing device 100G compares the generated flow signal (via the image flow conversion I2F) with the flow signal (via the event flow conversion E2F) generated by the event signal (for example, in the form of Voxel Grid) subjected to the format conversion so as to train a flow generation network. In this case, the loss function L3 is generated such that the event flow conversion E2F is the same as the image flow conversion I2F. Note that the image flow conversion I2F needs no training and is defined as a conversion function.
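A minimal stand-in for the loss function L3, which drives the event-derived flow of the event flow conversion E2F toward the frame-derived teacher flow of the image flow conversion I2F, could be an L1 distance between the two flow fields. The exact loss used is not specified in this disclosure; this is only a sketch of the consistency objective.

```python
import numpy as np

def flow_consistency_loss(flow_e2f, flow_i2f):
    """Mean L1 distance between an event-derived flow field and the
    frame-derived teacher flow field, both of shape (H, W, 2)."""
    return np.abs(flow_e2f - flow_i2f).mean()
```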
Then, the information processing device 100G generates an image. For example, the information processing device 100G generates at least two candidate images, specifically, image generation (destination of the pixel value) from the flow and image generation (variation of the pixel value) from the event. As described above, the information processing device 100G performs processing of selecting from at least two or more images or blending two images, taking the best of both.
The first image candidate (first image) is generated by calculating the luminance value of the pixel from luminance change information of a single pixel itself. For example, the information processing device 100G uses the first image indicating the luminance change information for a sudden signal change from the outside of the angle of view having no information on the flow source.
Furthermore, the second image candidate (second image) is generated by estimating that a specific pixel in the most recent frame image has moved to another pixel based on the most recent frame image and the flow signal. The second image is generated by the flow image conversion F2I.
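The flow image conversion F2I, which moves each pixel of the most recent frame to its estimated destination, can be sketched as a forward warp with displacements rounded to integers. Sub-pixel splatting and occlusion handling are omitted, and the function is an illustrative assumption rather than the disclosed implementation.

```python
import numpy as np

def warp_by_flow(image, flow):
    """Forward-warp a (H, W) frame by a (H, W, 2) flow field: each source
    pixel's value is written to its rounded destination coordinate."""
    h, w = image.shape
    out = np.zeros_like(image)
    ys, xs = np.mgrid[0:h, 0:w]
    dest_x = np.clip(xs + np.rint(flow[..., 0]).astype(int), 0, w - 1)
    dest_y = np.clip(ys + np.rint(flow[..., 1]).astype(int), 0, h - 1)
    out[dest_y, dest_x] = image[ys, xs]  # later writes win on collisions
    return out
```

Regions vacated by motion are left at zero here, which is exactly where the event-derived first image would be preferred by the image blender IB.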
The first image and the second image are adjusted by the image blender IB, and several images are output as future images. At this time, the information processing device 100G executes training processing so that the network generates the future image by performing the training defining a signal of n frames ahead as a true value. For example, a signal of n frames ahead is determined by a delay amount until image display of an assumed system. In this case, in a case where n frames ahead are used in the training at a speed of 16 times, for example, defining 16 frames as equivalent to one frame, adjustment is made with 16n frames ahead as a true value. In this manner, the value of the parameter n may be determined based on the delay amount or the like.
As described above, the parameter n is determined by the system, but there is a possibility that the frame image is not normally generated due to an increase in computation volume or an increase in error in the far future (that is, when n is large). Therefore, when the computation volume increases by a certain degree or more, the information processing device 100G may perform actions such as generation of an internal state variable such as a Busy flag and may interrupt the computation in the middle. In a case where the computation is interrupted in the middle in this manner, the information processing device 100G may perform processing such as replacement of the image with an image (for example, a gray blurred image) or the like which is unlikely to invoke an uncomfortable feeling. In this manner, it is allowable to provide a determination path of giving up (interrupting the processing) when a certain internal computation time is exceeded.
Hereinafter, processing at the time of inference in the information processing device 100G will be described with reference to
Furthermore, in the inference processing depicted in
For example, the information processing device 100G generates future information (image) corresponding to the future time point t+nT by using the image IMt corresponding to time point t and the output of the format conversion FC corresponding to time point t. The image IMt corresponding to time point t, together with the output of the format conversion FC to which the event data E1 before time point t has been input, are input to the event image conversion E2I. The output of the format conversion FC to which the event data E1 before time point t has been input is input to the event flow conversion E2F. The image IMt corresponding to time point t and the output of the event flow conversion E2F are input to the flow image conversion F2I. The first image that is the output of the event image conversion E2I and the second image that is the output of the flow image conversion F2I are input to the image blender IB. With this configuration, the information processing device 100G generates the image output by the image blender IB as future information (image). Note that the above-described processing is merely an example, and the information processing device 100G may generate the future information using various types of information.
As described above, using the network trained by the training processing, the information processing device 100G generates one or more future prediction images from one latest frame image and event information at the time of inference. Note that the image finally selected and generated is determined by the delay amount of the system. For example, in a case where the system includes a plurality of display devices, the information processing device 100G may transmit an optimum generation signal to each display device.
Furthermore, at the time of inference, the network fixed by the training is used, and thus, some networks used only at the time of training need not necessarily be used at the time of inference. For example, although the image event conversion I2E generates an event signal from frame information, it is sufficient to directly pass the event signal to the subsequent stage, and thus the output of the Voxel Grid may be directly input to the event flow conversion E2F. Furthermore, the flow may be generated in a pre-trained network and combined with the frame signal for the generation of the flow image conversion F2I. In addition, the event image conversion E2I may also be generated from the frame signal and the event signal. With these operations, an optimum image is finally generated according to the system. For example, in a case where the entire system is trained on a time axis of 1/16 at the time of training, the generation of the Voxel Grid may be performed by using the event frame generated every 1/16 of the frame time as the Voxel Grid as it is.
Note that, in a case where the latest information is the current state of each image at the time of inference, the event information may correspond to the current state. In this case, since the event is newer, frame information of the newest frame may be regarded as corresponding to the past. Furthermore, the future corresponds to an image to be generated.
Here, an example of miniaturization of the configuration will be described. For example, the configuration corresponding to the region AR1 depicted in
Furthermore, in order to achieve the light-weighted circuit described above, the event signal may be directly input to the network, and framing may be simultaneously performed inside the network. In this case, for example, by applying a spiking neural network (SNN) circuit to a main part close to the input of the network such as immediately after the input, further light-weighted circuit can be achieved.
Note that each configuration described above is merely an example, and the accuracy may be improved by performing the training using a Generative Adversarial Network (GAN). In this regard, an outline of a configuration and processing will be described as an information processing device according to a tenth embodiment with reference to
An information processing device 100H according to the tenth embodiment is different in that the loss function L2 in the information processing device 100G according to the ninth embodiment has a configuration related to GAN. Therefore, the configuration related to GAN will be mainly described below, and descriptions similar to those of the information processing device 100G will be omitted as appropriate.
As depicted in
In the information processing device 100H, five configurations, namely, the image event conversion I2E, the image flow conversion I2F, the event image conversion E2I, the event flow conversion E2F, and the flow image conversion F2I, are set (used) as generators (generating devices). Furthermore, the information processing device 100H calculates the loss function La using the discriminator DS that compares the image with the actual ground truth image.
In order to converge the learning, the information processing device 100H uses a difference (hereinafter also referred to as “first difference”) between the original image at t+nT (I1 (x) in
In this manner, in the information processing device 100H, the discriminator DS compares the generated image with the corresponding image of n frames ahead, and distinguishes whether or not the generated image invokes a sense of discomfort. In this case, the loss function is determined based on the loss function La. Furthermore, in the information processing device 100H, the image generated as the image of the n-th frame ahead is compared with the signal of the current frame n frames ahead, which is assumed to be equivalent at the input, and is confirmed by the loss function Lc.
Note that, in the above-described configuration, it is allowable, at the time of inference, to extract only a portion where there is a change (a region where an event has occurred), and allowable to perform inference based on the extracted information. In this regard, an outline of a configuration and processing of an information processing device according to an eleventh embodiment will be described with reference to
Hereinafter, the configuration of the information processing device 100I, information used for processing, and the like will be described, and then processing at the time of learning will be described. An information processing device 100I according to the eleventh embodiment is different from the information processing device 100G according to the ninth embodiment in having a configuration of extracting information of a region where an event has occurred. Therefore, points related to the flow will be mainly described below, and description of the points similar to those of the information processing device 100G will be omitted as appropriate.
As depicted in
Furthermore, in the inference processing depicted in
For example, as indicated in future information DT3, the information processing device 100I generates future information (image) corresponding to the future time point t+nT by using the image IMt corresponding to time point t and the output of the format conversion FC corresponding to time point t. The image IMt corresponding to time point t, together with the output of the region extraction SG to which the event data E1 before time point t has been input, are input to the cropping CR. The output from the cropping CR, together with the output of the format conversion FC to which the output of the region extraction SG has been input, are input to the event image conversion E2I. The output of the format conversion FC to which the output of the region extraction SG has been input is input to the event flow conversion E2F. The output from the cropping CR and the output of the event flow conversion E2F are input to the flow image conversion F2I. The first image that is the output of the event image conversion E2I and the second image that is the output of the flow image conversion F2I are input to the image blender IB. With this configuration, the information processing device 100I generates the image output by the image blender IB as future information (image). Note that the above-described processing is merely an example, and the information processing device 100I may generate the future information using various types of information.
In the example described above, it is assumed that only a part of the angle of view changes when the future image is generated. In this case, since the target of the computation region is narrowed down to a part of the region, the computation volume can be reduced by designating (extracting) the region to be passed to the generation circuit in advance. For example, in the information processing device 100I, using a region having a high degree of activity generated from an event signal as area information, the region is divided as a region-of-interest (ROI) before the computation is performed. For example, the ROI may be used to recognize a region of the object so as to achieve dynamic tracking.
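The region extraction from event activity could be sketched as a bounding box over a per-pixel event-count map. This is a hedged simplification of the region extraction SG feeding the cropping CR; the thresholding scheme and the function signature are assumptions.

```python
import numpy as np

def event_roi(event_counts, threshold=1):
    """Bounding box (y0, y1, x0, x1) of pixels whose event count meets
    the threshold, or None when no pixel is active."""
    ys, xs = np.nonzero(event_counts >= threshold)
    if ys.size == 0:
        return None
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
```

Cropping the frame image and the event stream to this box before the generation circuit is what narrows the computation volume as described above.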
Note that each configuration described above is merely an example, and the network may be updated at the time of inference. In this regard, an outline of a configuration and processing of an information processing device according to a twelfth embodiment will be described with reference to
An image group IG2 in
In
In this manner, the information processing device 100J uses, for example, a light-weighted network at the time of inference, and updates the network by comparing the image with an actual ground truth image at a certain frequency even at the time of inference. Note that it is desirable that the conceptual image (RGB image) is input to the computation device only at the beginning of use of the device, and the remaining future prediction is processed by the future prediction based on event data. For example, the information processing device 100J may compare the generated image predicted at a certain time point with a frame image corresponding to the predicted generated image at a certain timing and may update the coefficient (parameter) for prediction. In this case, the information processing device 100J updates the event image conversion E2I, the flow image conversion F2I, and the like.
Note that the network update at the time of inference described above may also be performed in the inference based on the extracted information in the eleventh embodiment. In this regard, an outline of a configuration and processing of an information processing device according to the thirteenth embodiment will be described with reference to
For example, an information processing device 100K according to the thirteenth embodiment is a combination of the information processing device 100I according to the eleventh embodiment and the information processing device 100J according to the twelfth embodiment. In other words, the information processing device 100K applies the network update at the time of inference described with the information processing device 100J according to the twelfth embodiment to the information processing device 100I according to the eleventh embodiment using only the information of the region where the event has occurred. Therefore, the configuration and processing of the information processing device 100K will be described in detail below.
Note that the information processing device may use at least two or more configurations (systems) that perform future prediction. For example, the information processing device may perform two future predictions, namely, a future prediction for a person's (user's) left eye (also simply referred to as "left eye") and a future prediction for a person's (user's) right eye (also simply referred to as "right eye"), and may calculate their parallax so as to perform distance measurement. In this regard, an outline of a configuration and processing of an information processing device according to a fourteenth embodiment will be described with reference to
An information processing device 100L according to the fourteenth embodiment has a first group configuration G1 and a second group configuration G2. For example, the first group configuration G1 is used to generate future information DT61 which is future information for the right eye (also referred to as “first future information”). In
The event image conversion E2I1 has a function similar to the function of the event image conversion E2I in
Furthermore, in the information processing device 100L, for example, the second group configuration G2 is used to generate future information DT62 which is future information for the left eye (also referred to as “second future information”). In
The event image conversion E2I2 has a function similar to the function of the event image conversion E2I in
Furthermore, the information processing device 100L uses the first future information generated by the processing of the first group configuration G1 and the second future information generated by the processing of the second group configuration G2 to generate future parallax information (also referred to as “third future information”). In
The information processing device 100L generates information including distance measurement information using parallax in a plurality of (two in
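Distance measurement from the parallax between the first future information and the second future information can follow the standard stereo triangulation relation, distance = f × B / d, where f is the focal length in pixels, B is the baseline between the two viewpoints in meters, and d is the disparity in pixels. The parameter values used below are purely illustrative; the disclosure does not specify the distance measurement computation.

```python
def disparity_to_distance(disparity_px, focal_px, baseline_m):
    """Standard stereo triangulation: distance = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px
```

For example, with an assumed focal length of 500 px and a 6 cm baseline, a 10 px disparity between the left-eye and right-eye future images corresponds to a distance of 3 m.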
Note that the present technique can also have the following configurations.
(1)
An information processing device comprising:
(2)
The information processing device according to (1),
(3)
The information processing device according to (1) or (2),
(4)
The information processing device according to any one of (1) to (3),
(5)
The information processing device according to any one of (1) to (4),
(6)
The information processing device according to any one of (1) to (5),
(7)
The information processing device according to (5),
(8)
The information processing device according to any one of (1) to (7),
(9)
The information processing device according to any one of (1) to (8),
(10)
The information processing device according to (9),
(11)
The information processing device according to (9) or (10),
(12)
The information processing device according to (11),
(13)
The information processing device according to (11) or (12),
(14)
The information processing device according to any one of (1) to (13), further comprising a combining means that combines the second information with another information to generate combined information to be used for output.
(15)
The information processing device according to (14),
(16)
The information processing device according to (15),
(17)
The information processing device according to any one of (1) to (16),
(18)
The information processing device according to (17),
(19)
An information processing method comprising:
(20)
An information processing program that causes processing to be executed, the processing comprising:
Number | Date | Country | Kind |
---|---|---|---|
2021-098261 | Jun 2021 | JP | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2022/012525 | 3/18/2022 | WO |