This application relates to the field of computer vision, and in particular, to an optical flow estimation method and apparatus.
In an optical flow method, a correspondence between a previous frame and a current frame is found based on changes of pixels in an image sequence in time domain and the correlation between adjacent frames, to calculate motion information of an object between the adjacent frames.
A conventional optical flow estimation method can estimate only optical flows between a first image frame and a second image frame (including an optical flow from the first image frame to the second image frame and an optical flow from the second image frame to the first image frame), where the first image frame and the second image frame are two adjacent image frames. However, optical flows from the two image frames to any moment between the two image frames (including an optical flow from the first image frame to that moment and an optical flow from the second image frame to that moment) can be allocated only under a linear motion assumption, by using the time length as the weight.
Because linear motion is only an assumption, which may differ greatly from the actual motion, the accuracy is low when the optical flows from the two adjacent image frames to any moment between the two adjacent image frames are estimated by using a conventional optical flow calculation method.
This application provides an optical flow estimation method and apparatus, to improve accuracy of estimating optical flows from two adjacent image frames to any moment between the two adjacent image frames.
According to a first aspect, this application provides an optical flow estimation method. The method may include: obtaining a first image frame and a second image frame, where the first image frame and the second image frame are any two adjacent image frames in an image sequence, and the image sequence is obtained by photographing a target scene; obtaining a first event frame, where the first event frame is used to describe a luminance change of the target scene within a time period from the first image frame to the second image frame; and determining a target optical flow based on the first image frame, the second image frame, and the first event frame, where the target optical flow is an optical flow from the first image frame to a target moment, and the target moment is any moment between the first image frame and the second image frame.
In a possible implementation, the method may be used in an optical flow estimation system. The optical flow estimation system may include a pixel sensor, an event-based sensor, and an optical flow estimation apparatus. The pixel sensor and the event-based sensor are separately connected to the optical flow estimation apparatus. The method may be performed, for example, by the optical flow estimation apparatus.
It should be further noted that, because the image sequence is obtained by photographing the target scene by using a pixel camera, the first image frame and the second image frame include pixel information of a target object in the target scene. Because event flow data is obtained by photographing the target scene by using an event-based camera, the first event frame obtained from the event flow data may capture real high-speed motion information (including linear motion and non-linear motion) of the target object in the target scene within the time period between the first image frame and the second image frame.
According to the optical flow estimation method provided in this application, the optical flow estimation apparatus first estimates a first optical flow between the first image frame and the second image frame based on the first image frame, the second image frame, and the first event frame, and then determines, based on a second event frame, a weight (namely, a first optical flow allocation mask) of the optical flow from the first image frame to the target moment relative to the optical flow from the first image frame to the second image frame. Because no motion assumption is made about the target object, the obtained first optical flow allocation mask can accurately allocate the optical flow for real motion. Therefore, accuracy of the target optical flow obtained by weighting the first optical flow by using the first optical flow allocation mask is high.
Optionally, the optical flow estimation apparatus may obtain the first image frame and the second image frame in a plurality of manners. This is not limited in this application.
In a possible implementation, the optical flow estimation apparatus may receive the first image frame and the second image frame that are sent by the pixel camera.
In another possible implementation, the optical flow estimation apparatus may obtain, through an input interface, the first image frame and the second image frame that are input by another device.
Optionally, the target scene may include at least one target object, and some or all objects in the at least one target object are in a motion state.
Optionally, the optical flow estimation apparatus may obtain the first event frame in a plurality of manners. This is not limited in this application.
In a possible implementation, the optical flow estimation apparatus may receive event flow data sent by the event-based camera, where the event flow data includes event data of each event in at least one event, the at least one event one-to-one corresponds to at least one luminance change that occurs in the target scene between the first image frame and the second image frame, and the data of each event includes a timestamp, pixel coordinates, and a polarity; and the optical flow estimation apparatus may obtain the first event frame based on the event flow data. In other words, the event-based camera may collect the event flow data, and send the event flow data to the optical flow estimation apparatus.
In another possible implementation, the optical flow estimation apparatus may receive the first event frame sent by the event-based camera. In other words, the event-based camera may collect the event flow data, generate the first event frame based on the event flow data, and send the first event frame to the optical flow estimation apparatus.
It should be noted that the first image frame, the second image frame, and the first event frame have the same resolution.
In a possible implementation, for example, the first image frame includes H×W pixels, both H and W are integers greater than 1, the first event frame may include a plurality of channels, and the plurality of channels may include a first channel, a second channel, a third channel, and a fourth channel. The first channel includes H×W first values, where the H×W first values one-to-one correspond to locations of the H×W pixels, and the first value indicates a quantity of times that luminance of a pixel at a corresponding location in the first image frame increases within the time period from the first image frame to the second image frame; the second channel includes H×W second values, where the H×W second values one-to-one correspond to the locations of the H×W pixels, and the second value indicates a quantity of times that luminance of a pixel at a corresponding location in the first image frame decreases within the time period from the first image frame to the second image frame; the third channel includes H×W third values, where the H×W third values one-to-one correspond to the locations of the H×W pixels, and the third value indicates time at which luminance of a pixel at a corresponding location in the first image frame increases for the last time within the time period from the first image frame to the second image frame; and the fourth channel includes H×W fourth values, where the H×W fourth values one-to-one correspond to the locations of the H×W pixels, and the fourth value indicates time at which luminance of a pixel at a corresponding location in the first image frame decreases for the last time within the time period from the first image frame to the second image frame.
In a possible implementation, before the optical flow estimation apparatus determines the target optical flow based on the first image frame, the second image frame, and the first event frame, the optical flow estimation apparatus may obtain a second event frame, where the second event frame is used to describe a luminance change of the target scene within a time period from the first image frame to the target moment; and that the optical flow estimation apparatus determines a target optical flow based on the first image frame, the second image frame, and the first event frame includes: The optical flow estimation apparatus determines the target optical flow based on the first image frame, the second image frame, the first event frame, and the second event frame.
Specifically, the optical flow estimation apparatus may determine a first optical flow based on the first image frame, the second image frame, and the first event frame, where the first optical flow is an optical flow from the first image frame to the second image frame; determine a first optical flow allocation mask based on the second event frame, where the first optical flow allocation mask indicates a weight of the target optical flow relative to the first optical flow; and determine the target optical flow based on the first optical flow and the first optical flow allocation mask.
Optionally, the first optical flow may be a sparse optical flow, or may be a dense optical flow. This is not limited in this application.
In a possible implementation, for example, the first optical flow is the dense optical flow. The first optical flow indicates different motion directions by using different colors of pixels, and indicates different motion rates by using different luminance of the pixels.
Optionally, the optical flow estimation apparatus may determine the first optical flow based on the first image frame, the second image frame, and the first event frame in a plurality of manners. This is not limited in this application.
In a possible implementation, the optical flow estimation apparatus may input the first image frame, the second image frame, and the first event frame to a preset optical flow estimation model, to obtain the first optical flow.
To meet device-side deployment and real-time performance requirements, a network structure of the optical flow estimation model cannot be too complex. In addition, because optical flow estimation is a complex task, it is difficult for a network with a simple structure to complete high-precision optical flow estimation. Therefore, in this application, the network structure of the optical flow estimation model is obtained by training a lightweight first convolutional neural network. The first convolutional neural network may include several processing layers: dimension reduction, convolution, residual, deconvolution, and dimension increasing. Cyclic iteration is performed on the first convolutional neural network, to improve accuracy of the optical flow estimation, and to facilitate deployment of the optical flow estimation model on the device side.
Optionally, the optical flow estimation apparatus may input the first image frame, the second image frame, and the first event frame to the optical flow estimation model, and perform cyclic iteration, to obtain the first optical flow. In other words, the optical flow estimation apparatus may input the first image frame, the second image frame, and the first event frame to the optical flow estimation model, to obtain a second optical flow; and input the first image frame, the second image frame, the first event frame, and the second optical flow to the optical flow estimation model, to obtain a third optical flow. By analogy, the cyclic iteration is performed until a loss function preset in the optical flow estimation model is met, and the optical flow output by the optical flow estimation model at that point is the first optical flow.
It should be noted that the first optical flow allocation mask, the first image frame, the second image frame, and the first event frame have the same resolution.
Optionally, the optical flow estimation apparatus may determine the first optical flow allocation mask based on the second event frame in a plurality of manners. This is not limited in this application.
In a possible implementation, the optical flow estimation apparatus may input the second event frame to a preset optical flow allocation model, to obtain the first optical flow allocation mask.
To meet device-side deployment and real-time performance requirements, a network structure of the optical flow allocation model cannot be too complex. Therefore, in this application, the network structure of the optical flow allocation model is obtained by training a lightweight second CNN. The second CNN may include processing layers such as fusion and convolution, and cyclic iteration is performed on the second CNN, to improve accuracy of optical flow estimation, and to facilitate deployment of the optical flow estimation model on the device side.
Optionally, the optical flow estimation apparatus may input the second event frame to the optical flow allocation model, and perform cyclic iteration, to obtain the first optical flow allocation mask.
According to a second aspect, this application further provides an optical flow estimation apparatus, including an obtaining module and an optical flow estimation module. The obtaining module is configured to: obtain a first image frame and a second image frame, where the first image frame and the second image frame are any two adjacent image frames in an image sequence, and the image sequence is obtained by photographing a target scene; and obtain a first event frame, where the first event frame is used to describe a luminance change of the target scene within a time period from the first image frame to the second image frame; and the optical flow estimation module is configured to determine a target optical flow based on the first image frame, the second image frame, and the first event frame, where the target optical flow is an optical flow from the first image frame to a target moment, and the target moment is any moment between the first image frame and the second image frame.
In a possible implementation, the obtaining module is further configured to: before the target optical flow is determined based on the first image frame, the second image frame, and the first event frame, obtain a second event frame, where the second event frame is used to describe a luminance change of the target scene within a time period from the first image frame to the target moment; and the optical flow estimation module is specifically configured to determine the target optical flow based on the first image frame, the second image frame, the first event frame, and the second event frame.
In a possible implementation, the optical flow estimation module includes an inter-frame optical flow estimation submodule, an optical flow allocation submodule, and an inter-frame optical flow estimation submodule at any moment, where the inter-frame optical flow estimation submodule is configured to determine a first optical flow based on the first image frame, the second image frame, and the first event frame, where the first optical flow is an optical flow from the first image frame to the second image frame; the optical flow allocation submodule is configured to determine a first optical flow allocation mask based on the second event frame, where the first optical flow allocation mask indicates a weight of the target optical flow relative to the first optical flow; and the inter-frame optical flow estimation submodule at any moment is configured to determine the target optical flow based on the first optical flow and the first optical flow allocation mask.
In a possible implementation, the inter-frame optical flow estimation submodule is specifically configured to input the first image frame, the second image frame, and the first event frame to a preset optical flow estimation model, to obtain the first optical flow.
In a possible implementation, the inter-frame optical flow estimation submodule is specifically configured to: input the first image frame, the second image frame, and the first event frame to the preset optical flow estimation model; and perform cyclic iteration, to obtain the first optical flow.
In a possible implementation, the optical flow allocation submodule is specifically configured to input the second event frame to a preset optical flow allocation model, to obtain the first optical flow allocation mask.
In a possible implementation, the optical flow allocation submodule is specifically configured to: input the second event frame to the optical flow allocation model, and perform cyclic iteration, to obtain the first optical flow allocation mask.
In a possible implementation, the first image frame includes H×W pixels, both H and W are integers greater than 1, the first event frame includes a plurality of channels, and the plurality of channels include a first channel, a second channel, a third channel, and a fourth channel; the first channel includes H×W first values, where the H×W first values one-to-one correspond to locations of the H×W pixels, and the first value indicates a quantity of times that luminance of a pixel at a corresponding location in the first image frame increases within the time period from the first image frame to the second image frame; the second channel includes H×W second values, where the H×W second values one-to-one correspond to the locations of the H×W pixels, and the second value indicates a quantity of times that luminance of a pixel at a corresponding location in the first image frame decreases within the time period from the first image frame to the second image frame; the third channel includes H×W third values, where the H×W third values one-to-one correspond to the locations of the H×W pixels, and the third value indicates a timestamp at which luminance of a pixel at a corresponding location in the first image frame increases for the last time within the time period from the first image frame to the second image frame; and the fourth channel includes H×W fourth values, where the H×W fourth values one-to-one correspond to the locations of the H×W pixels, and the fourth value indicates a timestamp at which luminance of a pixel at a corresponding location in the first image frame decreases for the last time within the time period from the first image frame to the second image frame.
In a possible implementation, the obtaining module is specifically configured to: obtain event flow data, where the event flow data includes event data of each event in at least one event, the at least one event one-to-one corresponds to at least one luminance change that occurs in the target scene between the first image frame and the second image frame, and the data of each event includes a timestamp, pixel coordinates, and a polarity; and obtain the first event frame based on the event flow data.
According to a third aspect, this application further provides an optical flow estimation apparatus. The optical flow estimation apparatus may include at least one processor and at least one communication interface, where the at least one processor is coupled to the at least one communication interface, the at least one communication interface is configured to provide information and/or data for the at least one processor, and the at least one processor is configured to run computer program instructions to perform the optical flow estimation method according to the first aspect and any one of the possible implementations of the first aspect.
Optionally, the apparatus may be a chip or an integrated circuit.
According to a fourth aspect, this application further provides a terminal. The terminal may include the optical flow estimation apparatus according to the second aspect and any one of the possible implementations of the second aspect, or the optical flow estimation apparatus according to the third aspect.
According to a fifth aspect, this application further provides a computer-readable storage medium, configured to store a computer program, where when the computer program is run by a processor, the optical flow estimation method according to the first aspect and any one of the possible implementations of the first aspect is implemented.
According to a sixth aspect, this application further provides a computer program product, where when the computer program product runs on a processor, the optical flow estimation method according to the first aspect and any one of the possible implementations of the first aspect is implemented.
The optical flow estimation apparatus, the computer storage medium, the computer program product, the chip, and the terminal provided in this application are all configured to perform the optical flow estimation method provided above. Therefore, for beneficial effect that can be achieved by the optical flow estimation apparatus, the computer storage medium, the computer program product, the chip, and the terminal, refer to beneficial effect of the optical flow estimation method provided above. Details are not described herein again.
The following describes technical solutions in embodiments of this application with reference to accompanying drawings in embodiments of this application.
The pixel camera is a conventional camera that collects luminance values of a scene at a fixed rate (namely, a frame rate) and outputs the luminance values as image data at that rate.
The event-based camera is a new type of sensor that captures a dynamic change of pixel luminance in a scene based on an event-driven mode.
To some extent, the conventional camera captures static/still space, while the event-based camera aims to sensitively capture an object in motion.
Unlike the conventional camera, the event-based camera only observes “motion” in the scene, or “a luminance change” in the scene, to be exact. The event-based camera outputs a luminance change (1 or 0) of a corresponding pixel only when the luminance changes. The event-based camera has advantages such as fast response and a wide dynamic range.
The event-based camera produces an output for a single pixel only when the light intensity at that pixel changes. For example, if the luminance increases and exceeds a threshold, the corresponding pixel outputs a luminance increase event. The event-based camera does not have a concept of frame, and when the scene changes, the event-based camera produces a series of pixel-level outputs. Because the theoretical time resolution of the event-based camera is up to 1 μs, the delay is very low, far shorter than the time scale of motion in most common scenes. Therefore, there is no motion blur problem. In addition, each pixel of the event-based camera works independently and asynchronously. Therefore, the dynamic range is large. The event-based camera also has an advantage of low energy consumption.
To sum up, the conventional camera photographs the scene in full frames at a fixed frame rate, and all pixels work synchronously. For the event-based camera, each pixel works independently and asynchronously, the sampling rate is up to one million hertz (Hz), and only luminance changes (namely, events) are output. Each event is described by a four-tuple of event data, and the event data output by all pixels is aggregated into an event list, which is output by the camera as event flow data.
For example, event data of an event may be indicated as (x, y, t, p), where (x, y) is pixel coordinates of the event, t is a moment at which the event occurs, and p is a polarity of the event (for example, p=0 indicates that luminance of the pixel decreases compared with the previously sampled luminance, and p=1 indicates that luminance of the pixel increases compared with the previously sampled luminance).
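As an illustration of this four-tuple, the following minimal sketch shows one way event data and event flow data could be represented in code; the Event class and the sample values are assumptions made only for this example, not a format defined in this application.

```python
from typing import NamedTuple, List

class Event(NamedTuple):
    x: int      # pixel column at which the luminance change occurs
    y: int      # pixel row at which the luminance change occurs
    t: float    # moment at which the event occurs
    p: int      # polarity: 1 = luminance increase, 0 = luminance decrease

# Event flow data: the events output by all pixels, ordered by timestamp.
event_flow: List[Event] = [
    Event(x=2, y=2, t=0.10, p=1),
    Event(x=2, y=2, t=0.35, p=0),
    Event(x=3, y=1, t=0.60, p=1),
]
```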
A commonly used event-based camera may include a dynamic vision sensor (dynamic vision sensor, DVS) or a dynamic and active-pixel vision sensor (dynamic and active-pixel vision sensor, DAVIS).
An optical flow describes the flow of light. The optical flow method finds a correspondence between a previous frame and a current frame based on changes of pixels in an image sequence in time domain and the correlation between adjacent frames, to calculate motion information of an object between the adjacent frames.
Based on whether only sparse points in an image are selected for optical flow estimation, optical flows can be classified into a sparse optical flow and a dense optical flow.
In the conventional technology, an optical flow network (FlowNet) is a model that is obtained by training a convolutional neural network and that is used to estimate an optical flow. Optical flow estimation is to estimate a pixel-level optical flow between any two adjacent image frames in an image sequence based on the two image frames.
For example, two adjacent image frames are a first image frame and a second image frame, and the first image frame is a previous frame of the second image frame. An existing optical flow network may be used to estimate a bidirectional optical flow between the first image frame and the second image frame, namely, an optical flow from the first image frame to the second image frame and an optical flow from the second image frame to the first image frame.
Because a conventional camera captures an image at a constant frequency (namely, a frame rate), even if the frame rate can reach 1 kHz, there is a delay of 1 ms. Within the delay of 1 ms, a target object may be moving at a high speed.
Optical flows from the two image frames to any moment between the two image frames, namely, an optical flow from the first image frame to that moment and an optical flow from the second image frame to that moment, can be allocated only under a linear motion assumption, by using the time length as the weight.
However, in a real scene, the target object may move non-linearly. Therefore, accuracy is low when the optical flows from two adjacent image frames to any moment between the two image frames are estimated by using the existing linear motion assumption method.
First, an optical flow estimation system to which an optical flow estimation method and apparatus provided in this application are applied is described.
The pixel sensor 110 is configured to: photograph a target scene to obtain an image sequence; and send a first image frame and a second image frame to the optical flow estimation apparatus 130, where the first image frame and the second image frame are any two adjacent image frames in the image sequence.
For example, the pixel sensor may be a pixel camera. A model and a type of the pixel camera are not limited in this application.
The event-based sensor 120 is configured to: photograph the target scene, to obtain event flow data, where the event flow data includes event data of each event in at least one event, the at least one event one-to-one corresponds to at least one luminance change that occurs in the target scene between the first image frame and the second image frame, and the data of each event includes a timestamp, pixel coordinates, and a polarity; obtain the first event frame based on the event flow data, where the first event frame is used to describe a luminance change of the target scene within a time period from the first image frame to the second image frame, and the first image frame, the second image frame, and the first event frame have the same resolution; and send the first event frame to the optical flow estimation apparatus 130.
For example, the event-based sensor may be an event-based camera. A model and a type of the event-based camera are not limited in this application.
The optical flow estimation apparatus 130 is configured to determine a target optical flow based on the first image frame, the second image frame, and the first event frame (for a specific method, refer to the optical flow estimation method provided in this application described below), where the target optical flow is an optical flow from the first image frame to a target moment, and the target moment is any moment between the first image frame and the second image frame.
It should be noted that the foregoing uses only the optical flow (namely, the target optical flow) from the first image frame to the target moment as an example for description. However, this application is not limited thereto. A method for estimating an optical flow from the second image frame to the target moment is similar to the method for estimating the target optical flow. For details, refer to the target optical flow estimation method provided in this application. Details are not described herein.
Optionally, the event-based sensor 120 may directly send the event flow data to the optical flow estimation apparatus 130. Correspondingly, the optical flow estimation apparatus 130 obtains the first event frame based on the event flow data.
Optionally, the apparatuses may communicate with each other in a wired manner or a wireless manner. This is not limited in this application.
For example, the wired manner may be implementing communication through a data line connection or through an internal bus connection.
For example, the wireless manner may be implementing communication by using a communication network. The communication network may be a local area network, or may be a wide area network transferred by using a relay (relay) device, or may include a local area network and a wide area network. When the communication network is the local area network, the communication network may be a wireless fidelity (wireless fidelity, Wi-Fi) hotspot network or a Wi-Fi peer-to-peer (peer-to-peer, P2P) network, a Bluetooth (Bluetooth) network, a ZigBee network, a near field communication (near field communication, NFC) network, a possible future general short-range communication network, or the like. When the communication network is the wide area network, for example, the communication network may be a 3rd generation mobile communication technology (3rd generation wireless telephone technology, 3G) network, a 4th generation mobile communication technology (4th generation mobile communication technology, 4G) network, a 5th generation mobile communication technology (5th generation mobile communication technology, 5G) network, a public land mobile network (public land mobile network, PLMN), the Internet (Internet), or the like. This is not limited in this application.
According to the optical flow estimation system provided in this application, the first event frame can capture low-delay motion information of a target in the target scene between the first image frame and the second image frame, and the first image frame and the second image frame can capture pixel information of the target scene. Therefore, the target optical flow is determined based on the first event frame, the first image frame, and the second image frame. This can improve accuracy of the target optical flow.
The foregoing describes the optical flow estimation system provided in this application, and the following further describes the optical flow estimation method applied to the optical flow estimation system provided in this application.
S201: Obtain a first image frame and a second image frame, where the first image frame and the second image frame are any two adjacent image frames in an image sequence, and the image sequence is obtained by photographing a target scene.
Optionally, the method 200 may be performed by an optical flow estimation apparatus.
For example, the optical flow estimation apparatus herein may be the optical flow estimation apparatus 130 in the optical flow estimation system 100.
Optionally, the optical flow estimation apparatus may obtain the first image frame and the second image frame in a plurality of manners. This is not limited in this application.
In a possible implementation, the optical flow estimation apparatus may receive the first image frame and the second image frame that are sent by a pixel camera.
For example, the pixel camera herein may be the pixel camera 110 in the optical flow estimation system 100.
In another possible implementation, the optical flow estimation apparatus may obtain, through an input interface, the first image frame and the second image frame that are input by another device.
Optionally, the target scene may include at least one target object, and some or all objects in the at least one target object are in a motion state.
S202: Obtain a first event frame, where the first event frame is used to describe a luminance change of the target scene within a time period from the first image frame to the second image frame.
Optionally, the optical flow estimation apparatus may obtain the first event frame in a plurality of manners. This is not limited in this application.
In a possible implementation, the optical flow estimation apparatus may receive event flow data sent by an event-based camera, where the event flow data includes event data of each event in at least one event, the at least one event one-to-one corresponds to at least one luminance change that occurs in the target scene between the first image frame and the second image frame, and the data of each event includes a timestamp, pixel coordinates, and a polarity; and the optical flow estimation apparatus may obtain the first event frame based on the event flow data. In other words, the event-based camera may collect the event flow data, and send the event flow data to the optical flow estimation apparatus.
In another possible implementation, the optical flow estimation apparatus may receive the first event frame sent by the event-based camera. In other words, the event-based camera may collect the event flow data, generate the first event frame based on the event flow data, and send the first event frame to the optical flow estimation apparatus.
For example, the event-based camera herein may be the event-based camera 120 in the optical flow estimation system 100.
It should be noted that the first image frame, the second image frame, and the first event frame have the same resolution.
For example, the resolution of each of the first image frame, the second image frame, and the first event frame may be 4×4.
In a possible implementation, for example, the first image frame includes H×W pixels, both H and W are integers greater than 1, the first event frame may include a plurality of channels, and the plurality of channels may include a first channel, a second channel, a third channel, and a fourth channel. The first channel includes H×W first values, where the H×W first values one-to-one correspond to locations of the H×W pixels, and the first value indicates a quantity of times that luminance of a pixel at a corresponding location in the first image frame increases within the time period from the first image frame to the second image frame; the second channel includes H×W second values, where the H×W second values one-to-one correspond to the locations of the H×W pixels, and the second value indicates a quantity of times that luminance of a pixel at a corresponding location in the first image frame decreases within the time period from the first image frame to the second image frame; the third channel includes H×W third values, where the H×W third values one-to-one correspond to the locations of the H×W pixels, and the third value indicates time at which luminance of a pixel at a corresponding location in the first image frame increases for the last time within the time period from the first image frame to the second image frame; and the fourth channel includes H×W fourth values, where the H×W fourth values one-to-one correspond to the locations of the H×W pixels, and the fourth value indicates time at which luminance of a pixel at a corresponding location in the first image frame decreases for the last time within the time period from the first image frame to the second image frame.
For example, for a pixel whose coordinates are (2, 2), the first value and the second value are obtained by counting, in the event flow data, the luminance increase events and the luminance decrease events at that pixel within the time period from the first image frame to the second image frame, and the third value and the fourth value are the timestamps of the last such increase event and the last such decrease event, respectively.
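As an illustration of how such a four-channel event frame could be accumulated from event flow data, the following sketch reuses the hypothetical Event representation from the earlier example; the function name and channel ordering are assumptions made only for this illustration.

```python
import numpy as np

def build_event_frame(events, H, W):
    """Accumulate a 4-channel event frame of resolution H x W from event flow data."""
    frame = np.zeros((4, H, W), dtype=np.float32)
    for e in events:
        if e.p == 1:                      # luminance increase event
            frame[0, e.y, e.x] += 1.0     # first channel: number of increases
            frame[2, e.y, e.x] = e.t      # third channel: time of the last increase
        else:                             # luminance decrease event
            frame[1, e.y, e.x] += 1.0     # second channel: number of decreases
            frame[3, e.y, e.x] = e.t      # fourth channel: time of the last decrease
    return frame

# For the 4x4 example resolution mentioned above:
# first_event_frame = build_event_frame(event_flow, H=4, W=4)
```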
S203: Determine a target optical flow based on the first image frame, the second image frame, and the first event frame, where the target optical flow is an optical flow from the first image frame to a target moment, and the target moment is any moment between the first image frame and the second image frame.
It should be further noted that, because the image sequence is obtained by photographing the target scene by using the pixel camera, the first image frame and the second image frame include pixel information of the target object in the target scene. Because the event flow data is obtained by photographing the target scene by using the event-based camera, the first event frame obtained from the event flow data may capture real high-speed motion information (including linear motion and non-linear motion) of the target object in the target scene within the time period between the first image frame and the second image frame.
In conclusion, estimating the optical flow at the target moment based on the first image frame, the second image frame, and the first event frame can improve accuracy of the target optical flow.
In a possible implementation, before S203, the optical flow estimation apparatus may obtain a second event frame, where the second event frame is used to describe the luminance change of the target scene within the time period from the first image frame to the target moment. Correspondingly, in S203, the optical flow estimation apparatus may determine the target optical flow based on the first image frame, the second image frame, the first event frame, and the second event frame.
It should be noted that, for a manner in which the second event frame is obtained, refer to the foregoing manner in which the first event frame is obtained. Details are not described herein.
Specifically, the optical flow estimation apparatus may determine a first optical flow based on the first image frame, the second image frame, and the first event frame, where the first optical flow is an optical flow from the first image frame to the second image frame; determine a first optical flow allocation mask based on the second event frame, where the first optical flow allocation mask indicates a weight of the target optical flow relative to the first optical flow; and determine the target optical flow based on the first optical flow and the first optical flow allocation mask.
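The decomposition described above can be summarized by the following minimal sketch, which composes the two stages; the function and model names are placeholders assumed only for illustration, not the exact models defined in this application.

```python
import numpy as np

def estimate_target_flow(image1, image2, event_frame1, event_frame2,
                         flow_model, allocation_model):
    # First optical flow: optical flow from the first image frame to the
    # second image frame, with shape (H, W, 2).
    first_flow = flow_model(image1, image2, event_frame1)
    # First optical flow allocation mask: per-pixel weight of the target
    # optical flow relative to the first optical flow, with shape (H, W),
    # derived only from the second event frame (no motion assumption).
    mask = allocation_model(event_frame2)
    # Target optical flow: the first optical flow weighted by the mask.
    return mask[..., np.newaxis] * first_flow
```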
Optionally, the first optical flow may be a sparse optical flow, or may be a dense optical flow. This is not limited in this application.
In a possible implementation, for example, the first optical flow is the dense optical flow. The first optical flow indicates different motion directions by using different colors of pixels, and indicates different motion rates by using different luminance of the pixels.
Optionally, the optical flow estimation apparatus may determine the first optical flow based on the first image frame, the second image frame, and the first event frame in a plurality of manners. This is not limited in this application.
In a possible implementation, the optical flow estimation apparatus may input the first image frame, the second image frame, and the first event frame to a preset optical flow estimation model, to obtain the first optical flow.
To meet device-side deployment and real-time performance requirements, a network structure of the optical flow estimation model cannot be too complex. In addition, because optical flow estimation is a complex task, it is difficult for a network with a simple structure to complete high-precision optical flow estimation. Therefore, in this application, the network structure of the optical flow estimation model is obtained by training a lightweight first convolutional neural network (convolutional neural network, CNN). The first CNN may include several processing layers: dimension reduction, convolution, residual, deconvolution, and dimension increasing. Cyclic iteration is performed on the first CNN, to improve accuracy of the optical flow estimation, and to facilitate deployment of the optical flow estimation model on the device side.
Optionally, the optical flow estimation apparatus may input the first image frame, the second image frame, and the first event frame to the optical flow estimation model, and perform cyclic iteration, to obtain the first optical flow. In other words, the optical flow estimation apparatus may input the first image frame, the second image frame, and the first event frame to the optical flow estimation model, to obtain a second optical flow; and input the first image frame, the second image frame, the first event frame, and the second optical flow to the optical flow estimation model, to obtain a third optical flow. By analogy, the cyclic iteration is performed until a loss function preset in the optical flow estimation model is met, and the optical flow output by the optical flow estimation model at that point is the first optical flow.
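The following PyTorch sketch illustrates one possible shape of such a lightweight network, with the processing layers listed above, and the cyclic iteration in which the previous flow estimate is fed back as an additional input. The channel counts, layer sizes, zero-initialized first flow, and fixed iteration count (standing in for the preset loss criterion) are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class LightweightFlowNet(nn.Module):
    """Illustrative lightweight network: dimension reduction, convolution,
    residual, deconvolution, and dimension increasing."""
    def __init__(self, in_channels):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, 32, 3, stride=2, padding=1)    # dimension reduction
        self.conv = nn.Conv2d(32, 32, 3, padding=1)                         # convolution
        self.residual = nn.Conv2d(32, 32, 3, padding=1)                     # residual branch
        self.deconv = nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1)    # deconvolution
        self.increase = nn.Conv2d(16, 2, 3, padding=1)                      # dimension increasing to a 2-channel flow

    def forward(self, x):
        h = torch.relu(self.reduce(x))
        h = torch.relu(self.conv(h))
        h = h + torch.relu(self.residual(h))                                # residual connection
        h = torch.relu(self.deconv(h))
        return self.increase(h)

def iterate_flow(model, image1, image2, event_frame1, num_iters=3):
    """Cyclic iteration: refine the flow by feeding the previous estimate back in."""
    n, _, h, w = image1.shape
    flow = torch.zeros(n, 2, h, w)        # stand-in for the first pass, which has no prior flow
    for _ in range(num_iters):
        x = torch.cat([image1, image2, event_frame1, flow], dim=1)
        flow = model(x)
    return flow

# Example: grayscale frames (1 channel each) plus a 4-channel event frame and a
# 2-channel flow give 1 + 1 + 4 + 2 = 8 input channels.
# model = LightweightFlowNet(in_channels=8)
```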
It should be noted that the first optical flow allocation mask, the first image frame, the second image frame, and the first event frame have the same resolution.
Optionally, the optical flow estimation apparatus may determine the first optical flow allocation mask based on the second event frame in a plurality of manners. This is not limited in this application.
In a possible implementation, the optical flow estimation apparatus may input the second event frame to a preset optical flow allocation model, to obtain the first optical flow allocation mask.
To meet device-side deployment and real-time performance requirements, a network structure of the optical flow allocation model cannot be too complex. Therefore, in this application, the network structure of the optical flow allocation model is obtained by training a lightweight second CNN. The second CNN may include processing layers such as fusion and convolution, and cyclic iteration is performed on the second CNN, to improve accuracy of optical flow estimation, and to facilitate deployment of the optical flow estimation model on the device side.
For example, the fusion processing layer is used to fuse the plurality of channels of the second event frame into a single-channel image. The convolution processing layer is used to separately perform convolution processing on the single-channel image by using a convolution kernel in an X direction and a convolution kernel in a Y direction, and to output a single-channel first optical flow allocation mask, where the resolution of the first optical flow allocation mask is the same as that of the second event frame.
For example, the X-direction convolution kernel and the Y-direction convolution kernel may be small fixed kernels such as those shown in the accompanying drawings.
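The following sketch illustrates the fusion and convolution stages just described. The actual X-direction and Y-direction kernels of this application are shown in its drawings and are not reproduced here; the Sobel-like kernels, the mean-based fusion, and the gradient-magnitude combination below are assumptions made purely for illustration.

```python
import numpy as np
from scipy.ndimage import convolve

X_KERNEL = np.array([[-1.0, 0.0, 1.0],
                     [-2.0, 0.0, 2.0],
                     [-1.0, 0.0, 1.0]])   # hypothetical X-direction kernel
Y_KERNEL = X_KERNEL.T                     # hypothetical Y-direction kernel

def allocation_mask(second_event_frame):
    # Fusion layer: fuse the multi-channel second event frame (C, H, W)
    # into a single-channel image, here with a simple per-pixel mean.
    fused = second_event_frame.mean(axis=0)
    # Convolution layer: filter the fused image separately with the
    # X-direction and Y-direction kernels.
    gx = convolve(fused, X_KERNEL, mode="nearest")
    gy = convolve(fused, Y_KERNEL, mode="nearest")
    # Combine the responses into a single-channel mask with the same
    # resolution as the second event frame, normalized to [0, 1] so that it
    # can act as a per-pixel weight.
    mask = np.sqrt(gx ** 2 + gy ** 2)
    return mask / (mask.max() + 1e-8)
```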
Optionally, the optical flow estimation apparatus may input the second event frame to the optical flow allocation model, and perform cyclic iteration, to obtain the first optical flow allocation mask. In other words, the optical flow estimation apparatus may input the second event frame to the optical flow allocation model, to obtain a second optical flow allocation mask; and input the second event frame and the second optical flow allocation mask to the optical flow allocation model, to obtain a third optical flow allocation mask. By analogy, the cyclic iteration is performed until a loss function preset in the optical flow allocation model is met, and the optical flow allocation mask output by the optical flow allocation model at that point is the first optical flow allocation mask.
In a possible implementation, the optical flow estimation apparatus may weight an optical flow at a corresponding location in the first optical flow by using the first optical flow allocation mask, to obtain the target optical flow.
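As a small numeric illustration of this weighting (with values assumed only for this example), a pixel whose first optical flow is (4, 2) and whose allocation mask value is 0.3 is assigned a target optical flow of (1.2, 0.6); that is, about 30% of the frame-to-frame motion at that pixel is attributed to the interval from the first image frame to the target moment.

```python
import numpy as np

first_flow = np.array([[[4.0, 2.0]]])   # (H=1, W=1, 2): optical flow from frame 1 to frame 2
mask = np.array([[0.3]])                # (H=1, W=1): first optical flow allocation mask

target_flow = mask[..., np.newaxis] * first_flow
print(target_flow)                      # [[[1.2 0.6]]]
```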
According to the optical flow estimation method provided in this application, the optical flow estimation apparatus first estimates the first optical flow between the first image frame and the second image frame based on the first image frame, the second image frame, and the first event frame, and then determines, based on the second event frame, a weight (namely, the first optical flow allocation mask) of the optical flow from the first image frame to the target moment relative to the optical flow from the first image frame to the second image frame. Because no motion assumption is made about the target object, the obtained first optical flow allocation mask can accurately allocate the optical flow for real motion. Therefore, the accuracy of the target optical flow obtained by weighting the first optical flow by using the first optical flow allocation mask is high.
The foregoing describes the optical flow estimation method provided in embodiments of this application with reference to
The obtaining module 301 is configured to: obtain a first image frame and a second image frame, where the first image frame and the second image frame are any two adjacent image frames in an image sequence, and the image sequence is obtained by photographing a target scene; and obtain a first event frame, where the first event frame is used to describe a luminance change of the target scene between the first image frame and the second image frame.
The optical flow estimation module 302 is configured to determine a target optical flow based on the first image frame, the second image frame, and the first event frame, where the target optical flow includes at least one of an optical flow of a corresponding pixel between the first image frame and a target moment and an optical flow of a corresponding pixel between the target moment and the second image frame, and the target moment is any moment between the first image frame and the second image frame.
In a possible implementation, the obtaining module 301 is further configured to: before the target optical flow is determined based on the first image frame, the second image frame, and the first event frame, obtain a second event frame, where the second event frame is used to describe a luminance change of the target scene between the first image frame and the target moment; and the optical flow estimation module 302 is specifically configured to determine the target optical flow based on the first image frame, the second image frame, the first event frame, and the second event frame.
Optionally, the optical flow estimation module 302 may include an inter-frame optical flow estimation submodule 3021, an optical flow allocation submodule 3022, and an inter-frame optical flow estimation submodule at any moment 3023.
In a possible implementation, the inter-frame optical flow estimation submodule 3021 is configured to determine a first optical flow based on the first image frame, the second image frame, and the first event frame, where the first optical flow is an optical flow of a corresponding pixel between the first image frame and the second image frame; the optical flow allocation submodule 3022 is configured to determine a first optical flow allocation mask based on the second event frame, where the first optical flow allocation mask indicates a weight of the target optical flow relative to the first optical flow; and the inter-frame optical flow estimation submodule at any moment 3023 is configured to determine the target optical flow based on the first optical flow and the first optical flow allocation mask.
In a possible implementation, the inter-frame optical flow estimation submodule 3021 is specifically configured to input the first image frame, the second image frame, and the first event frame to a preset optical flow estimation model, to obtain the first optical flow.
In a possible implementation, the inter-frame optical flow estimation submodule 3021 is specifically configured to: input the first image frame, the second image frame, and the first event frame to the preset optical flow estimation model; and perform cyclic iteration, to obtain the first optical flow.
In a possible implementation, the optical flow allocation submodule 3022 is specifically configured to input the second event frame to a preset optical flow allocation model, to obtain the first optical flow allocation mask.
In a possible implementation, the optical flow allocation submodule 3022 is specifically configured to: input the second event frame to the optical flow allocation model, and perform cyclic iteration, to obtain the first optical flow allocation mask.
In a possible implementation, the first image frame includes H×W pixels, both H and W are integers greater than 1, the first event frame includes a plurality of channels, and the plurality of channels include a first channel, a second channel, a third channel, and a fourth channel; the first channel includes H×W first values, where the H×W first values one-to-one correspond to locations of the H×W pixels, and the first value indicates a quantity of times that luminance of a pixel at a corresponding location in the first image frame increases between the first image frame and the second image frame; the second channel includes H×W second values, where the H×W second values one-to-one correspond to the locations of the H×W pixels, and the second value indicates a quantity of times that luminance of a pixel at a corresponding location in the first image frame decreases between the first image frame and the second image frame; the third channel includes H×W third values, where the H×W third values one-to-one correspond to the locations of the H×W pixels, and the third value indicates time at which luminance of a pixel at a corresponding location in the first image frame increases for the last time between the first image frame and the second image frame; and the fourth channel includes H×W fourth values, where the H×W fourth values one-to-one correspond to the locations of the H×W pixels, and the fourth value indicates time at which luminance of a pixel at a corresponding location in the first image frame decreases for the last time between the first image frame and the second image frame.
In a possible implementation, the obtaining module is specifically configured to: obtain event flow data, where the event flow data includes event data of each event in at least one event, the at least one event one-to-one corresponds to at least one luminance change that occurs in the target scene between the first image frame and the second image frame, and the data of each event includes a timestamp, pixel coordinates, and a polarity; and obtain the first event frame based on the event flow data.
It should be noted that content such as information exchange between the modules and an execution process thereof are based on a same concept as the method embodiments of this application. For details about specific functions and technical effect of the content, refer to the method embodiments. The details are not described herein again. In an optional example, the optical flow estimation apparatus 300 may be specifically the optical flow estimation apparatus in the foregoing embodiment of the optical flow estimation method 200, and the optical flow estimation apparatus 300 may be configured to perform the procedures and/or steps corresponding to the optical flow estimation apparatus in the foregoing embodiment of the optical flow estimation method 200. To avoid repetition, details are not described herein again.
One or more of the modules in the embodiment shown in
For example, the modules may interact as follows:
(1) An obtaining module 301 obtains a first image frame, a second image frame, a first event frame, and a second event frame. For details, refer to the related descriptions in step 201 and step 202 of the foregoing method.
(2) The obtaining module 301 sends the first image frame, the second image frame, and the first event frame to an inter-frame optical flow estimation submodule 3021.
(3) The inter-frame optical flow estimation submodule 3021 inputs the first image frame, the second image frame, and the first event frame to an optical flow estimation model, and performs cyclic iteration, to obtain a first optical flow. For details, refer to the related description in step 203 of the foregoing method.
(4) The inter-frame optical flow estimation submodule 3021 sends the first optical flow to an inter-frame optical flow estimation submodule at any moment 3023.
(5) The obtaining module 301 sends the second event frame to an optical flow allocation submodule 3022.
(6) The optical flow allocation submodule 3022 inputs the second event frame to an optical flow allocation model, and performs cyclic iteration, to obtain a first optical flow allocation mask. For details, refer to the related description in step 203 of the foregoing method.
(7) The optical flow allocation submodule 3022 sends the first optical flow allocation mask to the inter-frame optical flow estimation submodule at any moment 3023.
(8) The inter-frame optical flow estimation submodule at any moment 3023 weights the first optical flow by using the first optical flow allocation mask, to obtain the target optical flow.
The communication interface 402 is configured to input image data to the processor 401, and/or output image data from the processor 401. The processor 401 runs a computer program or instructions, so that the optical flow estimation apparatus 400 implements the optical flow estimation method described in the foregoing embodiment of the method 200.
The processor 401 in this embodiment of this application includes but is not limited to a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA), a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, a microcontroller, any conventional processor, or the like.
For example, the processor 401 is configured to: obtain a first image frame and a second image frame through the communication interface 402, where the first image frame and the second image frame are any two adjacent image frames in an image sequence, and the image sequence is obtained by photographing a target scene; obtain a first event frame through the communication interface 402, where the first event frame is used to describe a luminance change of the target scene between the first image frame and the second image frame; and determine a target optical flow based on the first image frame, the second image frame, and the first event frame, where the target optical flow includes at least one of an optical flow of a corresponding pixel between the first image frame and a target moment and an optical flow of a corresponding pixel between the target moment and the second image frame, and the target moment is any moment between the first image frame and the second image frame.
In an optional example, a person skilled in the art may understand that the optical flow estimation apparatus 400 may be specifically the optical flow estimation apparatus in the foregoing embodiment of the optical flow estimation method 200, and the optical flow estimation apparatus 400 may be configured to perform the procedures and/or steps corresponding to the optical flow estimation apparatus in the foregoing embodiment of the optical flow estimation method 200. To avoid repetition, details are not described herein again.
Optionally, the optical flow estimation apparatus 400 may further include a memory 403.
The memory 403 may be a volatile memory or a nonvolatile memory, or may include both a volatile memory and a nonvolatile memory. The nonvolatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), and is used as an external cache. By way of example but not limitative description, many forms of RAMs may be used, for example, a static random access memory (Static RAM, SRAM), a dynamic random access memory (Dynamic RAM, DRAM), a synchronous dynamic random access memory (Synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), a synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and a direct rambus random access memory (Direct Rambus RAM, DR RAM).
Specifically, the memory 403 is configured to store program code and instructions of the optical flow estimation apparatus. Optionally, the memory 403 is further configured to store data, for example, the first optical flow, the first optical flow allocation mask, and the target optical flow, obtained in a process in which the processor 401 performs the foregoing embodiment of the optical flow estimation method 200.
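As a purely illustrative example (the specific values are assumptions made for this description and are not defined in this application), if the first optical flow stored for a pixel is (4, 2) pixels and the corresponding value of the first optical flow allocation mask is 0.3, the target optical flow obtained by weighting and stored for that pixel is (1.2, 0.6).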
Optionally, the memory 403 may be an independent device, or may be integrated into the processor 401.
It should be noted that, in a possible design, the optical flow estimation apparatus 400 may be a chip. Optionally, the chip may further include one or more memories, configured to store computer-executable instructions. When the chip runs, the processor may execute the computer-executable instructions stored in the memory, so that the chip performs the foregoing optical flow estimation method.
Optionally, the chip may be a field programmable gate array, an application-specific integrated circuit, a system-on-a-chip, a central processing unit, a network processor, a digital signal processing circuit, a microcontroller, a programmable controller, or another integrated chip that implements a related function.
An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores computer instructions. When the computer instructions are run on a computer, the optical flow estimation method described in the foregoing method embodiments is implemented.
An embodiment of this application further provides a computer program product. When the computer program product runs on a processor, the optical flow estimation method described in the foregoing method embodiments is implemented.
An embodiment of this application provides a terminal. The terminal includes the foregoing optical flow estimation system. Optionally, the terminal may further include a display. The display is configured to display the target optical flow output by the optical flow estimation system.
The optical flow estimation apparatus, the computer-readable storage medium, the computer program product, the chip, and the terminal provided in embodiments of this application are all configured to perform the corresponding optical flow estimation method provided above. Therefore, for the beneficial effects that can be achieved by the optical flow estimation apparatus, the computer-readable storage medium, the computer program product, the chip, and the terminal, refer to the beneficial effects of the corresponding optical flow estimation method provided above. Details are not described herein again.
It should be understood that, in embodiments of this application, sequence numbers of the foregoing processes do not mean execution sequences. The execution sequences of the processes should be determined based on functions and internal logic of the processes, and should not constitute any limitation on implementation processes of embodiments of this application.
A person of ordinary skill in the art may be aware that units and algorithm steps described with reference to embodiments disclosed in this specification can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether the functions are performed by hardware or software depends on particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of this application.
A person skilled in the art may clearly understand that, for convenient and brief description, for a detailed working process of the foregoing system, apparatus, and unit, refer to a corresponding process in the foregoing method embodiments. Details are not described herein again.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely examples. For example, the division into units is merely logical function division, and there may be another division manner in an actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces. The indirect couplings or communication connections between the apparatuses or units may be implemented in electronic, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected based on actual requirements to achieve the objectives of the solutions of embodiments.
In addition, functional units in embodiments of this application may be integrated into one processing unit, each of the units may exist alone physically, or two or more units may be integrated into one unit.
When the functions are implemented in a form of a software functional unit and sold or used as an independent product, the functions may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the conventional technology, or some of the technical solutions may be implemented in a form of a software product. The computer software product is stored in a storage medium, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in embodiments of this application. The storage medium includes: any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Number | Date | Country | Kind
---|---|---|---
202111199513.6 | Oct. 14, 2021 | CN | national
This application is a continuation of International Application No. PCT/CN2022/121050, filed on Sep. 23, 2022, which claims priority to Chinese Patent Application No. 202111199513.6, filed on Oct. 14, 2021. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
Relation | Number | Date | Country
---|---|---|---
Parent | PCT/CN2022/121050 | Sep. 23, 2022 | WO
Child | 18634153 | | US