The present application claims priority from Japanese Patent Application No. 2023-111128 filed on Jul. 6, 2023, the entire contents of which are hereby incorporated by reference.
The disclosure relates to a machine learning method and a machine learning apparatus that predict a position and a velocity of a surrounding object.
There is a technique of predicting a future position and a future velocity of a surrounding object using machine learning. For example, Reza Mahjourian, Jinkyu Kim, Yuning Chai, Mingxing Tan, Ben Sapp, Dragomir Anguelov, "Occupancy Flow Fields for Motion Forecasting in Autonomous Driving", [online], Waymo LLC, [retrieved on Jun. 22, 2023], Internet <URL: https://arxiv.org/pdf/2203.03875.pdf> discloses a technique that predicts a position and a velocity of a surrounding object using a technique referred to as an occupancy flow.
An aspect of the disclosure provides a machine learning method including: inputting pieces of position data to a machine learning model, the machine learning model being configured to receive the pieces of position data indicating a position of an object at respective first points in time, and output occupancy data and flow data, the occupancy data including map data indicating occupancy probability of the object at a second point in time later than the first points in time, the flow data including map data indicating a velocity vector of the object at the second point in time; generating second ground truth occupancy data by performing a process of expanding an occupancy region of the object on first ground truth occupancy data, the first ground truth occupancy data including ground truth data of map data indicating occupancy probability of the object at the second point in time; calculating a loss parameter, based on the occupancy data outputted from the machine learning model, the second ground truth occupancy data, the flow data outputted from the machine learning model, and ground truth flow data including ground truth data of map data indicating a velocity vector of the object at the second point in time; and updating the machine learning model, based on the loss parameter.
An aspect of the disclosure provides a machine learning apparatus including a storage and a processor. The storage is configured to store a data set including pieces of position data indicating a position of an object at respective first points in time, first ground truth occupancy data including ground truth data of map data indicating occupancy probability of the object at a second point in time later than the first points in time, and ground truth flow data including ground truth data of map data indicating a velocity vector of the object at the second point in time. The processor is configured to perform a machine learning process, based on the data set. The processor is configured to: input the pieces of position data to a machine learning model configured to receive the pieces of position data and output occupancy data and flow data, the occupancy data including map data indicating occupancy probability of the object at the second point in time, the flow data including map data indicating a velocity vector of the object at the second point in time; generate second ground truth occupancy data by performing a process of expanding an occupancy region of the object on the first ground truth occupancy data; calculate a loss parameter, based on the occupancy data outputted from the machine learning model, the second ground truth occupancy data, the flow data outputted from the machine learning model, and the ground truth flow data; and update the machine learning model, based on the loss parameter.
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and, together with the specification, serve to explain the principles of the disclosure.
In a technique of predicting a future position and a future velocity of a surrounding object using machine learning, it is desired that the position and the velocity of the surrounding object be predicted with high accuracy, and further improvement in the prediction accuracy is expected.
It is desirable to provide a machine learning method and a machine learning apparatus that make it possible to improve accuracy in predicting a position and a velocity of an object.
In the following, some example embodiments of the disclosure are described in detail with reference to the accompanying drawings. Note that the following description is directed to illustrative examples of the disclosure and not to be construed as limiting to the disclosure. Factors including, without limitation, numerical values, shapes, materials, components, positions of the components, and how the components are coupled to each other are illustrative only and not to be construed as limiting to the disclosure. Further, elements in the following example embodiments which are not recited in a most-generic independent claim of the disclosure are optional and may be provided on an as-needed basis. The drawings are schematic and are not intended to be drawn to scale. Throughout the present specification and the drawings, elements having substantially the same function and configuration are denoted with the same reference numerals to avoid any redundant description. In addition, elements that are not directly related to any embodiment of the disclosure are unillustrated in the drawings.
The imaging unit 11 may be configured to capture an image of the object around the vehicle 1. In this example, the imaging unit 11 may include imagers 12. Each of the imagers 12 may capture an image of a region ahead, behind, or sideways of the vehicle 1, for example. Note that this example is a non-limiting example, and the imaging unit 11 may include one imager 12 that captures an image of a region ahead of the vehicle 1, for example. Each of the imagers 12 may include, for example, an image sensor and a lens. The imagers 12 may each generate a captured image by performing an imaging operation in synchronization with each other. The imaging unit 11 may be configured to supply image data DP including the captured image generated by each of the imagers 12 to the image processor 20.
The image processor 20 may be configured to detect a position of the object around the vehicle 1, based on the captured images included in the image data DP, and predict the future position and the future velocity of the object. For example, based on a result of the processing of the image processor 20, the vehicle 1 may be configured to allow a travel control of the vehicle 1 to be performed or information on the recognized object to be displayed on a console monitor. The image processor 20 may include, for example, a central processing unit (CPU) that executes a program, a random-access memory (RAM) that temporarily stores processing data, and a read-only memory (ROM) that stores the program. The image processor 20 may include an object detector 21, a prediction processor 22, and an output processor 23.
The object detector 21 may be configured to detect an object around the vehicle 1, based on captured images related to the same capturing point in time generated by the respective imagers 12. The object detector 21 may be configured to generate position data DPOS, based on a result of the detection of the object around the vehicle 1. The position data DPOS may be map data indicating a position of the object with reference to a position of the vehicle 1.
The prediction processor 22 may be configured to generate predicted occupancy data DOC and predicted flow data DFL by predicting the future position and the future velocity of the object using a machine learning model, based on the position data DPOS supplied from the object detector 21. The prediction processor 22 may include a position data memory 31 and an arithmetic processor 32.
The position data memory 31 may be configured to store the position data DPOS supplied from the object detector 21 for a predetermined period. As a result, pieces of position data DPOS related to capturing points in time may be accumulated in the position data memory 31.
The arithmetic processor 32 may be configured to generate pieces of predicted occupancy data DOC related to points in time in the future and pieces of predicted flow data DFL related to the points in time in the future using the machine learning model, based on the pieces of position data DPOS related to the capturing points in time. For example, the arithmetic processor 32 may be configured to generate 80 pieces of predicted occupancy data DOC and 80 pieces of predicted flow data DFL at 80 points in time in the future, based on eight pieces of position data DPOS at eight points in time in the past.
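As a concrete illustration of the data layout this implies, the following Python sketch lays out the arrays described above. The 64×64 grid resolution and the array layout are assumptions; only the eight-in/eighty-out frame counts come from the text.

```python
import numpy as np

# Eight past position maps DPOS in, 80 future occupancy maps DOC and
# 80 future flow maps DFL out. Grid resolution is an assumed example.
H = W = 64
position_data = np.zeros((8, H, W))      # DPOS at eight past points in time
occupancy = np.zeros((80, H, W))         # DOC: occupancy probability per cell
flow = np.zeros((80, H, W, 2))           # DFL: (vx, vy) velocity per cell
```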
The output processor 23 may be configured to generate a prediction result RES, based on the pieces of predicted occupancy data DOC and the pieces of predicted flow data DFL supplied from the prediction processor 22.
The machine learning model to be used in the prediction processor 22 may be generated in advance by a machine learning process, and set in the prediction processor 22 of the vehicle 1. Next, a machine learning apparatus 40 that generates the machine learning model will be described.
The processor 41 may include, for example, a CPU and a RAM, and may be configured to generate the machine learning model by performing the machine learning process using data sets DS supplied from the storage 42.
The storage 42 may include, for example, a solid state drive (SSD) and a hard disk drive (HDD). The storage 42 is configured to store the data sets DS. The data sets DS may be prepared in advance by an engineer, for example, and stored in the storage 42. The machine learning apparatus 40 may be configured to generate the machine learning model by performing the machine learning process using the data sets DS.
The learning processor 51 may be configured to generate pieces of predicted occupancy data DOC and pieces of predicted flow data DFL using the machine learning model being trained, based on the pieces of position data DPOS included in the data set DS. Further, the learning processor 51 may be configured to update the machine learning model by performing a backpropagation process, based on a loss parameter LOSS supplied from the loss calculator 53.
The correction processor 52 may be configured to generate pieces of ground truth occupancy data DOC2 by performing a correction process of expanding an occupancy region (a ground truth occupancy region) of the object, based on the pieces of ground truth occupancy data DOC1 included in the data set DS.
In this manner, the correction processor 52 may be configured to generate the ground truth occupancy region R2 by expanding the ground truth occupancy region R1 included in the ground truth occupancy data DOC1, based on the ground truth occupancy data DOC1, and generate the ground truth occupancy data DOC2 including the ground truth occupancy region R2.
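As a minimal sketch of such an expansion, assuming the ground truth occupancy data is a 2-D grid of probabilities, a morphological dilation produces the ground truth occupancy region R2 from R1. The 3×3 neighborhood is an illustrative expansion amount, not a value from the text.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def expand_occupancy(doc1: np.ndarray, k: int = 3) -> np.ndarray:
    # Morphological dilation: each cell takes the maximum occupancy
    # probability found in its k-by-k neighborhood, so the ground truth
    # region R1 grows into the surrounding cells to form R2.
    return maximum_filter(doc1, size=k)
```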
The loss calculator 53 may be configured to calculate the loss parameter LOSS, based on the pieces of predicted occupancy data DOC supplied from the learning processor 51, the pieces of ground truth occupancy data DOC2 supplied from the correction processor 52, the pieces of predicted flow data DFL supplied from the learning processor 51, and the pieces of ground truth flow data DFL1 supplied from the storage 42.
The loss calculator 53 may calculate loss parameters LO, LF, and LW using the following equations EQ1 to EQ4, and calculate the loss parameter LOSS, based on the loss parameters LO, LF, and LW.
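The equations EQ1 to EQ4 were rendered as drawings in the original publication and are not reproduced here. The following is a plausible LaTeX reconstruction based on the symbol definitions in the next paragraph and on the loss formulation of the cited Mahjourian et al. paper; the summation ranges, the norm in EQ2, and the backward-flow form of EQ3 are assumptions.

\[
L_O = \sum_{t}\sum_{x,y} H\bigl(OC2_t(x,y),\; OC_t(x,y)\bigr) \quad \text{(EQ1)}
\]
\[
L_F = \sum_{t}\sum_{x,y} OC2_t(x,y)\,\bigl\lVert FL_t(x,y) - FL1_t(x,y) \bigr\rVert \quad \text{(EQ2)}
\]
\[
W_t(x,y) = W_{t-1}\bigl((x,y) + FL_t(x,y)\bigr) \quad \text{(EQ3)}
\]
\[
L_W = \sum_{t}\sum_{x,y} H\bigl(OC2_t(x,y),\; W_t(x,y)\cdot OC_t(x,y)\bigr) \quad \text{(EQ4)}
\]

where \(H(p,q) = -p\log q - (1-p)\log(1-q)\) denotes binary cross entropy.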
In the equations, OCt (x, y) may represent the occupancy probability at the coordinates x, y in the predicted occupancy data DOC related to time t, and OC2t (x, y) may represent the occupancy probability at the coordinates x, y in the ground truth occupancy data DOC2 related to the time t. FLt (x, y) may represent the velocity vector at the coordinates x, y in the predicted flow data DFL at the time t. FL1t (x, y) may represent the velocity vector at the coordinates x, y in the ground truth flow data DFL1 at the time t. Wt-1 (x, y) may represent the occupancy probability at the coordinates x, y in the occupancy data related to time t−1.
The loss calculator 53 may calculate the loss parameter LO using equation EQ1. For example, the loss calculator 53 may calculate the loss parameter LO by calculating a cross entropy between the occupancy probability OCt (x, y) in the predicted occupancy data DOC related to the time t and the occupancy probability OC2t (x, y) in the ground truth occupancy data DOC2 related to the time t. The loss parameter LO may be an index indicating a degree of match between the predicted occupancy data DOC and the ground truth occupancy data DOC2. When the degree of match between the predicted occupancy data DOC and the ground truth occupancy data DOC2 is high, the loss calculator 53 may set the loss parameter LO to a small value. When the degree of match between the predicted occupancy data DOC and the ground truth occupancy data DOC2 is low, the loss calculator 53 may set the loss parameter LO to a large value.
Further, the loss calculator 53 may calculate the loss parameter LF using equation EQ2. For example, the loss calculator 53 may calculate a difference in the velocity, based on the velocity vector FLt (x, y) in the predicted flow data DFL related to the time t and the velocity vector FL1t (x, y) in the ground truth flow data DFL1 related to the time t. The loss calculator 53 may calculate the loss parameter LF, based on the calculated difference in the velocity and the occupancy probability OC2t (x, y) in the ground truth occupancy data DOC2 related to the time t. The loss parameter LF may be an index indicating a degree of match between the predicted flow data DFL and the ground truth flow data DFL1. When the degree of match between the predicted flow data DFL and the ground truth flow data DFL1 is high, the loss calculator 53 may set the loss parameter LF to a small value. When the degree of match between the predicted flow data DFL and the ground truth flow data DFL1 is low, the loss calculator 53 may set the loss parameter LF to a large value.
Further, the loss calculator 53 may calculate the loss parameter LW using equations EQ3 and EQ4. For example, using equation EQ3, the loss calculator 53 may generate the occupancy probability Wt in the occupancy data related to the time t, based on the occupancy probability Wt-1 in the occupancy data related to the time t−1 and the velocity vector FLt in the predicted flow data DFL related to the time t. Further, using equation EQ4, the loss calculator 53 may calculate the loss parameter LW by calculating a cross entropy between: a product of the occupancy probability Wt (x, y) in the occupancy data related to the time t and the occupancy probability OCt (x, y) in the predicted occupancy data DOC related to the time t; and the occupancy probability OC2t (x, y) in the ground truth occupancy data DOC2 related to the time t. The loss parameter LW may be a flow trace loss, i.e., an index indicating a degree of match between the predicted data and the ground truth data related to both the occupancy data and the flow data. When the degree of match between the predicted data and the ground truth data is high, the loss calculator 53 may set the loss parameter LW to a small value. When the degree of match between the predicted data and the ground truth data is low, the loss calculator 53 may set the loss parameter LW to a large value.
The loss calculator 53 may calculate the loss parameter LOSS by performing weighted addition using a predetermined weight set to each of the loss parameters LO, LF, and LW, based on the loss parameters LO, LF, and LW calculated as described above. The learning processor 51 may be configured to update a model parameter of the machine learning model to decrease the loss parameter LOSS by performing the backpropagation process.
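For concreteness, a non-authoritative numpy sketch of this loss computation follows. The L2 norm, the mean reductions, the nearest-neighbor warping, and the equal default weights are assumptions; a real training pipeline would implement this in a differentiable framework so that the backpropagation process can update the model.

```python
import numpy as np

def cross_entropy(p, q, eps=1e-7):
    # Pixelwise binary cross entropy between ground truth p and prediction q.
    q = np.clip(q, eps, 1.0 - eps)
    return float(-(p * np.log(q) + (1.0 - p) * np.log(1.0 - q)).mean())

def warp(occ_prev, flow):
    # EQ3 (assumed form): read W_{t-1} at (x, y) + FL_t(x, y), treating the
    # flow as a backward flow as in the cited occupancy flow paper.
    h, w = occ_prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return occ_prev[src_y, src_x]

def loss_total(oc, oc2, fl, fl1, w_prev, wo=1.0, wf=1.0, ww=1.0):
    lo = cross_entropy(oc2, oc)                           # EQ1: occupancy loss LO
    speed_err = np.linalg.norm(fl - fl1, axis=-1)         # per-cell velocity error
    lf = float((oc2 * speed_err).mean())                  # EQ2: flow loss LF
    w_t = warp(w_prev, fl)                                # EQ3: warped occupancy
    lw = cross_entropy(oc2, np.clip(w_t * oc, 0.0, 1.0))  # EQ4: flow trace loss LW
    return wo * lo + wf * lf + ww * lw                    # weighted addition
```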
The machine learning apparatus 40 may perform the machine learning process based not only on the loss parameter LO related to the occupancy data and the loss parameter LF related to the flow data, but also on the loss parameter LW related to both the occupancy data and the flow data. This makes it possible for the machine learning apparatus 40 to improve accuracy of the machine learning process. Because the data included in the data set DS includes noise, there is a possibility that learning accuracy is lowered due to the influence of the noise when, for example, the position and the velocity are individually learned simply using the loss parameters LO and LF. Using the loss parameter LW in addition makes it possible for the machine learning apparatus 40 to learn a motion of the object over a longer period and improve the learning accuracy.
In one embodiment, the position data DPOS may serve as “position data”. In one embodiment, the predicted occupancy data DOC may serve as “occupancy data”. In one embodiment, the predicted flow data DFL may serve as “flow data”. In one embodiment, the ground truth occupancy data DOC1 may serve as “first ground truth occupancy data”. In one embodiment, the ground truth occupancy data DOC2 may serve as “second ground truth occupancy data”. In one embodiment, the ground truth flow data DFL1 may serve as “ground truth flow data”. In one embodiment, the loss parameter LOSS may serve as a “loss parameter”. In one embodiment, the storage 42 may serve as a “storage”. In one embodiment, the processor 41 may serve as a “processor”.
Operation and workings of the surrounding environment recognition device 10 and the machine learning apparatus 40 according to the example embodiment will now be described.
First, an overall operation of the surrounding environment recognition device 10 will be described.
In the surrounding environment recognition device 10, the object detector 21 of the image processor 20 may detect an object around the vehicle 1, based on captured images related to the same capturing point in time generated by the respective imagers 12. The object detector 21 may generate the position data DPOS, based on a result of the detection of the object around the vehicle 1. The prediction processor 22 may generate the predicted occupancy data DOC and the predicted flow data DFL by predicting the future position and the future velocity of the object using the machine learning model, based on the position data DPOS. For example, the position data memory 31 of the prediction processor 22 may store the position data DPOS supplied from the object detector 21 for the predetermined period. As a result, the pieces of position data DPOS related to the capturing points in time may be accumulated in the position data memory 31. The arithmetic processor 32 of the prediction processor 22 may generate the pieces of predicted occupancy data DOC related to points in time in the future and the pieces of predicted flow data DFL related to the points in time in the future using the machine learning model, based on the pieces of position data DPOS related to the capturing points in time. The output processor 23 may generate the prediction result RES, based on the pieces of predicted occupancy data DOC and the pieces of predicted flow data DFL supplied from the prediction processor 22.
In the machine learning apparatus 40, the learning processor 51 of the processor 41 may generate the pieces of predicted occupancy data DOC and the pieces of predicted flow data DFL using the machine learning model being trained, based on the pieces of position data DPOS included in the data set DS. The correction processor 52 may generate the pieces of ground truth occupancy data DOC2 by performing the correction process of expanding the ground truth occupancy region of the object, based on the pieces of ground truth occupancy data DOC1 included in the data set DS. The loss calculator 53 may calculate the loss parameter LOSS, based on the pieces of predicted occupancy data DOC supplied from the learning processor 51, the pieces of ground truth occupancy data DOC2 supplied from the correction processor 52, the pieces of predicted flow data DFL supplied from the learning processor 51, and the pieces of ground truth flow data DFL1 supplied from the storage 42. The learning processor 51 may update the machine learning model by performing the backpropagation process, based on the loss parameter LOSS supplied from the loss calculator 53.
First, the processor 41 may select one of the data sets DS stored in the storage 42 (step S101).
Thereafter, the learning processor 51 of the processor 41 may generate the pieces of predicted occupancy data DOC and the pieces of predicted flow data DFL using the machine learning model, based on the pieces of position data DPOS included in the selected data set DS (step S102).
Thereafter, the correction processor 52 of the processor 41 may generate the pieces of ground truth occupancy data DOC2 by performing the correction process of expanding the ground truth occupancy region of the object, based on each of the pieces of ground truth occupancy data DOC1 included in the selected data set DS (step S103). For example, as illustrated in
Thereafter, the loss calculator 53 of the processor 41 may calculate the loss parameter LOSS, based on the pieces of predicted occupancy data DOC, the pieces of ground truth occupancy data DOC2, the pieces of predicted flow data DFL, and the pieces of ground truth flow data DFL1 (step S104).
Thereafter, the learning processor 51 of the processor 41 may update the model parameter of the machine learning model using the backpropagation process (step S105).
Thereafter, the processor 41 may check whether the number of learning steps that has been executed has reached the predetermined number of learning steps (step S106). When the number of learning steps that has been executed has not reached the predetermined number of learning steps yet (“N” in step S106), the processor 41 may repeat the processes of steps S102 to S106 until the predetermined number of learning steps is reached.
In step S106, when the number of learning steps that has been executed has reached the predetermined number of learning steps (“Y” in step S106), the processor 41 may check whether all of the data sets DS stored in the storage 42 have been selected (step S107). When not all of the data sets DS have been selected (“N” in step S107), the processor 41 may select one of the one or more data sets DS that have not been selected yet (step S108), and cause the process to return to step S102. The processor 41 may repeat the processes of steps S102 to S108 until all the data sets DS are selected.
In step S107, when all of the data sets DS stored in the storage 42 have been selected (“Y” in step S107), the process may be ended.
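A toy-scale sketch of one training iteration corresponding to steps S102 to S105 follows. The convolutional stand-in model, the grid and batch sizes, the optimizer, and the omission of the loss parameter LW are all assumptions made for brevity, not the patent's actual model.

```python
import torch
import torch.nn.functional as F

H = W = 64
T_IN, T_OUT = 8, 80
# Toy stand-in for the machine learning model: 8 past position maps in,
# 80 occupancy channels plus 80 two-channel flow fields out.
model = torch.nn.Conv2d(T_IN, T_OUT * 3, kernel_size=3, padding=1)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)

def training_step(pos, doc2, dfl1):
    out = model(pos)                                  # S102: forward pass
    oc = torch.sigmoid(out[:, :T_OUT])                # predicted occupancy DOC
    fl = out[:, T_OUT:].reshape(-1, T_OUT, 2, H, W)   # predicted flow DFL
    lo = F.binary_cross_entropy(oc, doc2)             # occupancy loss LO (EQ1)
    lf = ((fl - dfl1).norm(dim=2) * doc2).mean()      # flow loss LF (EQ2)
    loss = lo + lf                                    # S104 (LW omitted here)
    opt.zero_grad(); loss.backward(); opt.step()      # S105: backpropagation
    return float(loss)

# S101 / S106 to S108: iterate over data sets, running a fixed number of
# learning steps on each. Random tensors stand in for a real data set DS.
pos = torch.rand(1, T_IN, H, W)
doc2 = torch.rand(1, T_OUT, H, W)
dfl1 = torch.rand(1, T_OUT, 2, H, W)
for step in range(2):                                 # predetermined step count
    training_step(pos, doc2, dfl1)
```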
A machine learning model was generated using the machine learning method described above. Results of three evaluation indices were then calculated for the generated machine learning model.
As described above, the machine learning method performed by the machine learning apparatus 40 includes a first process, a second process, a third process, and a fourth process. The first process includes inputting the pieces of position data DPOS to the machine learning model. The machine learning model is configured to receive the pieces of position data DPOS indicating a position of an object at respective first points in time, and output occupancy data (the predicted occupancy data DOC) and flow data (the predicted flow data DFL). The occupancy data (the predicted occupancy data DOC) includes map data indicating the occupancy probability of the object at a second point in time later than the first points in time. The flow data (the predicted flow data DFL) includes map data indicating the velocity vector of the object at the second point in time. The second process includes generating second ground truth occupancy data (the ground truth occupancy data DOC2) by performing a process of expanding the occupancy region of the object on first ground truth occupancy data (the ground truth occupancy data DOC1). The first ground truth occupancy data (the ground truth occupancy data DOC1) includes the ground truth data of map data indicating the occupancy probability of the object at the second point in time. The third process includes calculating the loss parameter LOSS, based on the occupancy data (the predicted occupancy data DOC) outputted from the machine learning model, the second ground truth occupancy data (the ground truth occupancy data DOC2), the flow data (the predicted flow data DFL) outputted from the machine learning model, and ground truth flow data (the ground truth flow data DFL1) including the ground truth data of map data indicating the velocity vector of the object at the second point in time. The fourth process includes updating the machine learning model, based on the loss parameter LOSS. As described above, because the ground truth occupancy data DOC2 is generated by performing the process of expanding the occupancy region of the object, it is possible for the machine learning process to be performed allowing a slight difference between the predicted occupancy data DOC and the ground truth occupancy data DOC2. For example, the future position and the future velocity of a vehicle can change depending on a driving operation of a driver who drives the vehicle. Accordingly, performing the machine learning process while allowing a slight difference as described above makes it possible to enhance generalization performance. As a result, the machine learning method makes it possible to improve the accuracy in predicting the position and the velocity of the object.
As described above, the machine learning method and the machine learning apparatus according to the example embodiment include the first process, the second process, the third process, and the fourth process. The first process includes inputting the pieces of position data to the machine learning model. The machine learning model is configured to receive the pieces of position data indicating a position of an object at respective first points in time, and output occupancy data and flow data. The occupancy data includes map data indicating the occupancy probability of the object at a second point in time later than the first points in time. The flow data includes map data indicating the velocity vector of the object at the second point in time. The second process includes generating second ground truth occupancy data by performing a process of expanding the occupancy region of the object on first ground truth occupancy data. The first ground truth occupancy data includes the ground truth data of map data indicating the occupancy probability of the object at the second point in time. The third process includes calculating the loss parameter, based on the occupancy data outputted from the machine learning model, the second ground truth occupancy data, the flow data outputted from the machine learning model, and ground truth flow data including the ground truth data of map data indicating the velocity vector of the object at the second point in time. The fourth process includes updating the machine learning model, based on the loss parameter. This helps to improve the accuracy in predicting the position and the velocity of the object.
In some embodiments, in the second process, the second ground truth occupancy data may be generated by expanding the occupancy region by a predetermined amount. This helps to improve the accuracy in predicting the position and the velocity of the object.
In some embodiments, in the second process, the second ground truth occupancy data may be generated by performing a blurring process on the first ground truth occupancy data. This helps to improve the accuracy in predicting the position and the velocity of the object.
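One plausible realization of such a blurring process is sketched below; the Gaussian kernel and the sigma value are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_ground_truth(doc1: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    # Soften the sharp ground truth region so that cells just outside the
    # original occupancy region R1 receive a reduced, nonzero probability.
    blurred = gaussian_filter(doc1.astype(float), sigma=sigma)
    # Keep originally occupied cells at full probability so the blur only
    # expands the region and never shrinks it.
    return np.maximum(blurred, doc1)
```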
In the above-described example embodiment, the correction processor 52 may generate the pieces of ground truth occupancy data DOC2, based on the pieces of ground truth occupancy data DOC1; however, this example is a non-limiting example. In some embodiments (modification example 1), the pieces of ground truth occupancy data DOC2 may be generated based on the pieces of ground truth occupancy data DOC1 and the pieces of ground truth flow data DFL1.
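The text does not detail modification example 1, so the following sketch is only one plausible interpretation, entirely an assumption: the ground truth velocity could bias the expansion of the occupancy region toward the object's direction of motion.

```python
import numpy as np
from scipy.ndimage import maximum_filter, shift

def expand_with_flow(doc1: np.ndarray, dfl1: np.ndarray, k: int = 3) -> np.ndarray:
    # Expand the region as in the base embodiment, then also transport it
    # along the mean ground truth velocity so that the expansion is biased
    # toward the direction in which the object is moving.
    base = maximum_filter(doc1, size=k)
    vx = float(dfl1[..., 0].mean())
    vy = float(dfl1[..., 1].mean())
    moved = shift(base, shift=(vy, vx), order=1, mode="constant")
    return np.maximum(base, moved)
```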
In the above-described example embodiment, the object may be the vehicle; however, this example is a non-limiting example. In some embodiments, the object may be a pedestrian.
Further, any two or more of these modifications may be combined with each other.
Although some example embodiments of the disclosure have been described in the foregoing by way of example with reference to the accompanying drawings, the disclosure is by no means limited to the embodiments described above. It should be appreciated that modifications and alterations may be made by persons skilled in the art without departing from the scope as defined by the appended claims. The disclosure is intended to include such modifications and alterations in so far as they fall within the scope of the appended claims or the equivalents thereof.
For example, in the above-described example embodiment, the predicted occupancy data DOC may include the map data indicating the occupancy probability of the object at a certain point in time in the future with reference to the position of the vehicle 1; however, this example is a non-limiting example. In some embodiments, the predicted occupancy data DOC may include the map data indicating the occupancy probability of the object in global coordinates on earth.
For example, in the above-described example embodiment, the predicted flow data DFL may include the map data indicating the velocity vector of the object at a certain point in time in the future with reference to the position of the vehicle 1; however, this example is a non-limiting example. In some embodiments, the predicted flow data DFL may include the map data indicating the velocity vector of the object in global coordinates on earth.
For example, in the above-described example embodiment, the future position and the future velocity of the object around the vehicle 1 traveling on a road surface may be predicted; however, this example is a non-limiting example. In some embodiments, the future position and the future velocity of the object around a flying object may be predicted. The flying object may include, for example, a flying vehicle, a helicopter, a drone, or any other object having flying capability.
The example effects described herein are mere examples, and example effects of the disclosure are therefore not limited to those described herein, and other example effects may be achieved.
Furthermore, the disclosure may encompass at least the following embodiments.
Each of the processor 41 and the processor 41A described above may be implemented by circuitry including at least one semiconductor integrated circuit such as at least one processor (e.g., a central processing unit (CPU)), at least one application specific integrated circuit (ASIC), or at least one field programmable gate array (FPGA). At least one processor may be configured, by reading instructions from at least one machine readable non-transitory tangible medium, to perform all or a part of functions of the processor 41 and the processor 41A.