This application claims the benefit of priority from Chinese Patent Application No. 202311179915.9, filed on Sep. 13, 2023. The content of the aforementioned application, including any intervening amendments thereto, is incorporated herein by reference in its entirety.
Embodiments of the present application relate to the technical field of vehicle control, and in particular, to a driving world model based on a brain-like neural circuit.
Nowadays, artificial intelligence is transitioning from specialized artificial intelligence to general artificial intelligence. Generative large models represented by ChatGPT have demonstrated remarkable capabilities in natural language processing and have become the mainstream general artificial intelligence models for that field. Automatic driving reflects the cross-fusion, in the traffic field, of the automobile industry with new-generation information technologies such as artificial intelligence, automatic control, and big data. A high-level automatic driving system needs to cope with almost all complex traffic environments and complete driving tasks safely and efficiently.
However, most existing automatic driving models use a modular approach. This approach requires a large amount of manual engineering and involves manual annotation of individual modules and cross-module configuration. For each new environment or new task, the algorithm must be manually redesigned and upgraded, so the approach transfers poorly and cannot adapt to the development and requirements of general artificial intelligence.
The embodiments of the present application provide a driving world model based on a brain-like neural circuit. The model uses a monocular camera image as the input. The world model extracts and memorizes environment dynamics information, simulates a nematode nervous system to establish the brain-like neural circuit that processes this information, and completes an end-to-end automatic driving task.
In order to solve the above technical problem, an embodiment of the present application provides a driving world model based on a brain-like neural circuit. The driving world model includes a perception module, an environment memory module, a brain-like neural circuit network module, and a convolutional network module, wherein the perception module is configured to perform image encoding on an input image by taking a monocular camera image as the input image and to acquire an image feature under a view angle of an aerial view; the perception module includes a two-dimensional feature encoding unit, a three-dimensional feature encoding unit, and a summing pooling unit which are connected in sequence; the two-dimensional feature encoding unit is configured to extract two-dimensional features from the image feature; the three-dimensional feature encoding unit is configured to project the two-dimensional features to a three-dimensional space to obtain three-dimensional features and to predict a depth probability distribution of each three-dimensional feature; the summing pooling unit is configured to map the three-dimensional features to a bird's eye view space in a summing pooling manner according to the depth probability distributions to obtain the image feature under the view angle of the aerial view; the environment memory module is configured to: acquire environment dynamics information of a current moment according to the image feature and a hidden feature, and output the environment dynamics information to the brain-like neural circuit network module and the convolutional network module; the brain-like neural circuit network module is configured to: simulate a nematode neural network, establish a brain-like neural circuit network, and input the environment dynamics information to the brain-like neural circuit network to obtain a control output of automatic driving; and the convolutional network module is configured to input the environment dynamics information to a convolutional network to generate a bird's eye view of an environment.
In some exemplary embodiments, the environment memory module includes a posterior distribution fitting unit, a prior distribution fitting unit, and a training unit, wherein the posterior distribution fitting unit is configured to fit an environment dynamics posterior distribution through the image feature; the prior distribution fitting unit is configured to fit an environment dynamics prior distribution through the hidden feature; and the training unit is configured to: perform training by minimizing the difference between the environment dynamics posterior distribution and the environment dynamics prior distribution, obtain environment dynamics information of a current moment on the basis of the environment dynamics posterior distribution and the hidden feature, and generate, by using the environment dynamics information of the current moment, the hidden feature of the next moment.
In some exemplary embodiments, the environment memory module acquires the environment dynamics information of the current moment by respectively generating a posterior feature and a prior feature according to the image feature and the hidden feature, wherein the posterior feature is generated by sampling a hidden feature containing historical moment information, an action of a previous moment, and the image feature; and the prior feature is generated by sampling the hidden feature containing the historical moment information and the action of the previous moment.
In some exemplary embodiments, it is assumed that the posterior feature and the prior feature both follow a normal distribution, and the generation processes of the posterior feature and the prior feature are expressed as:
In some exemplary embodiments, at a future moment, generation processes of the prior feature and the hidden feature of the next moment are expressed as:
In some exemplary embodiments, the brain-like neural circuit network includes four layers of neurons, wherein the four layers of neurons respectively include: N_s perception neurons, N_i internal neurons, N_c instruction neurons, and N_m motor neurons; for any source neuron, n_{so-t} synapses are inserted between any two successive layers, wherein n_{so-t} satisfies n_{so-t} ≤ N_t, N_t represents the quantity of target neurons, the n_{so-t} target neurons are randomly selected through a binomial distribution, and the synapse polarity follows a Bernoulli distribution; for any target neuron j without synapses, m_{so-t} synapses are inserted between any two consecutive layers, wherein m_{so-t} satisfies
wherein Lt
In some exemplary embodiments, each neuron is modeled as follows according to the characteristics of current transmission across its synapses:
In some exemplary embodiments, the brain-like neural circuit network module includes a conversion unit; the conversion unit is configured to convert the environment dynamics information into control action information by using the brain-like neural circuit network, so as to achieve a conversion process from perception to control; a function g is used to represent the brain-like neural circuit network, and the conversion process is expressed by the following formulas:
In some exemplary embodiments, a function fc is used to represent the process of generating the bird's eye view of the environment, expressed as the following formulas:
In some exemplary embodiments, the driving world model is a world model for model training; a process of the model training includes: taking data from a moment t_k to a moment t_{k+T-1} as historical moment data, taking data from a moment t_{k+T} to a moment t_{k+T+F} as future moment data, inputting the data from t_k to t_{k+T+F} to the driving world model for model training, so that the joint probability of an action sequence and an aerial view sequence is maximized, and obtaining a lower bound of the joint probability through variational inference.
In some exemplary embodiments, the lower bound of the joint probability obtained by variational inference is as shown in the following formula:
The technical solutions provided by the embodiments of the present application have at least the following advantages: The embodiments of the present application provide a driving world model based on a brain-like neural circuit. The model includes a perception module, an environment memory module, a brain-like neural circuit network module, and a convolutional network module, wherein the perception module is configured to perform image encoding on an input image by taking a monocular camera image as the input image and to acquire an image feature under a view angle of an aerial view; the perception module includes a two-dimensional feature encoding unit, a three-dimensional feature encoding unit, and a summing pooling unit which are connected in sequence; the two-dimensional feature encoding unit is configured to extract two-dimensional features from the image feature; the three-dimensional feature encoding unit is configured to project the two-dimensional features to a three-dimensional space to obtain three-dimensional features and to predict a depth probability distribution of each three-dimensional feature; the summing pooling unit is configured to map the three-dimensional features to a bird's eye view space in a summing pooling manner according to the depth probability distributions to obtain the image feature under the view angle of the aerial view; the environment memory module is configured to: acquire and memorize environment dynamics information of a current moment according to the image feature and a hidden feature, and output the environment dynamics information to the brain-like neural circuit network module and the convolutional network module; the brain-like neural circuit network module is configured to: simulate a nematode neural network, establish a brain-like neural circuit network, and input the environment dynamics information to the brain-like neural circuit network to obtain a control output of automatic driving; and the convolutional network module is configured to input the environment dynamics information to a convolutional network to generate a bird's eye view of an environment.
The present application provides a driving world model based on a brain-like neural circuit, which takes the monocular camera image as the input, and obtains the two-dimensional features after the input image is encoded by the perception module; then projects the two-dimensional features to a three-dimensional space to obtain the three-dimensional features; and predicts the depth probability distribution of each three-dimensional feature, and maps the three-dimensional features to a bird's eye view space in the summing pooling manner to obtain the image feature under the view angle of the aerial view. The present application can complete end-to-end automatic driving by using only a monocular camera image as the input. According to the present application, two-dimensional and three-dimensional information of an image can be fully extracted through the perception module, helping an autonomous vehicle run safely under the view angle of the aerial view while considering environment depth information. In addition, the present application also establishes the brain-like neural circuit network by simulating an operation process of a nematode neural network on perception, planning, and control, and obtains the control output of automatic driving. Meanwhile, bird's eye views of the environment are generated on the basis of the environment dynamics information, so that the interpretability of the model is improved. The model uses a monocular camera image as the input; the world model extracts and memorizes environment dynamics information, simulates a nematode nervous system to establish the brain-like neural circuit that processes this information, and completes an end-to-end automatic driving task.
One or more embodiments are illustrated by the figures in the corresponding drawings, and these exemplary explanations are not to be construed as limiting the embodiments. Unless expressly stated otherwise, the figures in the accompanying drawings are not drawn to scale.
As can be seen from the background section, most existing automatic driving models use a modular approach. This approach requires a large amount of manual engineering and involves manual annotation of individual modules and cross-module configuration. For each new environment or new task, the algorithm must be manually redesigned and upgraded, so the approach transfers poorly and cannot adapt to the development and requirements of general artificial intelligence.
The current state of development shows that generative artificial intelligence has the potential to bring a leap forward in automatic driving technology. With the continuous improvement of the emergent capabilities of generative large models with billions of parameters, it is foreseeable that a significant breakthrough in the automatic driving technical route can be achieved by means of the strong processing capability and complex parameter structure of generative artificial intelligence.
In order to solve the above technical problem, an embodiment of the present application provides a driving world model based on a brain-like neural circuit. The model includes a perception module, an environment memory module, a brain-like neural circuit network module, and a convolutional network module, wherein the perception module is configured to perform image encoding on an input image by taking a monocular camera image as the input image and to acquire an image feature under a view angle of an aerial view; the perception module includes a two-dimensional feature encoding unit, a three-dimensional feature encoding unit, and a summing pooling unit which are connected in sequence; the two-dimensional feature encoding unit is configured to extract two-dimensional features from the image feature; the three-dimensional feature encoding unit is configured to project the two-dimensional features to a three-dimensional space to obtain three-dimensional features and to predict a depth probability distribution of each three-dimensional feature; the summing pooling unit is configured to map the three-dimensional features to a bird's eye view space in a summing pooling manner according to the depth probability distributions to obtain the image feature under the view angle of the aerial view; the environment memory module is configured to: acquire environment dynamics information of a current moment according to the image feature and a hidden feature, and output the environment dynamics information to the brain-like neural circuit network module and the convolutional network module; the brain-like neural circuit network module is configured to: simulate a nematode neural network, establish a brain-like neural circuit network, and input the environment dynamics information to the brain-like neural circuit network to obtain a control output of automatic driving; and the convolutional network module is configured to input the environment dynamics information to a convolutional network to generate a bird's eye view of an environment. The present application uses a monocular camera image as an input image. The world model is applied to extracting and memorizing environment dynamics information. Two-dimensional and three-dimensional information of an image can be fully extracted through the perception module, helping an autonomous vehicle run safely under the view angle of the aerial view while considering environment depth information. The model also simulates a nematode nervous system to establish the brain-like neural circuit to process the environment dynamics information and complete an end-to-end automatic driving task.
The respective embodiments of the present application will be described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art can understand that numerous technical details are set forth in the various embodiments of the present application in order to enable readers to better understand the present application; however, the technical solutions claimed by the present application can be implemented even without these technical details and with various changes and modifications based on the following embodiments.
Referring to
The present application uses a monocular camera image as the input image. The world model is applied to extracting and memorizing environment dynamics information. By enhancing the perception part of the world model and simulating a nematode nervous system, the brain-like neural circuit is established to process the environment dynamics information and complete an end-to-end automatic driving task. The process for enhancing the perception part of the world model includes: taking a monocular camera image o_k as the input image, and first performing image encoding and extracting two-dimensional features of the surrounding environment through an image encoding module (a ResNet) in the perception module. Since an autonomous vehicle needs to perceive a three-dimensional environment, after the two-dimensional features are obtained, they are projected into a three-dimensional space through the intrinsic and extrinsic parameters of the camera to obtain the three-dimensional features, and the depth probability distribution of each three-dimensional feature is predicted; these three-dimensional features are then processed in a summing pooling manner to obtain the image feature under the view angle of the aerial view.
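To make this lift-and-pool step concrete, the following is a minimal PyTorch-style sketch of the idea, assuming a categorical depth distribution predicted per pixel and a precomputed mapping from (depth bin, pixel) to bird's eye view cells derived from the camera intrinsics and extrinsics; the module names, dimensions, and scatter-based pooling are illustrative assumptions, not the patent's implementation.

```python
import torch
import torch.nn as nn

class LiftSplatBEV(nn.Module):
    """Illustrative sketch: lift 2D image features into 3D with a predicted
    depth distribution, then sum-pool the lifted features into a BEV grid."""

    def __init__(self, feat_dim=64, num_depth_bins=48, bev_size=(200, 200)):
        super().__init__()
        self.bev_size = bev_size
        # Per-pixel categorical depth distribution and context features.
        self.depth_head = nn.Conv2d(feat_dim, num_depth_bins, kernel_size=1)
        self.feat_head = nn.Conv2d(feat_dim, feat_dim, kernel_size=1)

    def forward(self, img_feat, bev_index):
        # img_feat:  (B, C, H, W) 2D features from the image encoder (e.g. a ResNet)
        # bev_index: (B, D, H, W) LongTensor of BEV cell indices for every
        #            (depth bin, pixel), derived from camera intrinsics/extrinsics
        B, C, H, W = img_feat.shape
        depth_prob = self.depth_head(img_feat).softmax(dim=1)       # (B, D, H, W)
        context = self.feat_head(img_feat)                          # (B, C, H, W)
        # Outer product: weight each context feature by its depth probability.
        lifted = depth_prob.unsqueeze(1) * context.unsqueeze(2)     # (B, C, D, H, W)
        # Sum-pool all lifted features falling into the same BEV cell.
        num_cells = self.bev_size[0] * self.bev_size[1]
        flat = lifted.reshape(B, C, -1)                             # (B, C, D*H*W)
        idx = bev_index.reshape(B, 1, -1).expand(-1, C, -1)         # (B, C, D*H*W)
        bev = torch.zeros(B, C, num_cells, device=img_feat.device, dtype=flat.dtype)
        bev.scatter_add_(2, idx, flat)                              # summing pooling
        return bev.reshape(B, C, *self.bev_size)                    # (B, C, X, Y)
```

The scatter-based sum pooling keeps the operation differentiable, which is what allows the depth prediction and the bird's eye view features to be learned end to end.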
The environment memory module 102 is configured to: acquire environment dynamics information of a current moment according to the image feature and a hidden feature, and output the environment dynamics information to the brain-like neural circuit network module and the convolutional network module. The brain-like neural circuit network module 103 is configured to: simulate a nematode neural network, establish a brain-like neural circuit network, and input the environment dynamics information to the brain-like neural circuit network to obtain a control output of automatic driving; and the convolutional network module 104 is configured to input the environment dynamics information to a convolutional network to generate a bird's eye view of an environment.
Continuing to refer to
In some embodiments, the environment memory module 102 acquires the environment dynamics information of the current moment by respectively generating a posterior feature and a prior feature according to the image feature and the hidden feature, wherein the posterior feature is generated by sampling a hidden feature containing historical moment information, an action of a previous moment, and the image feature; and the prior feature is generated by sampling the hidden feature containing the historical moment information and the action of the previous moment.
In some embodiments, memorized historical features include a posterior feature and a prior feature; the posterior feature is generated by sampling a hidden feature containing historical moment information, an action of a previous moment, and the image feature; and the prior feature is generated by sampling the hidden feature containing the historical moment information and the action of the previous moment.
In some embodiments, it is assumed that the posterior feature and the prior feature both follow a normal distribution, and the generation processes of the posterior feature and the prior feature are expressed as:
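The patent's own equations are not reproduced in this text. Purely as an illustration, a standard recurrent state-space formulation consistent with the description above (posterior conditioned on the hidden feature h_k, the previous action a_{k-1}, and the image feature x_k; prior conditioned on h_k and a_{k-1} only; the networks μ_φ, σ_φ, μ_θ, σ_θ and the symbol x_k are assumptions) could be written as:

```latex
q\bigl(z_k \mid h_k, a_{k-1}, x_k\bigr) = \mathcal{N}\!\bigl(\mu_{\phi}(h_k, a_{k-1}, x_k),\ \sigma_{\phi}(h_k, a_{k-1}, x_k)\bigr),
\qquad
p\bigl(z_k \mid h_k, a_{k-1}\bigr) = \mathcal{N}\!\bigl(\mu_{\theta}(h_k, a_{k-1}),\ \sigma_{\theta}(h_k, a_{k-1})\bigr)
```

Here a sample z_k drawn from q plays the role of the posterior feature, a sample drawn from p plays the role of the prior feature, and φ, θ denote the parameters of the respective fitting units.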
As shown in
At a future moment k+T, the driving world model cannot obtain an image input, so the future action and the aerial view trend are obtained by imagination. Specifically, the driving world model does not generate a posterior feature at the future moment, but generates the hidden feature h_{k+T+1} of the next moment directly from the hidden feature h_{k+T} and the prior feature z_{k+T}.
In some embodiments, at the future moment k+T, the generation processes of the prior feature z_{k+T} and the hidden feature h_{k+T+1} of the next moment are expressed as:
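As a hedged sketch of this imagination step (the GRU cell, the diagonal-Gaussian prior, and all dimensions below are assumptions introduced for illustration, not the patent's implementation), the prior-only recurrence at future moments could look like:

```python
import torch
import torch.nn as nn

class ImaginationStep(nn.Module):
    """Illustrative sketch: at future moments there is no image input, so the
    prior feature is sampled from the prior alone and the next hidden feature
    is produced from the current hidden feature and that prior sample."""

    def __init__(self, hidden_dim=512, latent_dim=32, action_dim=2):
        super().__init__()
        # Prior over the latent, conditioned on the hidden feature and previous action.
        self.prior_net = nn.Linear(hidden_dim + action_dim, 2 * latent_dim)
        # Recurrence producing the hidden feature of the next moment.
        self.rnn = nn.GRUCell(latent_dim, hidden_dim)

    def forward(self, h, a_prev):
        mu, log_sigma = self.prior_net(torch.cat([h, a_prev], dim=-1)).chunk(2, dim=-1)
        z = mu + log_sigma.exp() * torch.randn_like(mu)   # prior feature z_{k+T}
        h_next = self.rnn(z, h)                           # hidden feature h_{k+T+1}
        return z, h_next
```

Rolling this step forward repeatedly, with the action at each moment supplied by the brain-like neural circuit network, yields the imagined future trajectory described above.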
The present application establishes the brain-like neural circuit network by simulating the nematode nervous system. Caenorhabditis elegans is a very small animal that completes functions such as perception and motion through its nearly complete nervous system, and a plurality of neural circuits in its nervous system can be modeled as a four-layer hybrid topological structure. The present application imitates the neural circuit of Caenorhabditis elegans and establishes a brain-like neural circuit network framework, as shown in
Referring to
wherein Lt
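The exact formula for m_{so-t} is not reproduced in this text. Purely as an illustration of the layered wiring policy described above, the following NumPy sketch wires two consecutive layers with a fixed fan-out per source neuron, random Bernoulli polarity, and a repair pass for target neurons left without synapses; the layer sizes, fan-out, and polarity probability are assumptions, not the patent's values.

```python
import numpy as np

def wire_layers(n_source, n_target, fanout, polarity_p=0.5, rng=None):
    """Illustrative sketch: sparse wiring between two consecutive neuron layers.

    For every source neuron, `fanout` (<= n_target) target neurons are chosen at
    random and a synapse with random polarity (+1 excitatory / -1 inhibitory,
    Bernoulli with probability `polarity_p`) is inserted. Any target neuron left
    without an incoming synapse then receives synapses from randomly chosen
    source neurons, so that no target neuron is disconnected.
    """
    rng = rng or np.random.default_rng()
    adjacency = np.zeros((n_source, n_target), dtype=int)  # 0: none, +1/-1: polarity
    for s in range(n_source):
        targets = rng.choice(n_target, size=min(fanout, n_target), replace=False)
        polarity = np.where(rng.random(len(targets)) < polarity_p, 1, -1)
        adjacency[s, targets] = polarity
    # Repair pass: connect any target neuron that received no synapse.
    for t in np.where((adjacency != 0).sum(axis=0) == 0)[0]:
        sources = rng.choice(n_source, size=min(fanout, n_source), replace=False)
        adjacency[sources, t] = np.where(rng.random(len(sources)) < polarity_p, 1, -1)
    return adjacency

# Four layers: perception -> internal -> instruction -> motor (sizes are illustrative).
sizes = [32, 16, 8, 2]
circuit = [wire_layers(sizes[i], sizes[i + 1], fanout=4) for i in range(len(sizes) - 1)]
```

Applying this wiring between the perception, internal, instruction, and motor layers produces the sparse four-layer circuit described above.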
In some embodiments, each neuron is modeled as follows according to the characteristics of current transmission across its synapses:
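The neuron equations themselves do not appear in this text. For orientation only, Caenorhabditis elegans-inspired circuit models of this kind typically describe each neuron's membrane potential v_i with a leaky integrator driven by synaptic currents, for example:

```latex
C_m \frac{\mathrm{d} v_i(t)}{\mathrm{d} t} \;=\; g_L\bigl(v_L - v_i(t)\bigr) \;+\; \sum_{j \in \mathrm{pre}(i)} w_{ij}\,\sigma\!\bigl(v_j(t)\bigr)\,\bigl(E_{ij} - v_i(t)\bigr)
```

where C_m is the membrane capacitance, g_L and v_L are the leakage conductance and leakage potential, w_{ij} is the synaptic weight, σ(·) is a sigmoidal activation of the presynaptic potential, and E_{ij} is a reversal potential whose sign encodes the synapse polarity. This is the standard form used in liquid-time-constant-style neuron models and is offered here only as an illustration, not as the patent's formula.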
As described above, the nematode neural network is simulated to establish the brain-like neural circuit network. The brain-like neural circuit network is regarded as a function g and is used to convert the environment dynamics information into control action information, thereby achieving the conversion process from perception to control.
In some embodiments, the brain-like neural circuit network module includes a conversion unit; the conversion unit is configured to convert the environment dynamics information into control action information by using the brain-like neural circuit network, so as to achieve a conversion process from perception to control; a function g is used to represent the brain-like neural circuit network, and the conversion process is expressed by the following formulas:
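In the simplest hedged reading of this description, with the environment dynamics information written as the pair (h_k, z_k), the conversion could be summarized as:

```latex
\hat{a}_k \;=\; g\bigl(h_k,\ z_k\bigr)
```

where \hat{a}_k is the predicted control action at moment k; treating h_k and z_k together as the input of g is an assumption made for illustration, not the patent's exact formula.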
A bird's eye view b_k is generated from the environment dynamics information through the convolutional neural network, so that the interpretability of the end-to-end method is improved. The aerial view is as shown in
Specifically, in some embodiments, a function fc is used to represent the process of generating the bird's eye view of the environment, expressed as the following formulas:
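As an illustrative sketch of such a decoder (the transposed-convolution architecture, output resolution, and channel counts below are assumptions, and a segmentation-style bird's eye view output is assumed), fc could be realized as:

```python
import torch
import torch.nn as nn

class BEVDecoder(nn.Module):
    """Illustrative sketch of f_c: decode the environment dynamics information
    (hidden feature h_k and sample feature z_k) into a bird's eye view map."""

    def __init__(self, hidden_dim=512, latent_dim=32, bev_channels=3, base=8):
        super().__init__()
        self.base = base
        self.fc = nn.Linear(hidden_dim + latent_dim, 256 * base * base)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 32 -> 64
            nn.ConvTranspose2d(32, bev_channels, 4, stride=2, padding=1),     # 64 -> 128
        )

    def forward(self, h, z):
        x = self.fc(torch.cat([h, z], dim=-1))
        x = x.view(-1, 256, self.base, self.base)
        return self.deconv(x)  # (B, bev_channels, 128, 128) bird's eye view logits
```

In use, the logits can be passed through a softmax or sigmoid to obtain the semantic bird's eye view of the environment.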
In some embodiments, the driving world model is a world model for model training.
A process of the model training includes: taking data from a moment t_k to a moment t_{k+T-1} as historical moment data, taking data from a moment t_{k+T} to a moment t_{k+T+F} as future moment data, inputting the data from t_k to t_{k+T+F} to the driving world model for model training, so that the joint probability of an action sequence and an aerial view sequence is maximized, and obtaining a lower bound of the joint probability through variational inference;
In some embodiments, the lower bound of the joint probability obtained by variational inference is as shown in the following formula:
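The bound itself is not reproduced in this text. Purely as an illustration of what a variational lower bound of this kind typically looks like for the described split into historical and future moments (with q the posterior, p the prior, a_t the actions, b_t the bird's eye views, and x_t the image features; the index ranges and conditioning are assumptions, not the patent's exact bound), one may write:

```latex
\log p\bigl(a_{k:k+T+F},\, b_{k:k+T+F}\bigr) \;\geq\;
\mathbb{E}_{q}\!\left[\sum_{t=k}^{k+T+F}\Bigl(\log p\bigl(a_t \mid h_t, z_t\bigr) + \log p\bigl(b_t \mid h_t, z_t\bigr)\Bigr)\right]
\;-\; \sum_{t=k}^{k+T-1} \mathbb{E}_{q}\Bigl[D_{\mathrm{KL}}\bigl(q(z_t \mid h_t, a_{t-1}, x_t)\,\big\|\,p(z_t \mid h_t, a_{t-1})\bigr)\Bigr]
```

Maximizing such a bound jointly trains the action output, the bird's eye view decoder, and the environment memory module, with the KL term pulling the prior toward the posterior so that imagination at future moments stays consistent with what was observed.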
According to the above technical solutions, the embodiments of the present application provide a driving world model based on a brain-like neural circuit. The model includes a perception module, an environment memory module, a brain-like neural circuit network module, and a convolutional network module, wherein the perception module is configured to perform image encoding on an input image by taking a monocular camera image as the input image and to acquire an image feature under a view angle of an aerial view; the perception module includes a two-dimensional feature encoding unit, a three-dimensional feature encoding unit, and a summing pooling unit which are connected in sequence; the two-dimensional feature encoding unit is configured to extract two-dimensional features from the image feature; the three-dimensional feature encoding unit is configured to project the two-dimensional features to a three-dimensional space to obtain three-dimensional features and to predict a depth probability distribution of each three-dimensional feature; the summing pooling unit is configured to map the three-dimensional features to a bird's eye view space in a summing pooling manner according to the depth probability distributions to obtain the image feature under the view angle of the aerial view; the environment memory module is configured to: acquire and memorize environment dynamics information of a current moment according to the image feature and a hidden feature, and output the environment dynamics information to the brain-like neural circuit network module and the convolutional network module; the brain-like neural circuit network module is configured to: simulate a nematode neural network, establish a brain-like neural circuit network, and input the environment dynamics information to the brain-like neural circuit network to obtain a control output of automatic driving; and the convolutional network module is configured to input the environment dynamics information to a convolutional network to generate a bird's eye view of an environment.
The present application provides a driving world model based on a brain-like neural circuit, which takes the monocular camera image as the input, and obtains the two-dimensional features after the input image is encoded by the perception module; then projects the two-dimensional features to a three-dimensional space to obtain the three-dimensional features; and predicts the depth probability distribution of each three-dimensional feature, and maps the three-dimensional features to a bird's eye view space in the summing pooling manner to obtain the image feature under the view angle of the aerial view. The present application can complete end-to-end automatic driving by using only a monocular camera image as the input. According to the present application, two-dimensional and three-dimensional information of an image can be fully extracted through the perception module, helping an autonomous vehicle run safely under the view angle of the aerial view while considering environment depth information. In addition, the present application also establishes the brain-like neural circuit network by simulating an operation process of a nematode neural network on perception, planning, and control, and obtains the control output of automatic driving. Meanwhile, bird's eye views of the environment are generated on the basis of the environment dynamics information, so that the interpretability of the model is improved. The model uses a monocular camera image as the input; the world model extracts and memorizes environment dynamics information, simulates a nematode nervous system to establish the brain-like neural circuit that processes this information, and completes an end-to-end automatic driving task.
Those of ordinary skill in the art can understand that the foregoing implementations are specific embodiments of practicing the present application, while in practical applications, various changes can be made to the implementations in form and detail without departing from the spirit and scope of the present application. Any person skilled in the art can make respective changes and modifications without departing from the spirit and scope of the present application, and the protection scope of the present application is defined by the appended claims.