The invention belongs to the field of environmental cognition and navigation of intelligent mobile robots, and in particular relates to a method for constructing an episodic memory model based on the rat brain visual pathway and the entorhinal-hippocampal cognitive mechanism.
Environmental perception and cognition are basic abilities of human and animal brains, and they are also a fundamental task and a key issue for autonomous mobile robots. Intelligent behavior like that of higher mammals is a necessary condition for a robot to quickly and accurately achieve goal-oriented navigation in complex and unknown environments. How to endow robots with this ability is a common concern in the fields of artificial intelligence, robotics, and neuroscience.
Physiological studies have shown that the key to environmental cognition and navigation in rats lies in the existence of a variety of neuron cells with specific firing effects on space in the brain, mainly including: head-direction cells, stripe cells, grid cells, and object vector cells in the entorhinal cortex; dentate gyrus neurons, hippocampal CA3 place cells, and hippocampal CA1 place cells in the hippocampus. These neuron cells are also called spatial cells.
Seventy percent of mammalian perception information about the environment is collected through the visual pathway, and this information is further transmitted through brain circuits to a structure called the hippocampus that is responsible for environmental cognition. In the rat brain, speed and head-direction information are thought to be input to the entorhinal cortex in the medial temporal lobe. Visual information leaving the occipital lobe is transmitted along two neural pathways. The pathway running along the ventral side is called the ventral pathway; its main function is object perception and recognition, so it is known as the "what pathway". The other pathway, running along the dorsal side, is called the dorsal pathway, also known as the occipito-parietal pathway. It is specialized for spatial perception: it encodes motion information, determines where objects are, and analyzes the spatial position of objects in the scene, thereby giving the observer's position relative to external objects, so it is also called the "where pathway". The visual information fused from the ventral and dorsal pathways is considered to be input to the non-grid-cell structures of the entorhinal cortex, and is then sent to the hippocampus together with the position and head-direction information output by the entorhinal cortex.
In the hippocampus, the speed and head-direction information are first input to the neurons of the dentate gyrus and then transmitted to the place cells in the hippocampal CA3 area, realizing a neuronal representation of spatial position. This representation is then fused with visual information, transmitted to the place cells in the hippocampal CA1 area, and stored, realizing a joint memory of the spatial environment and spatial position. Therefore, in the rat brain, the entorhinal-hippocampal CA3 structure is used to represent position in the environment; the two visual pathways are used to represent the scenario information of the spatial environment; and the function of the hippocampal CA1 structure is to store the two kinds of fused information.
Despite the rapid development of artificial intelligence technology in recent years, the perception and cognitive abilities of autonomous mobile robots at this stage are far from the level of humans and animals. Therefore, drawing on and simulating the environment perception and cognition mechanisms of humans and animals to construct environment perception and cognition models for autonomous mobile robots has become a hot issue in intelligent mobile robot research. Based on the physiological characteristics of the rat brain visual pathway and the entorhinal-hippocampal structure and its environmental cognition mechanism, the present invention is oriented towards the autonomous exploration of unknown environments by an intelligent mobile robot, and proposes a method for constructing an episodic memory model for such a robot.
Traditional robot environment cognition and navigation models mainly face the following problems: path-integration errors accumulate during exploration, and loop-closure detection lacks robustness in complex, changeable, and repetitive environments.
In order to solve the problem that the robot cannot effectively construct a cognitive map of the environmental situation, caused by the above problems, the present invention proposes a method for constructing an episodic memory model based on the rat brain visual pathway and the entorhinal-hippocampal cognitive mechanism, which mainly includes: 1. construction of the entorhinal-hippocampal CA3 neural computing model; 2. construction of the "what pathway" and "where pathway" visual pathway computing models; 3. construction of cognitive nodes imitating hippocampal CA1 place cells; 4. construction of an episodic cognitive map based on the cognitive nodes. First, the image information of the environment is collected through a camera, the head-direction angle and speed information of the robot are collected through a gyroscope and an encoder, and the above information is transmitted to the CPU. The head-direction angle and speed information are input into the entorhinal-hippocampal CA3 neural computing model to obtain the precise position of the robot; the visual information is input into the visual pathway computing model to obtain the scene information within the robot's field of view. The attribute and position information of external environment objects from the two visual pathways is fused with the position and head-direction information output from the rat brain entorhinal-hippocampal CA3 neural computing model, and stored in cognitive nodes with a topological structure relationship. The scenario information is used to correct the path-integration error in the process of robot exploration, and an episodic cognition map expressing the environment is then constructed. The concrete workflow of the inventive method is as follows. Each cognitive node is defined as:
ei = {Φ0i, (Xenvi, Yenvi), (niobject, {ρij}, {Φij}, {dij})}   (1)
Wherein, Φ0i represents the head-direction angle of the robot at the i-th cognitive node; (Xenvi, Yenvi) represents the position of the robot in the environment at the i-th cognitive node; and (niobject, {ρij}, {Φij}, {dij}) represents the environmental features within the robot's field of view at the i-th cognitive node, where niobject is the number of objects at the i-th cognitive node, ρij is the attribute of the j-th object, Φij is the orientation angle of the j-th object relative to the robot, and dij is the distance between the j-th object and the robot.
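For concreteness, the cognitive node structure of formula (1) can be sketched as a small data type. This is a minimal illustrative sketch in Python, not part of the invention's disclosure; all names are chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

@dataclass
class DetectedObject:
    attribute: str      # ρ_ij: object attribute (e.g., a class label)
    bearing: float      # Φ_ij: orientation angle relative to the robot (rad)
    distance: float     # d_ij: distance between the object and the robot (m)

@dataclass
class CognitiveNode:
    head_direction: float              # Φ_0^i: head-direction angle (rad)
    position: Tuple[float, float]      # (X_env^i, Y_env^i) in meters
    objects: List[DetectedObject] = field(default_factory=list)  # n_object^i = len(objects)
    links: Set[int] = field(default_factory=set)  # indices of topologically linked nodes
```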
The present invention respectively constructs a position cognition model based on the entorhinal-hippocampal CA3 structure of the rat brain and an environment cognition model based on the visual pathway, uses the scenario information to correct the path-integration error of the position cognition model during robot exploration, and then constructs an episodic cognition map of the environment. Compared with the loop-closure detection method used in traditional SLAM, object-level image matching that imitates the mechanism of biological cognition is more robust in complex, changeable, and repetitive environments. Combining this loop-closure detection method with the correction algorithm for cumulative errors yields a more accurate environmental cognition map. Moreover, the bionic method of the present invention places low demands on hardware and sensors, and the whole model has good scalability and adaptability and is suitable for navigation in different indoor environments.
The present invention will be described in detail below in conjunction with the accompanying drawings and examples.
Specific steps are as follows:
Physiological studies have shown that speed and head-direction angle information are input to the hippocampal CA3 structure through the entorhinal-hippocampal information transmission pathway in the rat brain, and form a representation of the animal's own pose.
Based on this, the present invention proposes a method for constructing an entorhinal-hippocampal CA3 neural computing model, which obtains robot position information in a bionic manner.
Firstly, the mathematical expression for the firing rate of a stripe cell in two-dimensional space is given as:

Vstripe(t) = cos(2πf·∫vHDdt) + cos(2πfd·∫vHDdt)   (2)
In formula (2), t represents the current moment, f represents the oscillation frequency of the neuron cell body, whose value is randomly selected within the range of 0-256 Hz, and fd represents the oscillation frequency of the neuron dendrites. ∫vHDdt represents the path integral along the preferred direction angle ΦHD of the stripe cell, where vHD represents the component of the rat's velocity along the preferred direction angle ΦHD; its mathematical expression is as follows:
vHD = v·cos(Φ − ΦHD)   (3)
In formula (3), v represents the current moving speed of the robot, and Φ represents the current head-direction angle of the robot. Formula (2) expresses the interference of the waveforms corresponding to the two frequencies, which presents a new waveform in one-dimensional space, called the stripe wave. The envelope of this waveform has a relatively slow "beat" frequency, which is the oscillation frequency of the stripe wave. Denote this frequency by fb; its mathematical expression is:
fb = fd − f   (4)
The stripe wave oscillation frequency fb can also be expressed in terms of the stripe wave wavelength, as shown in formula (5):
fb = vHD/λb = B1·v·cos(Φ − ΦHD)   (5)
In formula (5), λb represents the wavelength of the stripe wave, whose value is randomly selected within the range of 0.05 m˜100 m, and B1 represents the reciprocal of the stripe wave wavelength. Combining formula (3) and formula (5), the mathematical expression of the neuron dendritic oscillation frequency fd can be obtained as:
fd = f + B1·v·cos(Φ − ΦHD)   (6)
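As a minimal numerical sketch of formulas (2) through (6), the stripe cell firing rate can be computed by integrating the velocity component along the preferred direction over a sampled trajectory. The discretization and parameter choices below are our own assumptions, not values fixed by the invention.

```python
import numpy as np

def stripe_firing(v, phi, phi_hd, f, lam_b, dt):
    """Stripe cell firing rate along a trajectory (formulas (2)-(6)).

    v, phi : arrays of robot speed (m/s) and head-direction angle (rad) per step
    phi_hd : preferred direction angle Φ_HD of this stripe cell (rad)
    f      : soma oscillation frequency (Hz), drawn from the 0-256 Hz range
    lam_b  : stripe wave wavelength λ_b (m); B1 = 1/λ_b
    dt     : sampling period (s)
    """
    v_hd = v * np.cos(phi - phi_hd)           # formula (3): velocity component
    path = np.cumsum(v_hd) * dt               # path integral ∫ v_HD dt
    f_d = f + (1.0 / lam_b) * v_hd            # formula (6): dendritic frequency
    # formula (2): interference of the soma and dendrite oscillations
    return np.cos(2 * np.pi * f * path) + np.cos(2 * np.pi * f_d * path)
```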
Physiological studies have shown that when the preferred direction angles of three stripe cells differ by 120°, the stripe waves they generate can, through the oscillatory interference mechanism, form a regular hexagonal grid field throughout the entire two-dimensional plane. The firing rate g(t) of a grid cell is therefore modeled as the product of the three stripe waves:
g(t) = ΠΦHD [cos(2πf·∫vHDdt) + cos(2π(f + B1·v·cos(Φ − ΦHD))·∫vHDdt)]   (7)
In formula (7), the three stripe cell preferred direction angles ΦHD take the values Φg+0°, Φg+120°, and Φg+240° respectively, where Φg represents the deviation angle of the stripe cells; its value is randomly selected within 0°˜360°. Φg also represents the orientation angle of the grid field. After the grid cell firing rate is obtained, it is used as the forward input signal to the dentate gyrus neurons; the excitatory input signal transmitted by the grid cell group to the dentate gyrus neurons is given in formula (8).
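Formula (7) multiplies three such stripe waves whose preferred directions are 120° apart. Below is a sketch reusing the stripe_firing helper above; again, this is illustrative only.

```python
import numpy as np

def grid_firing(v, phi, phi_g, f, lam_b, dt):
    """Grid cell firing rate as the product of three stripe waves (formula (7))."""
    g = np.ones_like(v, dtype=float)
    for k in range(3):                        # preferred directions Φ_g + 0°, 120°, 240°
        phi_hd = phi_g + k * (2.0 * np.pi / 3.0)
        g *= stripe_firing(v, phi, phi_hd, f, lam_b, dt)
    return g
```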
In formula (8), i and j represent the indices of dentate gyrus neurons and grid cells respectively, gj(t) represents the firing rate of the j-th grid cell, and ngrid represents the number of grid cells.
W represents the excitatory input connection weight matrix, where Wij represents the connection weight from the j-th grid cell to the i-th dentate gyrus neuron, and the calculation formula of each connection weight is as follows:
In formula (9), s represents the synapse size, whose value is randomly selected in the range of (0˜0.2) μm2. The proportion P(s) of synapses of each size s among all synapses roughly obeys the following mathematical expression:
In formula (10), A=100.7, B=0.02, σ1=0.022, σ2=0.018, σ3=0.15. The excitatory input connection weight matrix W can be assigned by formula (9) and formula (10), so as to realize excitatory transmission from the grid cells to the dentate gyrus neurons. The firing activity of the dentate gyrus neurons within a given spatial region is subject to a winner-take-all (WTA) learning rule that describes competitive activity arising from gamma-frequency feedback inhibition. The mathematical expression of the firing rate of the dentate gyrus neurons is:
Fidentate(t) = IiMEC(t)·H(IiMEC(t) − (1−k1)·ImaxMEC)   (11)
In formula (11), k1 is 0.1, and its value determines which dentate gyrus neurons are activated according to the WTA learning rule. ImaxMEC represents the maximum value of the grid cell forward input received by the dentate gyrus neurons. H(x) is a rectification function: when x>0, H(x)=1; otherwise, when x≤0, the function value is 0. After the firing rate expression of the dentate gyrus neurons is obtained, the excitatory input signal Iidentate(t) from the dentate gyrus neurons to the hippocampal CA3 place cells can be calculated, as shown in formula (12); its calculation method is similar to formula (8).
In formula (12), i and j represent the serial numbers of hippocampal CA3 place cells and dentate gyrus neurons respectively, and ndentate represents the number of dentate gyrus neurons, which is set to 1000. Fmaxdentate represents the maximum firing rate of the dentate gyrus neurons. Since Fidentate(t) is always greater than zero, dividing it by the maximum firing rate amounts to a normalization. Ω represents the excitatory input connection weight matrix, where Ωij represents the connection weight from the j-th dentate gyrus neuron to the i-th hippocampal CA3 place cell, with values ranging from 0 to 1. The distribution function of the connection weight values is defined as a non-negative Gaussian distribution, and its mathematical expression is as follows:
In formula (13), A2=1.033, μ=24, σ=13. The excitatory input connection weight matrix Ω can be assigned by formula (13), so as to realize excitatory transmission from the dentate gyrus neurons to the hippocampal CA3 place cells. The hippocampal CA3 place cells receive forward input from the neurons of the entorhinal cortex and of the dentate gyrus at the same time, so the mathematical expression of the total excitatory input signal received by the hippocampal CA3 place cells is:
IiCA3(t) = IiMEC(t) + IavMEC(t)·Iidentate(t)   (14)
In formula (14), IiMEC(t) and Iidentate(t) are respectively the forward input signals of the grid cells and the dentate gyrus neurons mentioned above; IavMEC(t) represents the average strength of the grid cell forward input signals, and its mathematical expression is:
In formula (15), nCA3 represents the number of hippocampal CA3 place cells, which is set to 1600. The expression for the firing rate of the hippocampal CA3 place cells can then be obtained; its mathematical expression is as follows, and its calculation method is similar to formula (11).
FiCA3(t) = IiCA3(t)·H(IiCA3(t) − (1−k2)·ImaxCA3)   (16)
In formula (16), ImaxCA3 represents the maximum value of the total excitation input signal received by hippocampal CA3 place cells, and the value of k2 is 0.1. The information transfer mapping model from the entorhinal cortex to the CA3 region of the hippocampus can be established through formulas (2) to (16).
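The grid-to-dentate-to-CA3 chain of formulas (8), (11), (12), (14), and (16) can be sketched as below. Since the bodies of formulas (8), (12), and (15) do not appear in this text, the weighted-sum and averaging forms used here are assumptions inferred from the surrounding description; the weight matrices W and Ω would be assigned per formulas (9)-(10) and (13), and W2 (the direct entorhinal input to the CA3 cells) is our own added assumption.

```python
import numpy as np

def wta(inp, k):
    """Formulas (11)/(16): rectified winner-take-all firing.

    Keeps only inputs above (1 - k) times the maximum; H(x) is the step function.
    """
    return inp * (inp > (1.0 - k) * inp.max())

def ca3_firing(g, W, W2, Omega, k1=0.1, k2=0.1):
    """Forward chain from grid cells to hippocampal CA3 place cells.

    g     : firing rates of the n_grid grid cells at time t
    W     : (n_dentate, n_grid) grid-to-dentate weights, per formulas (9)-(10)
    W2    : (n_CA3, n_grid) grid-to-CA3 weights (assumed analogous to W)
    Omega : (n_CA3, n_dentate) dentate-to-CA3 weights, per formula (13)
    """
    I_dg = W @ g                              # formula (8), assumed weighted sum
    F_dg = wta(I_dg, k1)                      # formula (11), k1 = 0.1
    # formula (12): assumed weighted sum, normalized by the max dentate rate
    I_dentate = Omega @ (F_dg / (F_dg.max() + 1e-12))
    I_mec = W2 @ g                            # direct entorhinal input to CA3
    I_av = I_mec.mean()                       # formula (15), assumed average strength
    I_ca3 = I_mec + I_av * I_dentate          # formula (14)
    return wta(I_ca3, k2)                     # formula (16), k2 = 0.1
```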
In order to give the model the ability of position cognition and realize the quantification of place cell firing rates in actual physical space, a spatial position recognition model composed of hippocampal CA3 place cells is established. Firstly, all hippocampal CA3 place cells are arranged in sequence into a cell plate model capable of representing position, where the shape of the cell plate is square. Since the number of hippocampal CA3 place cells is nCA3, the side length of the cell plate is Nx = √nCA3 = 40. The corresponding coding area is set as a square region whose side length L preferably takes a value in the range of 5 m˜20 m. The mathematical expression of the place field center coordinates of each place cell is then as follows:
In formula (17), i and j respectively represent the column and row of the current place cell on the cell plate, and rij represents the coordinates of the center of that place cell's place field. Modeling the hippocampal CA3 place cells as a square cell plate enables the forward input generated by the entorhinal cortex to be represented on the plate as packets of excitatory activity. There are also interactions between hippocampal CA3 place cells: through local synaptic connections, each place cell excites and inhibits surrounding cells, and eventually the most excitable cells win the competition, forming a single-peaked packet of excitatory activity.
A two-dimensional Gaussian distribution is used to create the excitatory connection weight matrix εm,n of the hippocampal CA3 place cells, where the subscripts m and n represent the distances between the cells' horizontal and vertical coordinates in the X and Y directions respectively, and their values are both set to 15. The mathematical expression of the weight distribution of the excitatory connection weight matrix is:
In formula (18), kp represents the constant of the position distribution width, and its value is 7. The amount of change in hippocampal CA3 place cell activity at time t due to the local excitatory connections is:
In formula (19), pi,jt represents the firing rate, after interaction, of the place cell in row i, column j of the cell plate at time t, and its initial value is the firing rate FiCA3(t) of the hippocampal CA3 place cells. The inhibitory output of the hippocampal CA3 place cells takes effect after the local excitatory connections, not simultaneously with them. The symmetry of the excitatory and inhibitory connection matrices guarantees proper neural network dynamics, ensuring that attractors in the space are not excited indefinitely. The activity change of the hippocampal CA3 place cells caused by the inhibitory connection weights at time t is:
In formula (20), ψm,n is the inhibitory connection weight, which controls the global inhibition level; its value is 0.00002. Since the activities of all hippocampal CA3 place cells must be non-negative and normalized, in order to ensure that the firing rate of every place cell at every moment is not less than zero, the firing rates of all place cells are thresholded at 0 and the results are normalized; the mathematical expressions are as follows:
In formulas (21) and (22), t and t+1 represent the current moment and the next moment respectively. Through the modeling method of formulas (16) to (22), the forward input from the entorhinal cortex can be represented on the cell plate in the form of an excitatory activity packet. Then, by obtaining the position of the excitatory activity packet on the cell plate, the position of the robot in the spatial region encoded by the cell plate can be obtained, as shown in formula (23).
In formula (23), Pxt and Pyt represent the abscissa and ordinate of the excitatory activity packet on the place cell plate at time t, respectively. In order that the model not be limited to spatial cognition within the encoding area, border cells with specific firing effects at the area boundary are introduced. Border cell firing triggers a reset of stripe cell firing activity when a boundary of the encoded region is reached, enabling the rat to recognize its position within a spatial region of arbitrary size.
The specific implementation method is as follows: at the initial moment, the rat is set to be located at the center of the square area encoded by the place cell plate; when the rat reaches any boundary of the given encoding area, the path integral ∫vHDdt of every stripe cell along its preferred direction angle ΦHD is set to zero, so that after the reset the rat is again at the center of the region encoded by the place cell plate. In this way, every time the firing reset of the stripe cells is completed, the place cell plate can immediately generate a code for a new spatial region, thereby achieving the robot's position cognition for a space of any size.
The initial position of the robot's movement is located at the center of the square area encoded by the place cell plate. The physical coordinate system is defined with this initial position as the origin and the horizontal direction of the place cell plate as the positive direction of the X-axis; the physical coordinate systems mentioned below all refer to this coordinate system. The mathematical expression of the position coordinates (Xenvt, Yenvt) of the robot in a space of any size is then as follows:
In formula (24), β is the proportional coefficient for transforming coordinates on the place cell plate into real position coordinates, and its value is the ratio of the side length L of the square coding area to the side length Nx of the place cell plate.
QX and QY respectively represent the horizontal and vertical coordinates of the rat, in the space of any size, at the time of the last place cell plate reset. Through the above calculation, the position of the rat in a space of any size can be obtained, which provides accurate position information for the construction of the subsequent cognitive nodes.
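A compact sketch of the place cell plate dynamics and position readout of formulas (18)-(24) follows. The formula bodies are not reproduced in this text, so the Gaussian kernel, the global-inhibition form, the argmax readout of the activity packet, and the coordinate transform below are assumptions consistent with the surrounding description.

```python
import numpy as np
from scipy.signal import convolve2d

def plate_step(P, k_p=7.0, psi=2e-5, half=15):
    """One update of the CA3 place cell plate (formulas (18)-(22), assumed forms).

    P : (Nx, Nx) array of place cell firing rates, initialized from F_i^CA3(t).
    """
    dy, dx = np.mgrid[-half:half + 1, -half:half + 1]
    eps = np.exp(-(dx**2 + dy**2) / k_p**2)   # formula (18): 2-D Gaussian weights
    P = P + convolve2d(P, eps, mode="same")   # formula (19): local excitation
    P = P - psi * P.sum()                     # formula (20): global inhibition ψ
    P = np.maximum(P, 0.0)                    # formula (21): clamp at zero
    return P / (P.sum() + 1e-12)              # formula (22): normalization

def read_position(P, L=10.0, Q=(0.0, 0.0)):
    """Formulas (23)-(24): activity packet position -> physical coordinates."""
    Nx = P.shape[0]
    iy, ix = np.unravel_index(np.argmax(P), P.shape)  # (P_y^t, P_x^t), assumed argmax
    beta = L / Nx                             # β: coding area side over plate side
    # offset by (Q_X, Q_Y), the position recorded at the last stripe cell reset
    return Q[0] + beta * (ix - Nx / 2.0), Q[1] + beta * (iy - Nx / 2.0)
```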
The purpose of constructing the visual pathway computing model is scenario cognition; that is, when the robot explores the environment, it can first accurately identify the attributes of all objects in the current field of view, simulating the function of the "what pathway", and then, for each identified object individually, calculate its orientation angle and distance relative to the robot, simulating the function of the "where pathway". The object detection algorithm in the present invention adopts the DPM algorithm, which has strong robustness. However, at this stage, most object position recognition algorithms estimate position directly by combining the depth map with the position of the recognized object in the RGB image, and this type of method has a large calculation error. To solve this problem, the present invention proposes an object position recognition algorithm: by rotating the robot, the object to be detected is placed at the center of the field of view, and the distance between the object and the robot is then obtained using the depth camera.
In the actual physical experiment, the RGB images collected by the robot are set to 1920*1080 pixels, and the pixel value of the center of the field of view is pgraph_middle = 1920/2. The rotation control of the robot is realized through the differential speed of the left and right wheels; that is, when the left and right wheels move at the same speed in opposite directions, the robot rotates in place, with rotation speed ω.
When the robot explores the environment, it faces a new scene every time it moves; define i as the scene number. Firstly, the number of objects niobject in the i-th scene is identified by the DPM algorithm, and the current head-direction angle is Φ0i. The serial number of the currently detected object in the i-th scene is j, and the attribute of the j-th object to be detected is defined as ρij. The orientation angle information of each object is then calculated in turn: the average of the left and right boundaries of the j-th object in the image, as obtained by the DPM algorithm, gives the horizontal pixel position of the object's center in the image, denoted pobject_middle. In order to place the object to be detected at the center of the field of view, the rotation speed of the robot is regulated by a PID algorithm in closed-loop control.
The mathematical expression of the current pixel deviation eobject_middle is:
eobject_middle = pgraph_middle − pobject_middle   (25)
Then the mathematical expression of the setpoint of the current rotation speed ω obtained by the PID algorithm is:
In formula (26), kP, kI, and kD respectively represent the proportional, integral, and differential coefficients of the PID controller; the selection of their values depends on the actual physical environment and on the hardware structure and configuration of the robot. When the object to be detected is placed at the center of the field of view, the head-direction angle Φ of the robot at that moment is recorded; the orientation angle of the j-th object in the i-th scene relative to the robot before rotation is then Φij = Φ − Φ0i. At the same time, the depth camera is used to obtain the distance dij between the robot and the object. Through the above operations, the orientation angle and distance information of the j-th object relative to the robot at the current moment are obtained. After the information of all objects in the current scene has been acquired, the robot's head-direction angle is rotated back to Φ0i, and exploration and recognition in the environment continue. The acquisition of scenario information lays the foundation for the construction of the subsequent cognitive map.
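The centering loop around formulas (25)-(26) can be sketched as follows. The PID form is the textbook one; the robot and camera interface methods (get_object_center_px and so on) are hypothetical placeholders, not APIs disclosed by the invention.

```python
def center_object(robot, img_width=1920, kp=2e-3, ki=0.0, kd=5e-4,
                  dt=0.05, tol_px=5):
    """Rotate the robot until the detected object sits at the image center.

    robot : hypothetical interface with get_object_center_px(),
            set_rotation_speed(), get_heading(), read_depth_at_center().
    """
    p_graph_middle = img_width / 2            # p_graph_middle = 1920/2
    integral, prev_err = 0.0, 0.0
    while True:
        err = p_graph_middle - robot.get_object_center_px()  # formula (25)
        if abs(err) < tol_px:
            break
        integral += err * dt
        derivative = (err - prev_err) / dt
        # formula (26): PID output used as the rotation speed setpoint ω
        robot.set_rotation_speed(kp * err + ki * integral + kd * derivative)
        prev_err = err
    robot.set_rotation_speed(0.0)
    # Φ at center; bearing Φ_ij = Φ − Φ_0^i, distance d_ij from the depth camera
    return robot.get_heading(), robot.read_depth_at_center()
```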
Place cells in the hippocampal CA1 area are neurons driven by angle, speed, and visual information, and they are the basic unit for constructing environmental cognitive maps. Therefore, a single hippocampal CA1 place cell can be called a cognitive node. A cognitive map consists of several cognitive nodes with topological relationships. Cognitive nodes correspond to scenario information, and a new cognitive node is established every time the robot moves.
The i-th cognitive node is denoted ei; it stores the current scene information and pose information, and its mathematical expression is shown in formula (1), wherein Φ0i, (Xenvi, Yenvi), and (niobject, {ρij}, {Φij}, {dij}) represent the head-direction angle, position, and scene information at the cognitive node, respectively. The head-direction angle and position are obtained from the entorhinal-hippocampal CA3 neural computing model; the scene information is obtained from the visual pathway computing model; and the position coordinates also represent the center coordinates of the firing field of the hippocampal CA1 place cell. Each cognitive node is also connected to other cognitive nodes: every cognitive node ei has a topological connection with its preceding and succeeding cognitive nodes (that is, adjacent cognitive nodes are topologically connected). When the current scenario information output by the visual pathway matches the scenario information stored in an already generated cognitive node, a connection between the current cognitive node and the matching cognitive node is established.
The steps for judging whether the scenario information of two cognitive nodes matches are as follows. Given two cognitive nodes ea and eb, first judge whether the numbers of objects in the two scenarios are the same and whether the attributes of the corresponding objects are consistent; if either condition is not satisfied, the two scenarios are judged not to match. Otherwise, measure whether the orientation angle and distance information of each object in the scenario are consistent; the mathematical expression of the measurement function S(ea, eb) is:
In formula (27), μΦ and μd represent the weights of the direction information and the distance information respectively, with μΦ + μd = 1; their values should be selected in combination with the actual physical scene and the units of angle and distance. Generally, when the angle is in radians and the distance is in meters, the value of μΦ is between 0.1 and 0.3 and the value of μd is between 0.7 and 0.9. A matching threshold Sth is set, with an appropriate value selected according to the actual situation. When the value of the metric function is less than the matching threshold, the two scenes are judged to match, and the topological relationship between the cognitive nodes ea and eb is established; otherwise, they are judged not to match.
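A sketch of the matching test around formula (27), using the CognitiveNode type sketched after formula (1). Because the body of formula (27) is not reproduced here, the weighted per-object combination of angle and distance differences is our assumption; the gating checks and the threshold rule come directly from the description.

```python
def scenes_match(ea, eb, mu_phi=0.2, mu_d=0.8, s_th=0.5):
    """Judge whether the scenarios of cognitive nodes ea and eb match."""
    if len(ea.objects) != len(eb.objects):    # object counts must be equal
        return False
    pairs = list(zip(ea.objects, eb.objects))  # assumes corresponding order
    if any(oa.attribute != ob.attribute for oa, ob in pairs):
        return False                           # corresponding attributes must agree
    # formula (27), assumed form: weighted mean of angle and distance differences
    s = sum(mu_phi * abs(oa.bearing - ob.bearing) +
            mu_d * abs(oa.distance - ob.distance)
            for oa, ob in pairs) / max(len(pairs), 1)
    return s < s_th                            # match when below threshold S_th
```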
In the process of continuous accumulation of cognitive nodes, their relative errors also accumulate, resulting in a mismatch between the robot's estimated position and its actual position. Therefore, it is necessary to use the topology to adjust the positions of the cognitive nodes. Suppose the current cognitive node is ei and the cognitive node associated with it is ek, i.e., there is a topological relationship between nodes ei and ek. The mathematical expressions for the pose correction of cognitive nodes ei and ek are as follows.
Firstly, calculate the change amounts Δxik, Δyik, and ΔΦ0ik of the cognitive nodes, as shown in formula (28).
In formula (28), (Xenvi, Yenvi) and (Xenvk, Yenvk) represent the horizontal and vertical coordinates of the place field centers corresponding to cognitive nodes ei and ek respectively, dik represents the distance between the place field centers corresponding to ei and ek, and Φ0i and Φ0k respectively represent the head-direction angles at cognitive nodes ei and ek. After the change amounts are obtained, the corrected node parameters can be iteratively calculated step by step according to these amounts; the relevant mathematical expressions are shown in formulas (29) and (30).
In formulas (29) and (30), t and t+1 represent the times before and after each iteration respectively, and δ represents the correction rate of the cumulative error, which is 0.5. In the actual cognitive map construction process, as the number of iterations increases, the map update amount gradually decreases; at that point, further iterative updates of the map have little effect while consuming processor time, which affects the real-time performance of the algorithm. Based on this, the present invention proposes a method for judging the convergence of the cognitive map, with the following specific steps. First, define the map convergence amount at time t as Δd(t); its mathematical expression is shown in formula (31).
In formula (31), nsum represents the total number of current cognitive nodes, and ni represents the number of nodes associated with cognitive node i. The scale factor of the convergence criterion is set as σ, with a value selected according to the actual situation, usually within the range of 0.0001-0.005. When Δd(t) − Δd(t+1) < σ·Δd(t+1), it is judged that the map update iteration need not continue; otherwise, the update iteration of the cognitive map construction continues. After the topological cognitive map of the environment and the scenario information are obtained, they can be fused to obtain the episodic cognitive map of the environment. The specific method is as follows: according to the robot's position in the physical coordinate system obtained above and the orientation angle and distance of each object relative to the robot, the positions of all objects in the physical coordinate system are calculated, and each object is inserted, according to its attribute and position information, into the physical coordinate system containing the topological map, thereby obtaining the episodic cognition map expressing the environment.
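Finally, the iterative pose correction and convergence test of formulas (28)-(31) can be sketched as a relaxation over the topological links. The formula bodies are not reproduced here, so the residual-splitting update at rate δ and the mean-residual convergence measure are assumptions in the spirit of experience-map correction; the stored link offsets (ex, ey) are our own construct.

```python
def correct_map(nodes, links, delta=0.5, sigma=1e-3, max_iter=100):
    """Relax cognitive node positions over topological links (formulas (28)-(31)).

    nodes : list of [x, y] node positions, corrected in place
    links : list of (i, k, ex, ey), where (ex, ey) is the offset from node i
            to node k recorded when the link was created
    """
    prev_dd = None
    for _ in range(max_iter):
        dd = 0.0
        for i, k, ex, ey in links:
            # residual between the current layout and the stored offset (Δx_ik, Δy_ik)
            rx = (nodes[k][0] - nodes[i][0]) - ex
            ry = (nodes[k][1] - nodes[i][1]) - ey
            # formulas (29)-(30), assumed: split the residual at correction rate δ
            nodes[i][0] += delta * rx / 2.0; nodes[i][1] += delta * ry / 2.0
            nodes[k][0] -= delta * rx / 2.0; nodes[k][1] -= delta * ry / 2.0
            dd += abs(rx) + abs(ry)
        dd /= max(len(links), 1)               # Δd(t), assumed mean residual (formula (31))
        if prev_dd is not None and prev_dd - dd < sigma * dd:
            break                              # Δd(t) − Δd(t+1) < σ·Δd(t+1): converged
        prev_dd = dd
    return nodes
```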
The present application is a continuation of the international application PCT/CN2022/114221 filed on Aug. 23, 2022, which claims priority to the Chinese Patent Application No. 202110999152.7 filed on Aug. 28, 2021. The entire contents of the above identified applications are incorporated herein by reference.