This application claims the benefit and priority of European patent application number EP23182494.7, filed on Jun. 29, 2023. The entire disclosure of the above application is incorporated herein by reference.
This section provides background information related to the present disclosure which is not necessarily prior art.
The present disclosure relates to a method for determining spatial-temporal patterns related to an environment of a host vehicle from sequentially recorded data.
For autonomous driving and various advanced driver-assistance systems (ADAS), it is an important and challenging task to understand the traffic scene in the external environment surrounding a host vehicle. For example, based on sensor data acquired at the host vehicle, spatial and temporal correlations may be exploited for understanding the traffic scene, e.g. for tracking and/or anticipating the dynamics of any items present in the external environment.
For providing information regarding the environment of the host vehicle, machine learning algorithms including recurrent units may be applied to data provided by sequential sensor scans performed by sensors of a perception system of the host vehicle. Such machine learning algorithms may be able to derive, from the data provided by the sequential sensor scans, spatial and temporal patterns which are related, for example, to moving objects in the environment of the host vehicle.
In detail, such sensor scans are performed for a predefined number of points in time, and sensor data may be aggregated over the previous points in time, i.e. before a current point in time, in order to generate a so-called memory state. The memory state is combined with data from a current sensor scan, i.e. for the current point in time, in order to derive the spatial and temporal pattern, i.e. based on correlations within the sequential sensor scans.
Machine learning algorithms including recurrent units, however, may only be able to implicitly derive dynamical properties of objects, e.g. from displacements of respective patterns. For example, a recurrent unit may correlate a first cluster of features in the memory state, which indicates that an object is present at a previous point in time, with a second cluster of features in the novel or current data, which indicates that an object is present at the current point in time.
Therefore, a model on which the machine learning algorithm relies may not be able to differentiate between distinct objects within a receptive field associated with the environment of the host vehicle. Moreover, the model may also not be able to incorporate legacy information regarding the dynamics of objects at a previous point in time in order to improve the estimation of dynamic properties, e.g. a velocity of a certain object, for the current point in time. Hence, such a model may also not be able to use the legacy information for an improved matching of objects by utilizing the dynamic properties for an improved identification of the respective position of objects for the next point in time or frame.
Accordingly, there is a need to have a method which is able to explicitly correlate sequentially recorded data detected in an external environment of a vehicle in order to provide reliable patterns for the environment.
This section provides a general summary of the disclosure, and is not a comprehensive disclosure of its full scope or all of its features.
The present disclosure provides a computer implemented method, a computer system and a non-transitory computer readable medium according to the independent claims. Embodiments are given in the subclaims, the description and the drawings.
In one aspect, the present disclosure is directed at a computer implemented method for determining patterns related to an environment of a host vehicle from sequentially recorded data. According to the method, respective sets of characteristics detected by a perception system of the host vehicle in the environment of the host vehicle are determined for a current point in time and for a predefined number of previous points in time. Via a processing unit of the host vehicle, a set of current input data associated with the sets of characteristics is generated for the current point in time, and a set of memory data is generated by aggregating the sets of characteristics for the previous points in time. An attention algorithm is applied to the set of current input data and to the set of memory data in order to generate a joint spatial-temporal data set, and at least one pattern is determined for the environment of the host vehicle from the joint spatial-temporal data set.
The perception system may include, for example, a radar system, a Lidar system and/or one or more cameras being installed at the host vehicle in order to monitor its external environment. Hence, the perception system may be able to monitor a dynamic context of the host vehicle including a plurality of objects which are able to move in the external environment of the host vehicle. The objects may include other vehicles and/or pedestrians, for example. The perception system may also be configured to monitor static objects, i.e. a static context of the host vehicle. Such static objects may include traffic signs or lane markings, for example.
The respective set of characteristics detected by the perception system may be acquired from the sequentially recorded data by using a constant sampling rate, for example. Therefore, the current and previous points in time may be defined in such a manner that there is a constant time interval therebetween. Accordingly, the predefined number of previous points in time may include an earliest point in time and a latest point in time which may immediately precede the current point in time.
The output of the method, i.e. the at least one pattern related to the environment of the host vehicle, may be provided as an abstract feature map which may be stored in a grid map. The grid map may be represented by a two-dimensional grid in bird's eye view with respect to the vehicle. However, other representations of the grid map may be realized alternatively. The grid map may include a predefined number of cells. With each cell, a predefined number of features may be associated in order to generate the feature map.
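For illustration only, such a grid-based feature map may be pictured as a plain tensor. The following minimal sketch in Python uses hypothetical grid dimensions and channel count, which are not taken from the disclosure:

```python
import numpy as np

# Hypothetical bird's-eye-view grid: 128 x 128 cells, 32 features per cell.
GRID_H, GRID_W, NUM_FEATURES = 128, 128, 32

# Abstract feature map storing the at least one pattern: one feature vector
# is associated with each cell of the grid map.
feature_map = np.zeros((GRID_H, GRID_W, NUM_FEATURES), dtype=np.float32)

# A single cell is addressed by its row and column index; its feature vector
# may then be consumed by downstream tasks such as object detection.
cell_features = feature_map[64, 64]  # shape: (NUM_FEATURES,)
```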
Further tasks, implemented e.g. as respective further machine learning algorithms which may comprise a decoding procedure, may be applied to the feature map including the at least one pattern related to the environment of the host vehicle. These tasks may provide different kinds of information regarding the dynamics of a respective object, e.g. regarding its position, its velocity and/or regarding a bounding box surrounding the respective object. That is, the objects themselves, i.e. their positions, and their dynamic properties may be detected and/or tracked by applying a respective task to the feature map including the pattern. Moreover, a grid segmentation may be performed as a task being applied to the feature map, e.g. in order to detect a free space in the environment of the host vehicle.
The method may be implemented as one or more machine learning algorithms, e.g. as neural networks for which suitable training procedures are defined. When training the machine learning algorithms or neural networks, the output of the method and a ground truth may be provided to a loss function for optimizing the respective machine learning algorithm or neural network. The ground truth may be generated for a known environment of the host vehicle for which sensor data provided by the perception system may be preprocessed in order to generate data e.g. associated with a grid map in bird's eye view.
This data may be processed by the method, and the respective result of the further tasks, e.g. regarding object detection and/or segmentation for determining a free space, may be related to the known environment of the host vehicle. The loss function may acquire an error of a model with respect to the ground truth, i.e. of the model on which the machine learning algorithm or neural network relies. Weights of the machine learning algorithms or neural networks may be updated accordingly for minimizing the loss function, i.e. the error of the model.
The method according to the disclosure differs from previous approaches which apply e.g. recurrent units in that the attention algorithm is applied in order to explicitly exploit correlations between the set of memory data being aggregated up to the latest previous point in time and the set of current input data. These correlations are represented by the joint spatial-temporal data set. By using the attention algorithm, the method is able to explicitly correlate patterns, e.g. for the dynamics of objects, for consecutive frames or consecutive points in time.
Due to this, the method is able to differentiate between distinct objects within the external environment of the host vehicle, i.e. within a receptive field of the method, and to provide or track their dynamic properties uniquely for the points in time under consideration. Moreover, legacy information from the memory data, e.g. regarding the dynamics of the different objects, is considered via the attention algorithm within the joint spatial-temporal data set in order to derive the respective pattern therefrom. For example, an explicit spatial displacement may be derived from the feature map, and the spatial displacement may be converted into an improved estimation of the respective velocity of the objects and an improved estimation of the object's position in the next frame or for the next point in time in order to improve the matching of objects between frames or time steps.
According to an embodiment, the set of current input data and the set of memory data may be associated with respective grid maps defined for the environment of the host vehicle. The association with the grid maps may facilitate the representation of the respective patterns related to the plurality of objects. For example, the grid maps may include a plurality of grid cells, each comprising different channels for each of the respective characteristics of the environment of the host vehicle.
When applying the attention algorithm, a matching may be determined between a cell of the grid map associated with the set of current input data and a plurality of cells of the grid map associated with the set of memory data. Alternatively, the matching may be determined between one of the cells of the grid map associated with the set of current input data and a plurality of cells of the grid map associated with the set of memory data and, in addition, with the set of current input data. In other words, the current input data may also be considered for the plurality of cells of the grid map which is matched with one of the cells of the grid map associated with the set of current input data. The attention algorithm may therefore provide a relationship between a cell of the grid map under consideration at the current point in time and a “dynamic history” leading to the status of the current input data, i.e. for the current point in time.
According to a further embodiment, the attention algorithm may include weights which are defined by relating elements of the set of current input data and assigned elements of the set of memory data. The weights of the attention algorithm may be applied to values generated by employing a union of the elements of the set of current input data and the assigned elements of the set of memory data in order to provide the joint spatial-temporal data set as an output of the attention algorithm.
For example, the elements of the set of current input data may refer to one or more grid cells related to a certain object under consideration, and the assigned elements of the set of memory data which are to be related to the elements of the set of current input data may be associated with the previous dynamics of this object which is represented by more than one of the grid cells. By defining or generating the weights of the attention algorithm by the relationship of the current input data and the memory data, and by applying these weights to values generated based on a portion of the current input data and the memory data, legacy information regarding the dynamics is transferred from the memory data when generating the joint spatial-temporal data set. Therefore, the reliability of the determined patterns is improved via the attention algorithm.
Relating the elements of the set of current input data and the assigned elements of the set of memory data may include: generating a query vector by employing the elements of the set of current input data, generating a key vector by employing the assigned elements of the set of memory data and the elements of the set of current input data, and estimating a dot product of the query vector and the key vector from which the weights of the attention algorithm are estimated. That is, via the dot product a matching for agreement between the query vector and the key vector is estimated which represents the correlation therebetween and defines the weights for the attention algorithm.
The key vector may be generated based on a concatenation of the assigned elements of the set of memory data and the elements of the set of current input data.
For generating the joint spatial-temporal data set, the output of the attention algorithm may further be combined with the set of memory data. A gating procedure may be configured to define weights for combining respective channels within the set of memory data with the output of the attention algorithm. The gating procedure may therefore adjust, for each channel within the set of memory data, how the memory state and the output of the attention algorithm are to be combined.
According to a further embodiment, information regarding respective distances with respect to a spatial reference point may be associated with the set of current input data and with the set of memory data in order to track movements of objects between one of the previous points in time and the current point in time. The spatial reference point may be defined at the host vehicle. The information regarding respective distances may be defined on a mesh grid of distances, and/or existing grid maps may be enriched by the information regarding the respective distances. By tracking the movements of the objects, a velocity estimation may be performed.
The information regarding the respective distances associated with the set of memory data may be determined via a motion model which includes a velocity estimation for the objects. For example, a velocity estimation may be performed for the objects for one of the previous points in time and transferred to the current point in time via the motion model.
Moreover, positional information may be associated with elements of the set of current input data and assigned elements of the set of memory data in order to estimate a velocity of an object in the environment of the host vehicle. The velocity estimation may be performed via an estimation of displacement between the current point in time and one of the previous points in time. The positional information may be provided as positional encodings which may be concatenated to the input data and which may be used thereafter to calculate the query, the key and the value as described above.
The at least one pattern may be provided to an algorithm for object detection which may determine at least one of a position, a velocity and coordinates of a bounding box associated with at least one of a plurality of objects located in the environment of the host vehicle, and/or to an algorithm for segmenting the environment of the host vehicle. The segmenting of the environment may include determining a free space e.g. based on a grid segmentation. Based on this information regarding the respective object, a tracking of the object may be performed, for example.
In another aspect, the present disclosure is directed at a computer system, said computer system being configured to carry out several or all steps of the computer implemented method described herein. The computer system may be further configured to receive respective sets of characteristics detected by a perception system of a host vehicle in an environment of the host vehicle for a current point in time and for a predefined number of points in time before the current point in time.
The computer system may comprise a processing unit, at least one memory unit and at least one non-transitory data storage. The non-transitory data storage and/or the memory unit may comprise a computer program for instructing the computer to perform several or all steps or aspects of the computer implemented method described herein.
As used herein, terms like processing unit and module may refer to, be part of, or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a combinational logic circuit, a field programmable gate array (FPGA), a processor (shared, dedicated, or group) that executes code, other suitable components that provide the described functionality, or a combination of some or all of the above, such as in a system-on-chip. The processing unit may include memory (shared, dedicated, or group) that stores code executed by the processor.
In another aspect, the present disclosure is directed at a vehicle including a perception system and the computer system as described herein. The vehicle may be an automotive vehicle, for example.
According to an embodiment, the vehicle may further include a control system being configured to receive information derived from the at least one pattern provided by the computer system and to apply the information for controlling the vehicle. The information derived from the at least one pattern may include properties of objects detected in the environment of the host vehicle and/or a segmentation of the environment including a free space.
In another aspect, the present disclosure is directed at a non-transitory computer readable medium comprising instructions for carrying out several or all steps or aspects of the computer implemented method described herein. The computer readable medium may be configured as: an optical medium, such as a compact disc (CD) or a digital versatile disk (DVD); a magnetic medium, such as a hard disk drive (HDD); a solid state drive (SSD); a read only memory (ROM); a flash memory; or the like. Furthermore, the computer readable medium may be configured as a data storage that is accessible via a data connection, such as an internet connection. The computer readable medium may, for example, be an online data repository or a cloud storage.
The present disclosure is also directed at a computer program for instructing a computer to perform several or all steps or aspects of the computer implemented method described herein.
Further areas of applicability will become apparent from the description provided herein. The description and specific examples in this summary are intended for purposes of illustration only and are not intended to limit the scope of the present disclosure.
The drawings described herein are for illustrative purposes only of selected embodiments and not all possible implementations, and are not intended to limit the scope of the present disclosure.
Exemplary embodiments and functions of the present disclosure are described herein in conjunction with the accompanying drawings, which are schematic in nature.
Corresponding reference numerals indicate corresponding parts throughout the several views of the drawings.
Example embodiments will now be described more fully with reference to the accompanying drawings.
The perception system 110 may include a radar system, a Lidar system and/or one or more cameras in order to monitor the external environment or surroundings of the vehicle 100. Therefore, the perception system 110 is configured to monitor a dynamic context 125 of the vehicle 100 which includes a plurality of objects 130 which are able to move in the external environment of the vehicle 100. The objects 130 may include other vehicles 140 and/or pedestrians 150, for example.
The perception system 110 is also configured to monitor a static context 160 of the vehicle 100. The static context 160 may include static objects 130 like traffic signs 170 and lane markings 180, for example.
The perception system 110 is configured to determine characteristics of the objects 130. The characteristics include a current position, a current velocity and an object class of each road user 130 for a plurality of points in time. The current position and the current velocity are determined by the perception system 110 with respect to the vehicle 100, i.e. with respect to a coordinate system having its origin e.g. at the center of mass of the vehicle 100, its x-axis along a longitudinal direction of the vehicle 100 and its y-axis along a lateral direction of the vehicle 100. Moreover, the perception system 110 determines the characteristics of the road users 130 for a predetermined number of previous points in time and for a current point in time, e.g. every 0.5 s.
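As a minimal sketch of how such per-object characteristics might be organized over the sampled points in time (the record layout and field names are illustrative assumptions, not part of the disclosure):

```python
from dataclasses import dataclass

@dataclass
class ObjectCharacteristics:
    # Position and velocity in the vehicle coordinate system: the x-axis runs
    # along the longitudinal direction, the y-axis along the lateral direction.
    x_m: float
    y_m: float
    vx_mps: float
    vy_mps: float
    object_class: str  # e.g. "vehicle" or "pedestrian"

# One list of detected objects per point in time, sampled at a constant rate,
# e.g. every 0.5 s; the last entry corresponds to the current point in time.
scans = [
    [ObjectCharacteristics(12.0, -1.5, 3.0, 0.0, "vehicle")],  # t - 2
    [ObjectCharacteristics(13.5, -1.5, 3.0, 0.0, "vehicle")],  # t - 1
    [ObjectCharacteristics(15.0, -1.5, 3.0, 0.0, "vehicle")],  # t
]
```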
The computer system 120 transfers information derived from the result or output 260 of the method described below to a control system 124 of the host vehicle 100.
The characteristics for the current and the previous points in time are transferred to the processing unit 121 which generates a set of current input data 210 associated with the sets of characteristics for the current point in time, and a set of memory data 220 by aggregating the sets of characteristics over the predefined number of previous points in time. To the set of current input data 210 and to the set of memory data 220, an attention algorithm 230 is applied via the processing unit 121 in order to generate a joint spatial-temporal data set.
An output 360 of the attention algorithm 230 is subjected to a first residual block 240 and to a second residual block 250 in order to determine an output 260 which includes at least one pattern from the joint spatial-temporal data set. Details for the different blocks 210 to 250 are described below.
The output 260, i.e. the at least one pattern related to the environment of the host vehicle 100, is provided as an abstract feature map which is stored in a grid map. This grid map is generated in a similar manner as described below for respective grid maps generated for associating the set of current input data 210 and the set of memory data 220 therewith.
The respective grid maps for the output 260, for the set of current input data 210 and for the set of memory data 220 are represented by a two-dimensional grid in bird's eye view with respect to the host vehicle 100. However, other representations of the grid maps may be realized alternatively. The grid maps include a predefined number of cells. For the output 260, a predefined number of features is assigned to each cell in order to generate the feature map.
The output 260, i.e. the feature map described above, is transferred to a task module 270 which applies further tasks to the feature map including the at least one pattern related to the environment of the host vehicle 100. These tasks include tasks related to an object detection and/or to a segmentation of the environment of the host vehicle 100, i.e. to a segmentation of the grid map associated with the output 260.
The object detection provides different kinds of information regarding the dynamics of a respective object 130, e.g. regarding its position, its velocity and/or regarding a bounding box surrounding the respective object 130. That is, the objects 130 themselves, i.e. their positions, and their dynamic properties are detected and/or tracked by applying a respective task of the task module 270 to the feature map including the pattern. Moreover, the grid segmentation is applied e.g. in order to detect a free space in the environment of the host vehicle 100.
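One way to picture the task module 270 is as a set of lightweight heads that all consume the same feature map. In the following sketch, the 1×1-convolution heads and their output channels are illustrative assumptions rather than the disclosed architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, C = 128, 128, 32
feature_map = rng.standard_normal((H, W, C)).astype(np.float32)  # output 260

# Hypothetical heads: a 1x1 convolution is a per-cell linear map, so it can
# be written as a matrix product over the channel dimension.
w_detect = rng.standard_normal((C, 7)).astype(np.float32)   # e.g. x, y, vx, vy, box size/heading
w_segment = rng.standard_normal((C, 2)).astype(np.float32)  # e.g. free vs. occupied

detection_map = feature_map @ w_detect         # (H, W, 7): per-cell object regression
segmentation_logits = feature_map @ w_segment  # (H, W, 2): per-cell class scores

# Grid segmentation: cells whose highest score belongs to the "free" class.
free_space_mask = segmentation_logits.argmax(axis=-1) == 0
```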
The results of the task module 270 as described above are provided to the control system 124 in order to use these results, e.g. the properties of the objects 130 and/or the free space, as information for controlling the host vehicle 100.
The attention algorithm 230, the first residual block 240 and the second residual block 250 are implemented as respective machine learning algorithms, e.g. as a respective neural network for which suitable training procedures are defined. The task module 270 including the further tasks is also implemented as one or more machine learning algorithms, e.g. comprising a respective decoding procedure, being associated with the respective task. The task module 270 includes a respective head for each required task.
When training the machine learning algorithms or neural networks, the output 260 and a ground truth are provided to a loss function for optimizing the neural network. The ground truth is generated for a known environment of the host vehicle 100 for which sensor data provided by the perception system is preprocessed in order to generate data associated with a grid map e.g. in bird's eye view. This data is processed by the method, and the respective result of the different heads of the task module 270, i.e. regarding object detection and/or segmentation for e.g. determining a free space, is related to the known environment of the host vehicle 100. The loss function acquires the error of a model, i.e. the model on which the machine learning algorithm or neural network relies, with respect to the ground truth. Weights of the machine learning algorithms or neural networks are updated accordingly for minimizing the loss function, i.e. the error of the model.
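A gradient step of this kind can be illustrated on a toy head with a mean-squared-error loss; the loss choice, shapes and learning rate are assumptions for illustration, not the disclosed training setup:

```python
import numpy as np

rng = np.random.default_rng(1)
features = rng.standard_normal((64, 32))     # preprocessed features for a known scene
ground_truth = rng.standard_normal((64, 2))  # e.g. per-cell velocity targets
weights = np.zeros((32, 2))                  # trainable weights of one toy head

for _ in range(200):
    prediction = features @ weights
    # Gradient of the mean squared error, i.e. of the error of the model
    # with respect to the ground truth.
    grad = 2.0 * features.T @ (prediction - ground_truth) / len(features)
    weights -= 0.05 * grad                   # update weights to minimize the loss

loss = float(np.mean((features @ weights - ground_truth) ** 2))
```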
With respect to the set of current input data 210 and the set of memory data 220, respective grid maps are generated by rasterizing a region of interest close to the vehicle 100 into an empty multi-channel image.
For each pixel or cell of the respective grid map or image, a respective channel is associated with one of the characteristics or features of the object 130. Hence, the empty multi-channel image mentioned above and representing the rasterized region of interest close to the vehicle 100 is filled by the characteristics of the objects 130 which are associated with the respective channel of the pixel or grid cell.
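A possible rasterization of the characteristics into such a multi-channel image could look as follows; the grid extent, cell size and channel order are illustrative assumptions:

```python
import numpy as np

CELL_SIZE = 0.5  # meters per cell (assumed)
H, W = 128, 128  # grid extent (assumed); the vehicle sits at the grid center
CHANNELS = 4     # e.g. occupancy, longitudinal velocity, lateral velocity, class id

def rasterize(objects, grid):
    """Fill the channels of the cell into which each object falls."""
    for x_m, y_m, vx, vy, cls_id in objects:
        row = int(H / 2 - x_m / CELL_SIZE)  # longitudinal axis mapped to rows
        col = int(W / 2 + y_m / CELL_SIZE)  # lateral axis mapped to columns
        if 0 <= row < H and 0 <= col < W:
            grid[row, col] = (1.0, vx, vy, cls_id)
    return grid

empty_image = np.zeros((H, W, CHANNELS), dtype=np.float32)
current_input = rasterize([(12.0, -1.5, 3.0, 0.0, 1)], empty_image)
```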
The information provided by the characteristics of the object 130 for the different points in time is aggregated e.g. at a cell 310 under consideration due to spatial-temporal correlations. The cell 310 is located at an imaginary vehicle 330 for the current point in time. In order to derive patterns regarding objects 130, e.g. for the imaginary vehicle 330, a matching is calculated between the information provided at the cell 310 and information provided by a plurality of cells 320 to which the memory data 220 is associated and which are relevant for the aggregation of information provided at the cell 310 under consideration within the set of current input data 210.
Generally speaking, a subset XI of the set of current input data 210 represented by It is considered:

$$X_I \subseteq I_t$$

which, in the example described herein, comprises the cell 310 under consideration. Correspondingly, a subset XH of the set of memory data 220 represented by Ht-1 is obtained from a receptive field around the location of the cell 310 and comprises the cells 320:

$$X_H \subseteq H_{t-1}$$
Within the attention algorithm 230, the matching of information is calculated between the respective cells of the set of current input data 210, e.g. for the cell 310, and a union XI∪XH of the subset of the current input data 210 and the subset of the memory data 220. In detail, the respective cells, like the cell 310 under consideration, within the set of current input data 210 are considered for determining a query Q within the attention algorithm 230, and for determining keys K as well as values V of the attention algorithm 230, as indicated by the dashed lines 340.
In contrast, respective subsets of the set of memory data 220, e.g. as provided by the cells 320, are employed for determining the keys K and the values V of the attention algorithm 230 only, i.e. not for determining the query Q.
In detail, the query Q is calculated by linearly projecting XI as

$$Q = \mathrm{Linear}_{\theta_Q}(X_I)$$

wherein θQ denotes trainable parameters of matrices utilized for the linear projections.
Similarly, the keys K are provided by a second linear projection:

$$K = \mathrm{Linear}_{\theta_K}([X_I, X_H])$$

wherein θK denotes trainable parameters of matrices used for the linear projections, and [XI, XH] defines a concatenation of the two subsets or feature maps XI, XH along a so-called token dimension, i.e. along the columns and rows of the respective grid maps.
In order to calculate the matching between XI and the union XI∪XH, a dot product of the query Q and the keys K is used which is subjected to a softmax function as follows:

$$W_{\mathrm{Att}} = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_k}}\right)$$

wherein the term dk denotes the length of the key and query vectors Q, K per token, and WAtt denotes so-called attention weights which have the following properties:

$$W_{\mathrm{Att}} \in [0, 1]^{1 \times (1 + N_F)}, \qquad \sum_{i=1}^{1+N_F} \left(W_{\mathrm{Att}}\right)_{i} = 1$$

It is noted that NF denotes the number of cells or tokens comprised by the subset XH obtained from the receptive field within Ht-1, e.g. the information provided by the cells 320 described above.
Moreover, the values are also defined by a linear projection as follows:

$$V = \mathrm{Linear}_{\theta_V}([X_I, X_H])$$

wherein θV denotes trainable parameters of the respective matrices used for the linear projection, and [XI, XH] again denotes the concatenation of the two subsets or feature maps XI and XH along the token dimension.
Finally, the output 360 of the attention algorithm 230 is provided as a weighted sum of the attention weights and the values as follows:

$$\mathrm{Out}_{\mathrm{Att}} = W_{\mathrm{Att}} \, V$$
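Putting the projections and the weighted sum together, the attention step for a single cell 310 and a small receptive field might be sketched as follows; the projection widths and the 3×3 receptive field are illustrative assumptions, and a real implementation would batch this computation over all cells:

```python
import numpy as np

rng = np.random.default_rng(2)
C, d_k = 32, 16  # channels per cell and projection width (assumed)
N_F = 9          # cells 320: e.g. a 3x3 receptive field within H_{t-1}

x_i = rng.standard_normal((1, C))    # subset X_I: the cell 310 at time t
x_h = rng.standard_normal((N_F, C))  # subset X_H: receptive field from the memory

theta_q = rng.standard_normal((C, d_k))  # trainable projection matrices
theta_k = rng.standard_normal((C, d_k))
theta_v = rng.standard_normal((C, C))

union = np.concatenate([x_i, x_h], axis=0)  # [X_I, X_H] along the token dimension

q = x_i @ theta_q    # query from the current input only
k = union @ theta_k  # keys from the union of X_I and X_H
v = union @ theta_v  # values from the union of X_I and X_H

scores = q @ k.T / np.sqrt(d_k)                # scaled dot product, shape (1, 1 + N_F)
w_att = np.exp(scores) / np.exp(scores).sum()  # softmax: weights in [0, 1], sum to 1
out_att = w_att @ v                            # output 360: weighted sum, shape (1, C)
```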
In summary, the attention algorithm 230 aggregates information from the union provided by the set of current input data 210 and the set of memory data 220 to a target location, e.g. the cell 310 under consideration.
The first residual block 240 combines the output 360 of the attention algorithm 230 again with the memory state Ht-1 which is represented by the set of memory data 220. Hence, when regarded on a larger scale, the first and second residual blocks 240, 250 provide a recurrent functionality similar to known recurrent units.
In detail, the first residual block 240 includes a channel gating module 370 which defines a weighted sum for the combination of the output 360 of the attention algorithm 230 and the memory state Ht-1 represented by the set of memory data 220. The weights defined by the channel gating module 370 are given as follows:

$$W_{\mathrm{Gate}} = \sigma\left(\mathrm{Linear}_{\theta_{W\_Gate}}\left(\mathrm{Out}_{\mathrm{Att}}\right)\right)$$

wherein σ denotes a sigmoid activation function and θW_Gate represents trainable parameters of the channel gating module 370, and OutAtt is the output 360 of the attention algorithm 230.
Without the channel gating module 370, the method would merely perform a weighting across the value vector V as a whole. Known recurrent units, however, allow gating channels of a memory state like Ht-1 individually.
In order to achieve a similar degree of freedom regarding feature propagation, the channel gating module 370 uses the weighted sum as mentioned above, which is depicted in detail in an enlarged subdiagram of the channel gating module 370 in the drawings.
The output of the channel gating module 370 is subjected to a layer norm 375. By the layer norm 375, the gated features provided as an output of the channel gating module 370 are brought into a uniform value range for the feature maps, which reduces a covariate shift.
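The first residual block might be sketched as follows; computing the gate from the attention output alone and combining via a convex per-channel sum are assumptions consistent with, but not fixed by, the description above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_norm(z, eps=1e-5):
    # Bring the gated features into a uniform value range (layer norm 375).
    return (z - z.mean(axis=-1, keepdims=True)) / (z.std(axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(3)
C = 32
out_att = rng.standard_normal((1, C))  # output 360 of the attention algorithm
h_prev = rng.standard_normal((1, C))   # memory state H_{t-1} at the same cell

theta_gate = rng.standard_normal((C, C))  # trainable parameters of the gate

# Channel gating module 370: one weight per channel decides how the memory
# state and the attention output are combined.
w_gate = sigmoid(out_att @ theta_gate)
out_res1 = layer_norm(w_gate * out_att + (1.0 - w_gate) * h_prev)
```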
The second residual block 250 includes a feedforward block 380 and a further layer norm 385. The feedforward block 380 includes two two-dimensional convolutional networks with kernel sizes of 1×1 such that:

$$\mathrm{Out}_{\mathrm{Res2}} = \mathrm{Conv}_{\theta_{FF2}}\left(\mathrm{ReLU}\left(\mathrm{Conv}_{\theta_{FF1}}\left(\mathrm{Out}_{\mathrm{Res1}}\right)\right)\right)$$

wherein OutRes1 represents the output of the first residual block 240, while θFF1 and θFF2 denote trainable parameters for the first and second convolutional network within the feedforward module 380, respectively. ReLU denotes a rectified linear unit activation function. OutRes2 denotes the output of the feedforward module 380 which is combined with the output of the first residual module 240 at 381. The combination of the output of the first residual module 240 and the feedforward module 380 generated at 381 is subjected to the further layer norm 385 in order to provide the output 260 of the entire method according to the disclosure.
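The second residual block admits a compact sketch: since a 1×1 convolution acts independently per cell, it reduces to a matrix product on a single token; the hidden width is an assumed value:

```python
import numpy as np

def layer_norm(z, eps=1e-5):
    return (z - z.mean(axis=-1, keepdims=True)) / (z.std(axis=-1, keepdims=True) + eps)

rng = np.random.default_rng(4)
C, C_HIDDEN = 32, 64  # hidden width of the feedforward block (assumed)
theta_ff1 = rng.standard_normal((C, C_HIDDEN))
theta_ff2 = rng.standard_normal((C_HIDDEN, C))

out_res1 = rng.standard_normal((1, C))  # output of the first residual block 240

# Feedforward block 380: two 1x1 convolutions with a ReLU in between, added
# back onto the input at 381 and normalized by the further layer norm 385.
out_res2 = np.maximum(out_res1 @ theta_ff1, 0.0) @ theta_ff2
output_260 = layer_norm(out_res1 + out_res2)
```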
The method as described above generates a joint spatial-temporal data set including spatial-temporal correlations between the set of current input data 210 and the set of memory data 220 which are exploited to determine at least one pattern related to the plurality of objects 130 detected in the environment of the host vehicle 100. In order to achieve an "increased expressiveness" regarding the dynamics of the traffic scene surrounding the host vehicle 100, i.e. to enhance object tracking, for example, additional information may be incorporated into the method as described in the following.
For this purpose, respective grids 410, 420 associated with the set of current input data 210 and the set of memory data 220, respectively, further contain distances of each grid cell with respect to a spatial reference position located at a rear bumper of the host vehicle 100. These absolute positions or distances are denoted as posabs.
Simultaneously, velocities vt-1 in longitudinal and lateral directions with respect to the grid and for the preceding point in time t−1 are predicted for the detected objects 130. Via a motion model, a projected target location pos′ for the current point in time t is derived therefrom as follows:

$$\mathrm{pos}' = \mathrm{pos}_{\mathrm{abs}} + \frac{v_{t-1}}{f_R}$$

wherein fR is a constant frequency, e.g. 20 Hz, indicating a frame rate under which sensor scans are performed by the perception system 110.
By the projected target location or coordinates pos′, the keys K of the union XI∪XH as described above are extended or enriched in a similar manner as the query Q and the keys K for the set of current input data 210 are enriched by posabs. The enrichment of the keys K regarding the set of memory data represented by Ht-1 is indicated by 440.
By this means, the model underlying a method according to the disclosure is able to incorporate prior assumptions on the location of a moving object 130, i.e. regarding the specific grid cell which is expected to contain, at the current point in time t, an object 130 that was moving at the preceding point in time t−1. Before concatenating the respective tokens for calculating the query Q and the key vectors K, posabs and pos′ are scaled by a constant factor such that they include values in the range of [−1, 1] in order to maintain stable gradients during training.
In order to provide predictions regarding velocities of objects 130, a "positional awareness" is introduced for the values V, as indicated by 450. For this purpose, relative positions posrel of the respective cells within the receptive field with respect to the cell under consideration are provided to the value calculation.
By concatenating posrel to the input tokens of the value calculation, the above-mentioned awareness of direction and distances is achieved. As a result, the displacement of an object 130 moving e.g. from an arbitrary position to the central position of the "sliding window" can be derived, and the velocity of the object 130 can be estimated from this displacement.
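The positional enrichment might be sketched as follows; the frame rate, the grid layout, the scaling constant and the 3×3 window for the relative offsets are assumptions based on the description above:

```python
import numpy as np

F_R = 20.0       # frame rate in Hz (example value from the description)
CELL_SIZE = 0.5  # meters per cell (assumed)
H, W = 128, 128

# Mesh grid of absolute distances pos_abs of each cell to the reference point.
rows, cols = np.mgrid[0:H, 0:W]
pos_abs = np.stack([(H / 2 - rows) * CELL_SIZE,   # longitudinal distance
                    (cols - W / 2) * CELL_SIZE],  # lateral distance
                   axis=-1)

# Velocities v_{t-1} predicted for the preceding point in time are advanced
# by the motion model: pos' = pos_abs + v_{t-1} / f_R.
v_prev = np.zeros((H, W, 2))
v_prev[40, 64] = (4.0, 0.0)  # one cell moving 4 m/s longitudinally
pos_proj = pos_abs + v_prev / F_R

# Scale the encodings to the range [-1, 1] before concatenating them to the
# tokens used for the query and key calculation.
scale = max(H, W) * CELL_SIZE / 2.0
pos_abs_scaled = pos_abs / scale
pos_proj_scaled = pos_proj / scale

# Relative offsets pos_rel within a 3x3 sliding window: the displacement of
# each neighbor with respect to the cell under consideration, used to make
# the values V positionally aware.
pos_rel = np.mgrid[-1:2, -1:2].reshape(2, -1).T * CELL_SIZE  # shape (9, 2)
```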
At 502, respective sets of characteristics detected by a perception system of a host vehicle in an environment of the host vehicle may be determined for a current point in time and for a predefined number of previous points in time. At 504, a set of current input data associated with the sets of characteristics may be generated for the current point in time. At 506, a set of memory data may be generated by aggregating the sets of characteristics for the previous points in time. At 508, an attention algorithm may be applied to the set of current input data and to the set of memory data in order to generate a joint spatial-temporal data set. At 510, at least one pattern may be determined for the environment of the host vehicle from the joint spatial-temporal data set.
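Read end to end, steps 502 to 510 might be arranged as in the following skeleton; the helper functions are trivial stand-ins (placeholders, not the disclosed implementations) so that the control flow is runnable:

```python
import numpy as np

def rasterize_to_grid(scan):
    return scan                       # stand-in: scans assumed already gridded

def aggregate(previous):
    return np.mean(previous, axis=0)  # stand-in aggregation into a memory state

def attention(current, memory):
    return 0.5 * (current + memory)   # stand-in for the attention algorithm

def decode_patterns(joint):
    return joint                      # stand-in for the pattern determination

def determine_patterns(scans):
    """Skeleton of steps 502 to 510 for one current point in time."""
    *previous, current = scans                  # 502: sets of characteristics
    current_input = rasterize_to_grid(current)  # 504: set of current input data
    memory = aggregate(previous)                # 506: set of memory data
    joint = attention(current_input, memory)    # 508: joint spatial-temporal data set
    return decode_patterns(joint)               # 510: at least one pattern

patterns = determine_patterns([np.zeros((8, 8, 4)) for _ in range(3)])
```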
According to various embodiments, the set of current input data and the set of memory data may be associated with respective grid maps defined for the environment of the vehicle.
According to various embodiments, when applying the attention algorithm, a matching may be determined between a cell of the grid map associated with the set of current input data and a plurality of cells of the grid map associated with the set of memory data.
According to various embodiments, the attention algorithm may include weights which may be defined by relating elements of the set of current input data and assigned elements of the set of memory data, and the weights of the attention algorithm may be applied to values generated by employing a union of the elements of the set of current input data and the assigned elements of the set of memory data in order to provide the joint spatial-temporal data set as an output of the attention algorithm.
According to various embodiments, relating the elements of the set of current input data and the assigned elements of the set of memory data may include: generating a query vector by employing the elements of the set of current input data, generating a key vector by employing the assigned elements of the set of memory data and the elements of the set of current input data, and estimating a dot product of the query vector and the key vector from which the weights of the attention algorithm are estimated.
According to various embodiments, the key vector may be generated based on a concatenation of the assigned elements of the set of memory data and the elements of the set of current input data.
According to various embodiments, for generating the joint spatial-temporal data set, the output of the attention algorithm may be further combined with the set of memory data, and a gating procedure may be configured to define weights for combining respective channels within the set of memory data with the output of the attention algorithm.
According to various embodiments, information regarding respective distances with respect to a spatial reference point may be associated with the set of current input data and with the set of memory data in order to track movements of objects between one of the previous points in time and the current point in time.
According to various embodiments, the information regarding the respective distances associated with the set of memory data may be determined via a motion model which may include a velocity estimation for the objects.
According to various embodiments, positional information may be associated with elements of the set of current input data and assigned elements of the set of memory data in order to estimate a velocity of an object in the environment of the host vehicle.
According to various embodiments, the at least one pattern may be provided to an algorithm for object detection which determines at least one of a position, a velocity and coordinates of a bounding box associated with at least one of a plurality of objects located in the environment of the host vehicle, and/or to an algorithm for segmenting the environment of the host vehicle.
Each of the steps 502, 504, 506, 508, 510 and the further steps described above may be performed by computer hardware components.
The characteristics determination circuit 602 may be configured to determine respective sets of characteristics detected by a perception system of a host vehicle in an environment of the host vehicle for a current point in time and for a predefined number of previous points in time.
The current input data generation circuit 604 may be configured to generate a set of current input data associated with the sets of characteristics for the current point in time.
The memory data generation circuit 606 may be configured to generate a set of memory data by aggregating the sets of characteristics for the previous points in time.
The attention algorithm circuit 608 may be configured to apply an attention algorithm to the set of current input data and to the set of memory data in order to generate a joint spatial-temporal data set.
The pattern determination circuit 610 may be configured to determine at least one pattern for the environment of the host vehicle from the joint spatial-temporal data set.
The characteristics determination circuit 602, the current input data generation circuit 604, the memory data generation circuit 606, the attention algorithm circuit 608 and the pattern determination circuit 610 may be coupled to each other, e.g. via an electrical connection 611, such as a cable or a computer bus, or via any other suitable electrical connection, in order to exchange electrical signals.
A “circuit” may be understood as any kind of a logic implementing entity, which may be special purpose circuitry or a processor executing a program stored in a memory, firmware, or any combination thereof.
The processor 702 may carry out instructions provided in the memory 704. The non-transitory data storage 706 may store a computer program, including the instructions that may be transferred to the memory 704 and then executed by the processor 702.
The processor 702, the memory 704, and the non-transitory data storage 706 may be coupled with each other, e.g. via an electrical connection 708, such as a cable or a computer bus, or via any other suitable electrical connection, in order to exchange electrical signals.
As such, the processor 702, the memory 704 and the non-transitory data storage 706 may represent the characteristics determination circuit 602, the current input data generation circuit 604, the memory data generation circuit 606, the attention algorithm circuit 608 and the pattern determination circuit 610, as described above.
The terms “coupling” or “connection” are intended to include a direct “coupling” (for example via a physical link) or direct “connection” as well as an indirect “coupling” or indirect “connection” (for example via a logical link), respectively.
It will be understood that what has been described for one of the methods above may analogously hold true for the pattern determination system 600 and/or for the computer system 700.