The present disclosure relates to a computer-implemented method and an apparatus for predicting virtual road sign locations where virtual road signs may be superimposed onto environmental data for display in, for example, a navigation system of a vehicle.
In augmented reality navigation systems, data of the physical environment of a vehicle is typically overlaid with information from a digital road database stored in the navigation system. The physical environment of the vehicle is usually captured as scene images by a forward-facing camera arranged on the vehicle, and the scene images are output as environmental data to the navigation system. On the display of the navigation system, the driver then sees the scene images superimposed with additional augmenting information/content, such as virtual road signs, maneuver prompts, or other navigation instructions.
However, especially at complicated intersections, it is often difficult to place the augmenting information accurately in relation to the displayed scene image, and inconsistencies may occur between the location of the augmenting information and the displayed scene image.
The present disclosure relates to a computer-implemented method for predicting virtual road sign locations. The method comprises the following steps: collecting, as a first training data subset, one or more aerial and/or satellite images of a pre-determined region; obtaining, as a second training data subset, geocentric positions of key point markers in the pre-determined region; supplying the first training data subset and the second training data subset to a deep neural network as a training dataset; training the deep neural network on the training dataset to predict key point marker locations in a region of interest, the key point marker locations corresponding to virtual road sign locations; defining a region of interest as an input dataset; and processing the input dataset by the trained deep neural network to predict key point marker locations within the defined region of interest, the key point marker locations corresponding to virtual road sign locations.
The steps of the method may be performed in the order listed. The predicted key point marker locations may be used to superimpose virtual road signs onto environmental data (e.g., scene images) displayed to a driver of a vehicle, the environmental data being output by a forward-facing camera of the vehicle. The predicted key point marker locations may be stored in a database. In this way, a database of key point marker locations may be obtained that can be updated periodically by executing the method periodically. The database may, for example, be stored in a vehicle’s on-board navigation system so that an augmented navigation application can use the predicted key point marker locations to superimpose virtual road signs onto a displayed scene image to assist in driving maneuvers. The database of predicted key point marker locations may also be used to superimpose key point markers in the form of virtual road signs onto a standard definition (SD) map, thereby avoiding larger high definition (HD) maps that require more memory.
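The disclosure does not name a database technology. A minimal sketch, assuming SQLite and a hypothetical table `key_point_markers` with one row per predicted marker, shows how a periodic re-run of the method could replace the stored markers of one region in a single transaction, and how a navigation application might query the markers inside a bounding box. All function and column names are illustrative only:

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per predicted key point marker.
SCHEMA = """
CREATE TABLE IF NOT EXISTS key_point_markers (
    region_id    TEXT NOT NULL,
    lat          REAL NOT NULL,
    lon          REAL NOT NULL,
    kind         TEXT NOT NULL,
    predicted_at TEXT NOT NULL
)
"""


def open_db(path=":memory:"):
    """Open (or create) the marker database."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def replace_region_predictions(conn, region_id, markers):
    """Replace all stored markers of one region with a fresh prediction run.

    markers: iterable of (lat, lon, kind) tuples output by the trained network.
    """
    stamp = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM key_point_markers WHERE region_id = ?", (region_id,))
        conn.executemany(
            "INSERT INTO key_point_markers (region_id, lat, lon, kind, predicted_at) "
            "VALUES (?, ?, ?, ?, ?)",
            [(region_id, lat, lon, kind, stamp) for lat, lon, kind in markers],
        )


def markers_in_bbox(conn, min_lat, min_lon, max_lat, max_lon):
    """Markers inside a lat/lon bounding box, e.g. around the vehicle."""
    return conn.execute(
        "SELECT lat, lon, kind FROM key_point_markers "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ? ORDER BY lat, lon",
        (min_lat, max_lat, min_lon, max_lon),
    ).fetchall()
```

Calling `replace_region_predictions` twice for the same region leaves only the markers of the second run, which matches the idea of a database refreshed by periodic execution of the method.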
The aerial and satellite images may be map tiles of images of the earth, in particular map tiles containing road infrastructure such as intersections. The key points may, for example, include turn points and/or lane-change locations/signs.
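The disclosure does not fix a tiling scheme. Assuming, for illustration only, that the images are served as Web Mercator XYZ map tiles (a convention used by many web map services), the tiles covering a pre-determined region could be enumerated as follows. `latlon_to_tile` and `tiles_for_region` are hypothetical helper names:

```python
import math

MAX_LAT = 85.05112878  # Web Mercator cannot represent latitudes beyond this


def latlon_to_tile(lat, lon, zoom):
    """XYZ tile indices (x, y) of the Web Mercator tile containing a WGS84 point."""
    n = 2 ** zoom
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # lon == 180 or lat == +/-MAX_LAT land exactly on the edge; clamp into range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_region(min_lat, min_lon, max_lat, max_lon, zoom):
    """All tiles covering a lat/lon bounding box (antimeridian wrap not handled)."""
    x0, y0 = latlon_to_tile(max_lat, min_lon, zoom)  # north-west corner
    x1, y1 = latlon_to_tile(min_lat, max_lon, zoom)  # south-east corner
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]
```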
The method comprises a training phase and an inference phase. The training phase includes the steps of collecting the first training data subset, obtaining the second training data subset, supplying the first training data subset and the second training data subset as a training dataset to a deep neural network, and training the deep neural network on the training dataset to predict key point marker locations in a region of interest. The inference phase includes the steps of defining a region of interest as an input dataset and processing the input dataset by the trained deep neural network to predict key point marker locations within the defined region of interest. The inference phase may further include the step of storing the predicted key point marker locations in a database.
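The split between the two phases can be sketched as follows. `KeyPointModel`, `LookupModel`, `training_phase`, and `inference_phase` are hypothetical names, and `LookupModel` is only a placeholder that memorizes labels so that the sketch runs; in the method, its place would be taken by the deep neural network:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Tuple

LatLon = Tuple[float, float]


@dataclass
class TrainingSample:
    image_id: str          # from the first training data subset (aerial/satellite tile)
    markers: List[LatLon]  # from the second training data subset (geocentric labels)


class KeyPointModel(Protocol):
    def fit(self, samples: List[TrainingSample]) -> None: ...
    def predict(self, image_id: str) -> List[LatLon]: ...


@dataclass
class LookupModel:
    """Placeholder for the deep neural network: it merely memorizes labels."""
    table: Dict[str, List[LatLon]] = field(default_factory=dict)

    def fit(self, samples: List[TrainingSample]) -> None:
        for s in samples:
            self.table[s.image_id] = list(s.markers)

    def predict(self, image_id: str) -> List[LatLon]:
        return list(self.table.get(image_id, []))


def training_phase(model: KeyPointModel, images: List[str],
                   labels: Dict[str, List[LatLon]]) -> KeyPointModel:
    """Combine both subsets into one training dataset and train the model."""
    dataset = [TrainingSample(img, labels.get(img, [])) for img in images]
    model.fit(dataset)
    return model


def inference_phase(model: KeyPointModel, region_of_interest: List[str],
                    database: Dict[str, List[LatLon]]) -> Dict[str, List[LatLon]]:
    """Predict markers for every image in the region of interest and store them."""
    for image_id in region_of_interest:
        database[image_id] = model.predict(image_id)
    return database
```

Keeping the model behind a small interface lets the training phase run on one machine and the inference phase on another, which fits the offline and online variants described below.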
With the second training data subset, i.e., the geocentric positions of the key point markers in the pre-determined region, the first training data subset, i.e., the aerial and/or satellite images of the pre-determined region, may be labeled (also called marked up), the labels being the geocentric positions/locations of the key point markers. For example, if the key points are turn points, the labels are the geocentric positions, e.g., the coordinates (in particular the degrees of latitude and longitude), of the turn points at all or a subset of the intersections and crossroads in the aerial and/or satellite images.
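Labeling requires mapping each geocentric label onto the pixel grid of the image it belongs to. A minimal sketch, again assuming 256-pixel Web Mercator tiles (an assumption, not stated in the disclosure); `TILE_SIZE`, `latlon_to_global_pixel`, and `label_tile` are hypothetical:

```python
import math

TILE_SIZE = 256  # assumed pixel size of one square map tile


def latlon_to_global_pixel(lat, lon, zoom):
    """Web Mercator pixel coordinates of a WGS84 point in the whole-world image."""
    scale = TILE_SIZE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * scale
    return x, y


def label_tile(tile_x, tile_y, zoom, markers):
    """Convert geocentric (lat, lon) key point labels into pixel labels of one tile.

    Labels that fall outside the tile are dropped.
    """
    labels = []
    for lat, lon in markers:
        gx, gy = latlon_to_global_pixel(lat, lon, zoom)
        px, py = gx - tile_x * TILE_SIZE, gy - tile_y * TILE_SIZE
        if 0 <= px < TILE_SIZE and 0 <= py < TILE_SIZE:
            labels.append((px, py))
    return labels
```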
The geocentric positions of the key point markers may be obtained, for example, through user input, through one or more crowdsourcing platforms, and/or through provision of established geocentric positions of key point markers in the pre-determined region. This list of options is not exhaustive. If the geocentric positions of the key point markers are obtained through user input, people/users may be asked to enter labels indicating geocentric positions of key point markers in the pre-determined region into a specifically designed computer system that may be configured to display aerial and/or satellite images of pre-determined regions. If the geocentric positions are obtained by provision of established geocentric positions of key point markers in the pre-determined region, the established positions may, for example, be purchased from a provider that already holds the sought-after geocentric positions of the key point markers.
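When labels come from several users or crowdsourcing platforms, the clicks for one key point scatter slightly and some are outliers. One possible way to merge them, sketched here with a hypothetical greedy clustering and invented thresholds (`radius_m`, `min_votes`), is to average clicks lying within a few meters of each other and to keep only clusters confirmed by enough users:

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def consensus_markers(user_labels, radius_m=5.0, min_votes=2):
    """Merge crowdsourced (lat, lon) clicks into agreed key point markers.

    Greedy clustering: each click joins the first cluster whose running mean
    lies within radius_m; otherwise it starts a new cluster.
    """
    clusters = []  # each entry: [sum_lat, sum_lon, count]
    for lat, lon in user_labels:
        for c in clusters:
            center = (c[0] / c[2], c[1] / c[2])
            if haversine_m(center, (lat, lon)) <= radius_m:
                c[0] += lat
                c[1] += lon
                c[2] += 1
                break
        else:
            clusters.append([lat, lon, 1])
    # Plain averaging of lat/lon is adequate for clusters only a few meters wide.
    return [(c[0] / c[2], c[1] / c[2]) for c in clusters if c[2] >= min_votes]
```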
The deep neural network may be a convolutional neural network. During training, the weights of the deep neural network are adjusted such that, when the pre-determined region used for training is supplied as the region of interest, the network predicts key point marker locations that are as close as possible to the locations of the key point markers included in the second training data subset.
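The disclosure only states that the weights are adjusted until the predictions approach the labels. A common way to set up such a key point task for a convolutional network, assumed here rather than taken from the source, is heatmap regression: each label is rendered as a Gaussian peak, the network is trained to reproduce the heatmap under a pixel-wise mean squared error, and predicted locations are read back as local maxima. The helpers below (hypothetical names, pure Python for readability, no actual network) show the target, loss, and decoding steps:

```python
import math


def gaussian_heatmap(width, height, keypoints, sigma=2.0):
    """Render pixel key points as a heatmap with a Gaussian peak at each label."""
    heat = [[0.0] * width for _ in range(height)]
    for kx, ky in keypoints:
        for y in range(height):
            for x in range(width):
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                # keep the maximum so that nearby peaks do not add up above 1.0
                heat[y][x] = max(heat[y][x], math.exp(-d2 / (2 * sigma ** 2)))
    return heat


def mse_loss(pred, target):
    """Pixel-wise mean squared error between predicted and target heatmaps."""
    n = sum(len(row) for row in target)
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n


def decode_peaks(heat, threshold=0.5):
    """Pixels that are local maxima (8-neighborhood) with value >= threshold."""
    h, w = len(heat), len(heat[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heat[y][x]
            if v < threshold:
                continue
            neighbors = [heat[j][i]
                         for j in range(max(0, y - 1), min(h, y + 2))
                         for i in range(max(0, x - 1), min(w, x + 2))
                         if (i, j) != (x, y)]
            if all(v >= nb for nb in neighbors):
                peaks.append((x, y))
    return peaks
```

During training, the network output would replace the first argument of `mse_loss`; at inference, `decode_peaks` turns the predicted heatmap back into pixel locations, which can then be converted to geocentric coordinates.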
The deep neural network may predict the key point marker locations, i.e., the virtual road sign locations, such that, for an intersection, the key point markers are located at the center of each road or lane entering the intersection. Intersections may comprise crossroads, T-junctions, and similar junctions. The deep neural network may also predict the key point marker locations such that the key point markers, i.e., the virtual road signs, have superior visibility, for example, are not occluded by environmental objects such as buildings.
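The target geometry (a marker on the center of each road entering an intersection) can be illustrated with a hand-written sketch, assuming the road center lines are available as polylines in a local metric frame that start at the intersection node. This only illustrates the placement rule with a hypothetical setback distance (`setback_m`); it is not the learned behavior of the network:

```python
import math


def point_along(polyline, distance):
    """Point at the given arc length along a polyline of (x, y) coordinates in meters."""
    remaining = distance
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if seg > 0 and remaining <= seg:
            t = remaining / seg
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        remaining -= seg
    return polyline[-1]  # polyline shorter than distance: use its end point


def approach_markers(approaches, setback_m=15.0):
    """One marker per road entering an intersection, on the road center line.

    approaches: center-line polylines, each starting at the intersection node.
    """
    return [point_along(line, setback_m) for line in approaches]
```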
The method of the present disclosure, i.e., both its training phase and its inference phase, may be performed offline, i.e., not in real time but in an offline mode. Dedicated servers with appropriate computational resources may be used. In offline processing, the region of interest that serves as input data for the deep neural network in the inference phase may be defined in advance. In the case of offline processing, the predicted key point marker locations may be stored in a database for further distribution to mobile devices such as smartphones and vehicle navigation systems, where virtual road signs are superimposed at the predicted key point marker locations onto the scene images captured, e.g., by a forward-facing camera of the vehicle. If desired, a coordinate transformation may be performed on the predicted key point marker locations to transform them from the coordinate system used for the key point marker locations into the coordinate system used for the pixels of the scene images.
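The coordinate transformation is left open in the disclosure. A minimal sketch, assuming a pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy`), a known mounting height, zero pitch and roll, and an equirectangular local approximation that is only adequate over short distances. All function and parameter names are hypothetical:

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def to_local_en(lat, lon, ref_lat, ref_lon):
    """Approximate east/north offsets in meters (equirectangular, short range only)."""
    east = math.radians(lon - ref_lon) * EARTH_RADIUS_M * math.cos(math.radians(ref_lat))
    north = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    return east, north


def project_marker(marker, cam_lat, cam_lon, heading_deg,
                   fx, fy, cx, cy, cam_height_m=1.4, sign_height_m=0.0):
    """Pixel (u, v) of a marker given as (lat, lon), or None if it is behind the camera.

    heading_deg: camera yaw, clockwise from north. Pitch and roll are assumed zero.
    """
    east, north = to_local_en(marker[0], marker[1], cam_lat, cam_lon)
    h = math.radians(heading_deg)
    forward = east * math.sin(h) + north * math.cos(h)  # along the optical axis
    right = east * math.cos(h) - north * math.sin(h)    # to the right in the image
    up = sign_height_m - cam_height_m                   # marker height relative to camera
    if forward <= 0.1:
        return None
    u = cx + fx * right / forward
    v = cy - fy * up / forward  # image v grows downward
    return u, v
```

A production system would also have to handle lens distortion, camera pitch, and terrain height, which this sketch ignores.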
In the offline mode, a feedback/validation mechanism may be provided to ensure that the trained deep neural network properly predicts the key point marker locations. A separate, second neural network may be provided, to which the aerial and/or satellite images of the pre-determined region that were used as the first training data subset are supplied as validation input data. The second neural network analyzes the validation input data and detects intersections in the pre-determined region, i.e., in the first training data subset. A comparison then checks whether or not the key point marker locations predicted by the trained deep neural network coincide with the detected intersections. A tolerance range may be provided that allows for some distance between the predicted key point marker locations and the detected intersections. If the predicted key point marker locations do not coincide with the detected intersections, the pre-determined region concerned is marked for manual labeling (e.g., by placing it in a corresponding queue), i.e., for manually assigning one or more key point marker locations to the intersections concerned. After manual labeling, the manually labeled pre-determined region may be used the next time the trained deep neural network is applied to the pre-determined region.
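The comparison step could look roughly as follows. The tolerance semantics are an assumption: here a region passes only if every detected intersection has a predicted marker within `tolerance_m` and no predicted marker lies farther than `tolerance_m` from every detected intersection. `validate_region`, `manual_labeling_queue`, and the queue entry format are hypothetical:

```python
import math
from collections import deque

EARTH_RADIUS_M = 6_371_000.0

# Hypothetical queue of regions awaiting manual labeling.
manual_labeling_queue = deque()


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def validate_region(region_id, predicted, detected, tolerance_m, manual_queue):
    """Compare predicted markers with intersections found by the second network.

    Returns True if the region passes; otherwise appends it to manual_queue.
    """
    def near(p, others):
        return any(haversine_m(p, q) <= tolerance_m for q in others)

    missing = [ix for ix in detected if not near(ix, predicted)]  # no marker nearby
    stray = [p for p in predicted if not near(p, detected)]       # marker far from any intersection
    if missing or stray:
        manual_queue.append({"region": region_id, "missing": missing, "stray": stray})
        return False
    return True
```

Because markers are meant to sit on the roads entering an intersection rather than at its exact center, the tolerance has to be at least as large as the expected marker setback.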
Alternatively, the inference phase of the method may be performed online (online mode), for example on a mobile device such as a smartphone or a navigation system used in a vehicle, while the mobile device travels along the route together with the vehicle. In this case, the regions of interest may be defined in real time, for example by the driver, and the input dataset is supplied to and processed by the trained deep neural network in real time. In the case of online processing, the predicted key point markers may be used immediately: virtual road signs are superimposed in real time at the predicted key point marker locations on the scene images captured by a forward-facing camera of the vehicle. The predicted key point markers to be superimposed are selected based on the current position of the vehicle and on route information, such that key point markers relevant to the current route are selected. Again, if desired, a coordinate transformation may be performed on the predicted key point marker locations to transform them from the coordinate system used for the key point marker locations into the coordinate system used for the pixels of the scene images.
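Selecting the markers relevant to the current route might be done as sketched below, assuming the route, the vehicle position, and the markers have already been converted into a local metric frame. The corridor width and look-ahead distance are hypothetical parameters, and "relevant" is interpreted here as "close to the route and ahead of the vehicle":

```python
import math


def project_onto_route(route, p):
    """(arc_length, distance) of the closest point on a polyline route to point p."""
    best = (0.0, math.inf)
    travelled = 0.0
    for (x0, y0), (x1, y1) in zip(route, route[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg2 = dx * dx + dy * dy
        t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((p[0] - x0) * dx + (p[1] - y0) * dy) / seg2))
        qx, qy = x0 + t * dx, y0 + t * dy
        d = math.hypot(p[0] - qx, p[1] - qy)
        if d < best[1]:
            best = (travelled + t * math.sqrt(seg2), d)
        travelled += math.sqrt(seg2)
    return best


def relevant_markers(route, vehicle_pos, markers, corridor_m=10.0, lookahead_m=300.0):
    """Markers within corridor_m of the route and up to lookahead_m ahead of the vehicle."""
    s_vehicle, _ = project_onto_route(route, vehicle_pos)
    selected = []
    for m in markers:
        s, d = project_onto_route(route, m)
        if d <= corridor_m and 0.0 <= s - s_vehicle <= lookahead_m:
            selected.append(m)
    return selected
```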
In the online mode, if a key point such as a turn point or a lane-change possibility is displayed by the navigation system of a vehicle but no virtual road sign is superimposed, or a displayed virtual road sign is placed unacceptably far away from the key point, this may be detected by a feedback/validation mechanism of the method (or by user input). In this case, the location, e.g., the coordinates, of the key point and the location, e.g., the coordinates, of the misplaced virtual road sign/predicted key point marker (if there is one) may be uploaded together with a tolerance range to a server or similar for further analysis. If the analysis finds an error in the database of predicted key point marker locations, the missing key point marker is placed manually, i.e., its location is chosen manually, and this manually chosen location is used the next time the trained deep neural network is used for the same region of interest.
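The upload described above could be assembled as a small JSON payload. The field names and the nearest-marker rule are assumptions for illustration, not a format defined in the disclosure; `feedback_report` is a hypothetical helper:

```python
import json
import math
from typing import List, Optional, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def feedback_report(key_point: LatLon, kind: str, markers: List[LatLon],
                    tolerance_m: float) -> Optional[str]:
    """JSON report if no marker lies within tolerance_m of the key point, else None."""
    nearest = min(markers, key=lambda m: haversine_m(key_point, m), default=None)
    if nearest is not None and haversine_m(key_point, nearest) <= tolerance_m:
        return None  # sign placed acceptably: nothing to upload
    return json.dumps({
        "key_point": {"lat": key_point[0], "lon": key_point[1], "kind": kind},
        # None means no virtual road sign was shown at all
        "misplaced_marker": None if nearest is None else {"lat": nearest[0], "lon": nearest[1]},
        "tolerance_m": tolerance_m,
    })
```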
The present disclosure further relates to an apparatus for predicting key point marker locations to be superimposed onto environmental data of a vehicle, the apparatus comprising means for performing the method of the present disclosure. For example, the apparatus comprises a processor and a memory that may be employed for the training phase and the inference phase of the deep neural network. The trained deep neural network and/or the predicted key point marker locations, i.e., the predicted virtual road sign locations, generated by the deep neural network may be stored in the memory.
The method of the present disclosure exploits the fact that aerial and satellite images of the earth show road infrastructure containing information, such as intersections, that can be used to define key point marker locations. With the method, key point markers/virtual road signs, which may be accompanied by additional augmenting content (e.g., the name of a diverging street at an intersection), can be placed properly in relation to their corresponding key points, e.g., intersections, in a displayed scene image, so that the driver of a vehicle, navigation applications, and/or autonomous path planning systems can effectively execute driving maneuvers.
The present disclosure may be applied to so-called augmented navigation systems as used in vehicles but is not limited to this application. It may, for example, be applied to any computer system that uses a display, such as a computer screen or other means of visualization, on which navigation instructions such as virtual road signs are to be superimposed onto real-world images taken, for example, by a forward-facing camera.
Embodiments are described by way of example, with reference to the accompanying drawings, which are not drawn to scale, in which like reference numerals refer to similar elements.
The turn point markers 106, 206 shown in
The key point marker locations may also be predicted by the trained deep neural network such that the key point markers are positioned on a curved path connecting two adjacent potential key point marker locations at road/lane centers. In this case, the key point marker locations (e.g., the virtual road sign locations) may be chosen on the curved path such that the key point markers/virtual road signs are more visually appealing/better visible/better discernible to a driver, for example, not occluded by a building but instead placed in front of it. An example is shown in
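One way to realize the curved-path placement is sketched below under several assumptions not stated in the disclosure: the curve is a quadratic Bezier curve between the two road-center locations with a hypothetical control point at the corner, occlusion is approximated by checking whether the 2D sight line from a viewer position crosses a building footprint edge, and the first visible sample along the curve is chosen. All names are hypothetical:

```python
def bezier(p0, p1, p2, t):
    """Point on a quadratic Bezier curve with control point p1, for t in [0, 1]."""
    a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2
    return (a * p0[0] + b * p1[0] + c * p2[0], a * p0[1] + b * p1[1] + c * p2[1])


def _cross(o, a, b):
    """z-component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def segments_cross(p, q, r, s):
    """True if segments pq and rs properly cross (touching or collinear cases ignored)."""
    return (_cross(r, s, p) * _cross(r, s, q) < 0
            and _cross(p, q, r) * _cross(p, q, s) < 0)


def visible(viewer, target, buildings):
    """True if the sight line viewer -> target crosses no building footprint edge."""
    for poly in buildings:
        for i in range(len(poly)):
            if segments_cross(viewer, target, poly[i], poly[(i + 1) % len(poly)]):
                return False
    return True


def place_on_curve(road_a, road_b, corner, viewer, buildings, samples=21):
    """First sample on the curve from road_a to road_b that is visible from viewer."""
    for k in range(samples):
        candidate = bezier(road_a, corner, road_b, k / (samples - 1))
        if visible(viewer, candidate, buildings):
            return candidate
    return None  # every sample is occluded
```

A real system would test occlusion against building heights in 3D and from several points along the approach, not a single 2D viewer position.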
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/RU2020/000402 | 7/31/2020 | WO |