This disclosure relates generally to neuro-symbolic computing, or knowledge-infused learning, for entity prediction.
In general, autonomous driving involves processing a multitude of data streams from an array of sensors. These data streams are then used to detect, recognize, and track objects in a scene. For example, in computer vision, a scene is often represented as a set of labeled bounding boxes drawn around the objects detected within a frame. However, scenes are often more complex than a set of recognized objects. While machine learning techniques have been able to perform these object recognition tasks, they tend to lack the ability to fully utilize the interdependence of entities and semantic relations within a scene. Taken alone, these machine learning techniques are not configured to provide high-level scene understanding that is accurate and complete.
The following is a summary of certain embodiments described in detail below. The described aspects are presented merely to provide the reader with a brief summary of these certain embodiments and the description of these aspects is not intended to limit the scope of this disclosure. Indeed, this disclosure may encompass a variety of aspects that may not be explicitly set forth below.
According to at least one aspect, a computer-implemented method for knowledge-based entity prediction is disclosed. The method includes obtaining a knowledge graph based on (i) labels associated with a scene and (ii) an ontology. The knowledge graph includes nodes and edges. A set of the nodes represent the labels associated with the scene. Each edge represents a relation between related pairs of nodes. The method includes identifying a path with multiple edges having multiple relations from a source node to a target node via at least one intermediary node between the source node and the target node. The method includes reifying the path by generating a reified relation to represent the multiple relations of the path in which the reified relation is represented as a new edge that directly connects the source node to the target node. The method includes generating a reified knowledge graph structure based on the knowledge graph. The reified knowledge graph structure includes at least the source node, the target node, and the new edge. The method includes training a machine learning system to learn a latent space defined by the reified knowledge graph structure.
According to at least one aspect, a data processing system comprises one or more non-transitory computer readable storage media and one or more processors. The one or more non-transitory computer readable storage media store computer readable data including instructions that are executable to perform a method. The one or more processors are in data communication with the one or more non-transitory computer readable storage media. The one or more processors are configured to execute the computer readable data and perform the method. The method includes obtaining a knowledge graph based on (i) labels associated with a scene and (ii) an ontology. The knowledge graph includes nodes and edges. A set of the nodes represent the labels associated with the scene. Each edge represents a relation between related pairs of nodes. The method includes identifying a path with multiple edges having multiple relations from a source node to a target node via at least one intermediary node between the source node and the target node. The method includes reifying the path by generating a reified relation to represent the multiple relations of the path in which the reified relation is represented as a new edge that directly connects the source node to the target node. The method includes generating a reified knowledge graph structure based on the knowledge graph. The reified knowledge graph structure includes at least the source node, the target node, and the new edge. The method includes training a machine learning system to learn a latent space defined by the reified knowledge graph structure.
According to at least one aspect, a computer-implemented method includes obtaining a knowledge graph with data structures that include at least a first triple and a second triple. The first triple includes a first scene instance, a first relation, and a first entity instance. The first relation relates the first scene instance to the first entity instance. The second triple includes the first entity instance, a second relation, and a first class. The second relation relates the first entity instance to the first class. The method includes identifying a path based on the first triple and the second triple. The path is defined from the first scene instance to the first class with the first entity instance being between the first scene instance and the first class. The path includes at least the first relation and the second relation. The method includes reifying the path by generating a reified relation to represent the first relation and the second relation such that the reified relation directly relates the first scene instance to the first class. The method includes constructing a reified knowledge graph structure with a reified triple. The reified triple includes the first scene instance, the reified relation, and the first class. The method includes training a machine learning system to learn a latent space of the reified knowledge graph structure.
These and other features, aspects, and advantages of the present invention are discussed in the following detailed description in accordance with the accompanying drawings throughout which like characters represent similar or like parts.
The embodiments described herein have been shown and described by way of example, and many of their advantages will be understood from the foregoing description. It will be apparent that various changes can be made in the form, construction, and arrangement of the components without departing from the disclosed subject matter or sacrificing one or more of its advantages. Indeed, the described forms of these embodiments are merely explanatory. These embodiments are susceptible to various modifications and alternative forms, and the following claims are intended to encompass and include such changes and not be limited to the particular forms disclosed, but rather to cover all modifications, equivalents, and alternatives falling within the spirit and scope of this disclosure.
The system 100 includes a memory system 120, which is operatively connected to the processing system 110. In an example embodiment, the memory system 120 includes at least one non-transitory computer readable storage medium, which is configured to store and provide access to various data to enable at least the processing system 110 to perform the operations and functionality, as disclosed herein. In an example embodiment, the memory system 120 comprises a single memory device or a plurality of memory devices. The memory system 120 can include electrical, electronic, magnetic, optical, semiconductor, electromagnetic, or any suitable storage technology that is operable with the system 100. For instance, in an example embodiment, the memory system 120 can include random access memory (RAM), read only memory (ROM), flash memory, a disk drive, a memory card, an optical storage device, a magnetic storage device, a memory module, any suitable type of memory device, or any number and combination thereof. With respect to the processing system 110 and/or other components of the system 100, the memory system 120 is local, remote, or a combination thereof (e.g., partly local and partly remote). For example, the memory system 120 can include at least a cloud-based storage system (e.g. cloud-based database system), which is remote from the processing system 110 and/or other components of the system 100.
The memory system 120 includes at least a KEP system 130, a machine learning system 140, training data 150, and other relevant data 160, which are stored thereon. The KEP system 130 includes computer readable data with instructions, which, when executed by the processing system 110, is configured to provide KEP. The computer readable data can include instructions, code, routines, various related data, any software technology, or any number and combination thereof. In addition, the machine learning system 140 includes a knowledge graph embedding (KGE) model, a KGE algorithm, any suitable artificial neural network model, or any number and combination thereof. Also, the training data 150 includes a sufficient amount of sensor data, label data, KG data, KG structure data, various loss data, various weight data, and various parameter data, as well as any related machine learning data that enables the system 100 to provide the KEP, as described herein. Meanwhile, the other relevant data 160 provides various data (e.g. operating system, etc.), which enables the system 100 to perform the functions as discussed herein.
The system 100 is configured to include at least one sensor system 170. The sensor system 170 includes one or more sensors. For example, the sensor system 170 includes an image sensor, a camera, a radar sensor, a light detection and ranging (LIDAR) sensor, a thermal sensor, an ultrasonic sensor, an infrared sensor, a motion sensor, an audio sensor, an inertial measurement unit (IMU), any suitable sensor, or any number and combination thereof. The sensor system 170 is operable to communicate with one or more other components (e.g., processing system 110 and memory system 120) of the system 100. More specifically, for example, the processing system 110 is configured to obtain the sensor data directly or indirectly from one or more sensors of the sensor system 170. The sensor system 170 is local, remote, or a combination thereof (e.g., partly local and partly remote). Upon receiving the sensor data, the processing system 110 is configured to process this sensor data in connection with the KEP system 130, the machine learning system 140, the training data 150, or any number and combination thereof.
In addition, the system 100 may include at least one other component. For example, as shown in
In addition,
The TBox 208 describes a domain of interest by defining classes and properties as a domain vocabulary. Meanwhile, the Abox 206 includes assertions, which use the vocabulary defined by the TBox 208. For example, in
Also, in
The CV entity recognition system 304 employs visual object detection techniques to generate a set of entity labels 306 as output in response to receiving sensor data 302 as input. For example, the CV entity recognition system 304 may receive the sensor data 302 from the sensor system 170. The sensor data 302 may include raw images, video, LIDAR point clouds, other sensor data, or any combination thereof. The sensor data 302 may be two-dimensional (2D) sensor data (e.g., camera images) or three-dimensional (3D) sensor data (e.g., 3D point clouds). The CV entity recognition system 304 may generate 2D/3D bounding boxes around the detections to enable those detections to be identified. The CV entity recognition system 304 may employ object recognition techniques, semantic segmentation techniques, or a combination thereof. Semantic segmentation takes a more granular approach than bounding-box detection by assigning a semantic category to each pixel in an image. The CV entity recognition system 304 identifies a set of detections (e.g., one or more detections) in the sensor data 302 and provides a set of entity labels 306 (e.g., one or more entity labels) for that set of detections. In this example, the CV entity recognition system 304 includes at least one machine learning system to perform this recognition task of generating the set of entity labels 306, for example, by classification techniques. The set of entity labels 306 is then used by the KG system 308.
As shown in
The KEP system 130 is configured to obtain or receive the KG 310 as input. The KEP system 130 is configured to output a set of additional entity labels 312 for a given scene instance. This set of additional entity labels 312 represents entities that are highly likely to be in the scene but may have been missed or unrecognized by the CV entity recognition system 304. These entities may be missed or unrecognized by the CV entity recognition system 304 for various reasons, such as hardware limitations, occluded entities, poor field of view, degraded visuals, etc. As shown in
At the first phase 402, in an example, the process includes performing KG construction. As shown in
The process also includes generating or obtaining an ontology 412. For example, in
As shown in
Referring back to the process 400 (
Afterwards, the KG 310 is constructed by converting the scene data contained in the dataset 410 (along with the additional information from external sources, if available) to a format (e.g., RDF format) that is conformant with the ontology 412 (e.g., DSO). The relevant scene is queried and extracted from the dataset 410, making this process straightforward. As an example, the RDF can then be generated using an RDF library (e.g., the RDFLib Python library, version 4.2.2, or any suitable library). For instance, in
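As an illustrative sketch only (not the disclosed implementation), the conversion of per-scene entity labels into triples can be outlined in Python; the scene identifier and label names are assumptions for demonstration, and a production pipeline would emit RDF via a library such as RDFLib:

```python
# Illustrative sketch: flatten per-scene entity labels into <h, r, t>
# triples conformant with a DSO-style vocabulary. Identifiers and label
# names here are assumed, not taken from the disclosed system.

def build_dskg(scene_labels):
    """Flatten {scene_id: [class labels]} into a list of triples."""
    triples = []
    for scene_id, labels in scene_labels.items():
        for i, label in enumerate(labels):
            instance = f"{label.lower()}_{i}"          # e.g., "car_0"
            triples.append((scene_id, "includes", instance))
            triples.append((instance, "type", label))  # link instance to class
    return triples

dskg = build_dskg({"scene_1": ["Car", "Pedestrian"]})
print(dskg)
```

In practice, each tuple would become an RDF statement whose subject, predicate, and object are URIs drawn from the ontology's namespace.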
The DSKG contains data structures that include triples, where each triple is of the form <h, r, t>, where h = head, r = relation, and t = tail, and where h and t represent nodes and r represents an edge. For example, the DSKG includes triples of the form <scenei, includes, carj> to indicate that an entity instance (carj) is included in a scene instance (scenei). In a number of the examples disclosed herein, the entity instances are expressed with all lowercase letters (e.g., carj) while their corresponding entity classes are expressed in title case (e.g., Car). An entity instance is linked to its class in DSO through triples of the form <carj, type, Car>. In this context, it may be tempting to formulate KEP as a link prediction (LP) problem with the objective to complete triples of the form <scenei, includes, ?>, where '?' represents the element to be predicted. This formulation, however, would entail predicting a specific entity instance rather than predicting the class of an entity. Similar to CV-based object recognition, the objective of KEP should be to predict the class of an entity in the scene, e.g., predicting Car rather than carj. In other words, most LP models are unable to complete a triple of the form <h, r, t> when there is no r that directly links h and t in the training data, even if h and t are linked through a path of n hops (n > 1) in the KG, such as <h, r1, t1>, <t1, r2, t>. This is the issue faced by the KEP system 130 with the DSKG, as a scene instance is connected to an entity subclass only via a 2-hop path. Due to this limitation, the KEP system 130 cannot simply rely on LP in a straightforward manner. Therefore, upon generating the KG 310 (e.g., DSKG), the process 400 proceeds to the second phase 404 to overcome this technical problem.
At the second phase 404, in an example, the process 400 includes performing path reification. With path reification, for a given scene instance (e.g., “scene” node 414), the system 100 is configured to provide a technical solution for KEP by determining the entity class (e.g., “Entity Type” node 418) of an entity instance (e.g., “Entity” node 416). However, since the entity class (e.g., “Entity Type” node 418) is not immediately available through a direct link from the scene instance (e.g., “scene” node 414), the system 100 is configured to formulate this KEP task as a path prediction problem (i.e. predicting the path from a scene instance to the entity class). The path may be of any length (e.g., m-hop where m represents an integer greater than 1). To solve this path prediction problem, the system 100 introduces or creates a new relation (e.g., “includesType” relation 420) to the base KG 310 (e.g., DSKG). The system 100 uses this new relation to reify a multi-hop path (e.g., 2-hop path). More specifically, in this example, the system 100 generates a new relation (e.g., “includesType” relation 420), which directly links a source node (e.g., “scene” node 414) with a target node (e.g., “Entity Type” node 418) with a single-hop. In this regard, the “includesType” relation 420 is a combination of the “includes” relation 422 and the “type” relation 424. This requirement can be more formally defined as follows:
Let si be the ith scene instance node in DSKG (si ∈ S), where S represents the set of all scene instance nodes in DSKG, let ej be the jth entity instance node (ej ∈ I), and let '?' be a subclass of Entity in the DSO such that ? ∈ E, where E = {Car, Animal, Pedestrian, . . . } ⊆ C. In this case, the system 100 is configured to perform path reification as follows:

<si, includes, ej> ∧ <ej, type, ?> ⇒ <si, includesType, ?>  [1]
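The reification rule of expression [1] can be sketched in Python as a single pass over the triples; the function name and relation strings below are illustrative assumptions, not the disclosed implementation:

```python
# Illustrative sketch of path reification: for every 2-hop path
# <s, includes, e> and <e, type, C>, add a direct single-hop edge
# <s, includesType, C>, per expression [1].

def reify_paths(triples):
    """Return the input triples plus reified <s, includesType, C> edges."""
    includes = [(h, t) for (h, r, t) in triples if r == "includes"]
    type_of = {h: t for (h, r, t) in triples if r == "type"}
    reified = {(s, "includesType", type_of[e])
               for (s, e) in includes if e in type_of}
    return triples + sorted(reified)

dskg = [("scene_1", "includes", "car_0"), ("car_0", "type", "Car")]
dskg_r = reify_paths(dskg)
print(dskg_r[-1])  # ('scene_1', 'includesType', 'Car')
```

The original 2-hop triples are retained alongside the new reified edges, so the transformed structure remains a superset of the base KG.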
With path reification, the DSKG is transformed into a DSKGR structure (i.e., DSKG with reified paths). Since the "includesType" relation is now present during training, the system 100 is configured to use or re-use link prediction (LP) methods to address the KG 310 incompleteness issue. More specifically, LP is a technique for predicting a missing link, such as predicting the head <?, r, t> or predicting the tail <h, r, ?> of a triple for a single hop (or a single relation). As a result of the creation of this reified relation ("includesType" relation 420) to provide a single hop, the KEP can now be mapped to LP in order to complete triples of the form <si, includesType, ?> in DSKGR. Before path reification, the KEP could not effectively be mapped to LP in order to predict the entity subclasses or the entity types of the DSKG due to the multiple hops (or multiple relations) that existed between the source node ("scene" instance) and the target node ("Entity Type" class).
Although DSKGR is described above as a KG structure with reified paths, the system 100 is not limited to this particular pattern for the reified KG structure. In this regard, the system 100 is configured to generate a reified KG structure with reified paths based on the KG 310 in a variety of patterns. The patterns may differ with respect to how entity instance information is represented along the path from a scene instance to an entity class.
As shown in
Referring back to
At the third phase 406, in an example, the process includes performing KGE learning. In this third phase, the system 100 is configured to translate the reified KG structure into at least one KGE 428, which encodes the reified KG structure in a low-dimensional, latent feature vector representation 426. More specifically, the system 100 uses at least one machine learning algorithm (e.g., KGE algorithm) to learn a representation of the reified KG structure with reified paths, which was constructed at the second phase 404. In this regard, the KGE learning is performed with an LP objective to generate a latent space, which may be useful for various downstream applications and various other tasks such as querying, entity typing, semantic clustering, etc.
In an example embodiment, the system 100 is configured to learn one or more KGEs 428 using one or more KGE algorithms and to re-use the learned latent space for KEP. In this regard, the process may include selecting one or more KGE algorithms. Non-limiting examples of KGE algorithms include TransE, HolE, ConvKB, or any number and combination thereof. As a first example, TransE is a representative KGE model, which learns relations between nodes as a geometric translation in the embedding space. This, however, limits its ability to handle symmetric/transitive relations, 1:N relations, and N:1 relations. As a second example, HolE uses the circular correlation of the head and tail of a triple with its relation embedding to learn an efficient compression of a fully expressive bi-linear model. This allows both nodes and relations to be represented in ℝd. As a third example, ConvKB learns a high-level feature map of the input triple by passing the concatenated node/relation embeddings through a convolution layer with a set of filters (number of filters τ = |Ω|). The fact score is then computed by using a dense layer with only one neuron and weights W. After the system 100 learns the latent space or embedding space from the reified KG structure using at least one KGE algorithm, the system 100 is configured to use the one or more KGEs 428 for KEP, as discussed below.
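As a hedged illustration of the first example, a toy TransE-style plausibility score can be written with only the standard library; the randomly initialized embeddings below stand in for trained ones, and the node/relation names are assumptions:

```python
import math
import random

# Toy TransE-style scoring: each node and relation has a d-dimensional
# embedding, and plausibility is the negative translation distance
# ||h + r - t||. Random embeddings stand in for trained ones; a real
# system would train them with a margin-based ranking loss.
random.seed(0)
d = 8

def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(d)]

nodes = {"scene_1": rand_vec(), "Car": rand_vec()}
relations = {"includesType": rand_vec()}

def score(h, r, t):
    """Higher (closer to zero) means more plausible under TransE."""
    diff = [a + b - c for a, b, c in zip(nodes[h], relations[r], nodes[t])]
    return -math.sqrt(sum(x * x for x in diff))

s = score("scene_1", "includesType", "Car")
```

Because the score is a pure translation distance, a symmetric relation would force its embedding toward the zero vector, which is the 1:N / N:1 limitation noted above.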
At the fourth phase 408, in an example, the process includes performing KEP. More specifically, the system 100 is configured to perform KEP with at least one learned KGE 428, as indicated by Algorithm 1. In addition, Table 2 provides a list of notations, which are used in Algorithm 1, as shown below.
Algorithm 1: Knowledge-based entity prediction (KEP) with a learned KGE
Input: learned KGE ∈ ℝn×d
Output: set of predicted entity classes E* ⊆ E
 1: T = S × r × E, where S ⊆ I, E ⊆ C, and r = includesType
 2: E* = { }
 3: for each si in S do
 4:  TS(si) = {(h, r, t) | h = si, t ∈ E} ⊆ T
 5:  for each (h, r, t) in TS(si) do
 6:   Xneg = {(h, r, t′) | t′ ∈ E}
 7:   Z = lookup(KGE, Xneg) ∈ ℝd×m, where m = |E|
 8:   Q = {q1, . . . , qx, . . . , qm | qx = ϕ(Z(x)), qx ∈ ℝ}
 9:   Lk(si) = {lx | lx ∈ argsort(Q), x ≤ k ≤ m}
10:   E* = E* ∪ Lk(si)
Algorithm 1 is performed by the system 100, particularly the KEP system 130, via at least one processor of the processing system 110 or by any suitable processing device. As an overview, the system 100 is configured to receive the KGE 428 as input and provide a set of predicted entity classes (E*) for each scene instance (si) as output. More specifically, the system 100 is configured to perform a method for each scene instance (si) within the set of all scene instances. For each scene instance, the system 100 is configured to obtain a set of test triples (TS) such that each test triple includes a particular scene instance (si) and an entity class, within the set of all entity classes (E), that relates to that particular scene instance (si). For each triple in the set of test triples (TS), the system 100 generates a set of negative triples (Xneg), which serve to determine how likely a candidate (t′) is linked to the given scene instance (si) via the given relation (r = "includesType" relation). In this case, each negative triple includes the scene instance (si), the "includesType" relation, and a candidate (t′). Each candidate (t′) represents an entity class within the set of all entity classes (E) that the given scene instance (si) may possibly include. After the set of negative triples is generated for the given scene instance, the system 100 is configured to retrieve embeddings via a lookup function from the learned KGE 428 based on each negative triple, as indicated in line 7 of Algorithm 1. The system 100 is configured to generate a score (e.g., a likelihood score 430) for each negative triple of the scene instance (si) based on the retrieved KG embeddings, as indicated in line 8 of Algorithm 1. The system 100 is configured to sort the negative triples based on the scores via the argsort function and obtain a set of the top-k labels, where 'k' represents a predetermined threshold (e.g., a preselected number of labels). The top-k labels represent the k highest-ranked labels (e.g., candidates or entity classes) for the given scene instance. The system 100 is configured to aggregate the sets of top-k labels for the set of test triples (TS) to obtain a set of predicted entity classes (E*) for the given scene instance (si). The system 100 is configured to provide the set of predicted entity classes (E*) as output. The system 100 is thus configured to obtain a set of predicted entity classes that are highly likely to be linked to the scene instance (si). The system 100 determines that the set of predicted entity classes (E*) includes one or more missing or unrecognized entity classes of the scene instance (si).
As indicated in Algorithm 1, the objective of KEP is to predict a specific link captured by triples, where each triple is denoted as <h, r, t> with 'h' representing a head (or a node), 'r' representing a relation (or an edge), and 't' representing a tail (or a node). To enable this more specific link prediction based on the reified KG structure, the system 100 is configured to learn the KGE representation of the nodes (i.e., heads and tails) and the edges (i.e., relations) using the LP objective. Then, for each scene instance si, the KGE 428 is queried using the "includesType" relation to find the missing k entity class labels Lk ⊆ E (as indicated in lines 5-10 of Algorithm 1). Algorithm 1 succinctly describes this KEP process via at least one KGE 428, which is trained using at least one KGE algorithm. The computational complexity of Algorithm 1 is O(n × m), where n = |S| and m = |E|.
Furthermore, there are a number of differences between KEP, as presented above via Algorithm 1, and the traditional LP setup. For example, in contrast to Algorithm 1, the KGE algorithms for LP learn to maximize the estimated plausibility ϕ(h, r, t) for any valid triple while minimizing it for any invalid, or negative, triple. Such KGE models can then be used to infer any missing link by obtaining the element (head or tail) with the highest plausibility to complete the triple <h, r, t>. However, as expressed in Algorithm 1, the processing system 110 is configured to perform KEP in a different manner than the traditional LP setup.
As discussed herein, the embodiments include a number of advantageous features, as well as benefits. For example, the embodiments are configured to perform KEP, which improves scene understanding by predicting potentially unrecognized entities and by leveraging heterogeneous, high-level semantic knowledge of driving scenes. The embodiments provide an innovative neuro-symbolic solution for KEP based on knowledge-infused learning, which (i) introduces a dataset agnostic ontology to describe driving scenes, (ii) uses an expressive, holistic representation of scenes with KGs, and (iii) proposes an effective, non-standard mapping of the KEP problem to the problem of LP using KGE. The embodiments further demonstrate that knowledge-infused learning is a potent tool, which may be effectively utilized to enhance scene understanding for at least partially autonomous driving systems or other application systems.
Overall, the embodiments introduce the KEP task and propose an innovative knowledge-infused learning approach. The embodiments also provide a dataset agnostic ontology to describe driving scenes. The embodiments map the KEP to the problem of KG link prediction by a technical solution that overcomes various limitations through a process that includes at least path reification.
That is, the above description is intended to be illustrative, and not restrictive, and provided in the context of a particular application and its requirements. Those skilled in the art can appreciate from the foregoing description that the present invention may be implemented in a variety of forms, and that the various embodiments may be implemented alone or in combination. Therefore, while the embodiments of the present invention have been described in connection with particular examples thereof, the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the described embodiments, and the true scope of the embodiments and/or methods of the present invention are not limited to the embodiments shown and described, since various modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims. For example, components and functionality may be separated or combined differently than in the manner of the various described embodiments, and may be described using different terminology. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure as defined in the claims that follow.