This application claims priority to Chinese Patent Application No. 202310802724.7 filed on Jun. 30, 2023, which is incorporated herein by reference in its entirety.
The present disclosure relates to a field of artificial intelligence, particularly to technical fields such as deep learning, natural language processing (NLP), computer vision, and speech processing.
A knowledge graph is a structured semantic knowledge base that may use a visual graph to describe concepts and their interrelationships in the physical world. The knowledge graph has a wide range of applications in search, question answering, and assisting big data analysis.
A relational triplet is a key component of the knowledge graph, so extracting the relational triplet from texts with different sources and structures is a crucial step in an automated construction of the knowledge graph.
The present disclosure provides a method of processing a text, a method of training a deep learning model, a method of generating a knowledge graph, an electronic device, and a non-transitory computer-readable storage medium.
According to an aspect of the present disclosure, a method of processing a text is provided, including: encoding a text to be processed to obtain a feature information; identifying a plurality of entity information from the text, based on the feature information; generating a word relation tensor based on the feature information; and determining a relation between the plurality of entity information by using the word relation tensor, so as to generate a plurality of relational triplets related to the text.
According to another aspect of the present disclosure, a method of training a deep learning model is provided, including: inputting a word feature matrix corresponding to a sample feature information into the deep learning model, so as to obtain a sample word relation tensor; and adjusting a parameter of the deep learning model according to a difference between the sample word relation tensor and a real word relation tensor, to obtain a trained deep learning model.
According to another aspect of the present disclosure, a method of generating a knowledge graph is provided, including: generating the knowledge graph by using a plurality of relational triplets; wherein the plurality of relational triplets are generated by the method of processing a text provided by the present disclosure.
According to another aspect of the present disclosure, a method of generating a knowledge graph is provided, including: encoding a text to be processed to obtain a feature information; identifying a plurality of entity information from the text, based on the feature information; inputting a word feature matrix corresponding to the feature information into a deep learning model, so as to obtain a word relation tensor; generating a plurality of relational triplets according to the word relation tensor and the plurality of entity information; and generating the knowledge graph by using the plurality of relational triplets; wherein the deep learning model is trained by the method of training a deep learning model provided by the embodiments of the present disclosure.
According to another aspect of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to implement at least one of the following methods provided by the present disclosure: the method of processing a text, the method of training a deep learning model, and the method of generating a knowledge graph.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium having computer instructions stored thereon is provided, wherein the computer instructions are used to cause a computer to implement at least one of the following methods provided by the present disclosure: the method of processing a text, the method of training a deep learning model, and the method of generating a knowledge graph.
It should be understood that content described in this section is not intended to identify key or important features in the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will be easily understood through the following description.
The accompanying drawings are used to understand the present disclosure better and do not constitute a limitation to the present disclosure, in which:
Exemplary embodiments of the present disclosure will be described below with reference to the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding and should be considered as merely exemplary. Therefore, those of ordinary skill in the art should realize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted in the following description.
Constructing a knowledge graph includes data acquisition, knowledge extraction, knowledge fusion, and knowledge processing. The knowledge extraction is a process of extracting entities and relations between the entities from data with different sources and structures. For example, the knowledge extraction may involve extracting a relational triplet from data. The relational triplet includes two entities and a semantic relation between these two entities. For example, the relational triplet may include a subject entity (also referred to as a subject), a relation, and an object entity (also referred to as an object). The subject entity may be called a head entity, and the object entity may be called a tail entity. One head entity and one tail entity form an entity pair.
In the process of extracting the relational triplet, there may be a relation overlap and an entity overlap. For example, the data may contain only one entity pair and one relation, there may be a plurality of relations between the same entity pair, or one entity may have relations with a plurality of different entities.
Due to the complexity of natural language, there are some difficulties in the process of extracting the relational triplet. For example, the number of entities and the relation between entities in the data are not fixed, which makes the extraction of relational triplets uncertain. For example, due to the complexity of semantics in real life, it is desired to consider the correlation between entities with different relation types. For example, a complex relation structure may encounter overlapping issues, such as two different relational triplets sharing one or two identical entities.
The present disclosure provides a method of processing a text, which extracts a plurality of relational triplets from the text by using the correlation between entity relations.
In an embodiment of the present disclosure, a three-dimensional word relation tensor is used to describe a relation between words in a text. The word relation tensor may reflect the correlation between relations. The correlation of relations may be described more accurately by learning and optimizing the word relation tensor, thereby improving the accuracy of extracting the relational triplet.
The following describes the method of processing a text provided in the present disclosure in conjunction with FIG. 1.
As shown in FIG. 1, the method of processing a text includes operation S110 to operation S140.
In operation S110, a text to be processed is encoded to obtain a feature information.
For example, the text to be processed may be a sentence expressed in a textual form. The sentence may include a plurality of characters (or words) and a plurality of punctuation marks. The feature information may represent features for the plurality of characters (or words) and the plurality of punctuation marks in the sentence.
For example, the feature information may be a feature matrix. A multidimensional matrix is used to represent the features for the plurality of characters (or words) and the plurality of punctuation marks.
In operation S120, a plurality of entity information are identified from the text, based on the feature information.
In an embodiment of the present disclosure, the entity information is a phrase used to describe an objectively existing object in the text. For example, in the text "There are many apples on the tree", apple and tree are objectively existing entities, so the word "apples" in the text is the entity information of the entity apple, and the word "tree" in the text is the entity information of the entity tree.
In an embodiment of the present disclosure, the entity information describing an entity may be a plurality of words or a single word. For example, in a process of identifying a text, a multi-word phrase may be used as a whole as the entity information describing an entity, the individual words that make up the phrase cannot be used separately as the entity information describing that entity, and a single word may also be used on its own as the entity information describing an entity. In order to accurately identify the entity information of the text, feature analysis may be carried out using the feature information by taking a single character in the text as an object and by taking continuous characters in the text as an object, so as to determine the entity information in the form of a single character and the entity information in the form of a word in the text.
In operation S130, a word relation tensor is generated based on the feature information.
In an embodiment of the present disclosure, the word relation tensor includes a plurality of relations that may exist between a plurality of entities. For example, the word relation tensor may include a plurality of types of relations pre-defined for the plurality of entities. For example, the relation types may include a membership relation and an inclusion relation.
For example, the word relation tensor is a three-dimensional tensor. For example, the word relation tensor is generated according to two sets of feature information and a learnable relational basis matrix. The two sets of feature information are the feature information of the two entities in the entity pair. The learnable relational basis matrix may represent a plurality of possible relations between the two entities.
For example, the plurality of relations included in the word relation tensor may be obtained through deep learning. For example, in the process of tensor learning, connections between all relation types may be established by sharing some model parameters with all relation types, so as to obtain the word relation tensor. The word relation tensor may reflect the correlation between the plurality of relations.
For example, the relation included in the word relation tensor may also be determined from the text. For example, the feature information includes the entity information and the relation information included in the text. The word relation tensor generated based on the feature information also includes a relation between entities described in the text.
In operation S140, a relation between the plurality of entity information is determined by using the word relation tensor, so as to generate a plurality of relational triplets related to the text.
In an embodiment of the present disclosure, a real relation between each two entities among the plurality of entities is predicted by using the plurality of relations that may exist between the plurality of entities included in the word relation tensor. The predicted relation with the highest reliability is recorded as the real relation between the two entities in the relational triplet corresponding to the two entities.
In the embodiment of the present disclosure, the word relation tensor is generated based on the feature information of the text. The relation between entities is predicted by fully utilizing the correlation between the plurality of relations in the word relation tensor, so as to generate the relational triplet, thereby achieving extracting the relation from the text. In addition, the process of identifying the entity and generating the word relation tensor may be trained through the deep learning model. The accuracy of entity and word relation tensor may be continuously improved by continuously optimizing losses of the processes of identifying the entity and generating the word relation tensor, thereby improving the efficiency of extracting the relational triplet.
The following describes the principle of the method of processing a text provided in the present disclosure in conjunction with FIG. 2.
As shown in FIG. 2, the sentence S "Mary lives in New York, American." is taken as an example of the text to be processed.
In some embodiments, encoding a text to be processed to obtain a feature information includes: extracting a plurality of word objects from the text; determining a plurality of word features for the plurality of word objects respectively; and encoding the plurality of word features to obtain the feature information.
In an embodiment of the present disclosure, the word object may be a word or a punctuation mark. For example, assuming that the sentence S is “Mary lives in New York, American.”, the word objects extracted from the sentence S include “Mary”, “lives”, “in”, “New”, “York”, “,”, “American”, and “.”.
For example, the word object may be vectorized, and feature analysis may be performed on each character of each word object to obtain the word feature for each word object.
For example, the sentence S may include word objects wi, where i=1, . . . , nw, and nw is a positive integer. wi represents the i-th word object in the sentence S. For the word object wi, a word vector eiw may be initialized by using a pre-trained GloVe 840B word vector, and the word object may be vectorized by using the initialized word vector eiw. A character-level morphological feature eic of each word object is acquired by using a Long Short-Term Memory (LSTM) network.
For example, the word feature ei of each word object may be determined by using the following equation (1):

ei=eiw∥eic  (1)

where eiw is the word vector for the word object wi, eic is the character-level morphological feature for the word object wi, and ∥ is vector concatenation.
In some embodiments, encoding the plurality of word features of the plurality of word objects to obtain the feature information includes: encoding the plurality of word features to obtain a plurality of context information of the plurality of word objects; determining a plurality of hidden features for the plurality of word objects respectively, according to the plurality of context information; and generating the feature information according to the plurality of hidden features.
In an embodiment of the present disclosure, the word feature sequence of the sentence S is encoded by using a bi-directional long short-term memory (BiLSTM) network, so as to generate the context information of each word object. For example, the word feature sequence of the sentence S includes word features ei corresponding to all word objects in the sentence S, that is, the word feature sequence is [e1, . . . , enw].
By using the BiLSTM network, the sentence S may be encoded from front to back or from back to front, the dependency relation between the plurality of word objects wi in the sentence S may be analyzed, and the semantic and grammatical structure of the sentence S may be determined, so as to acquire a plurality of context information of the word object wi. The relation between each word object wi and consecutive word objects in front of and behind the word object wi may be determined according to the context information.
For example, information from the previous word object to the next word object in the word feature sequence [e1, . . . , enw] may be acquired for each word object in the encoding process.
The BiLSTM network includes a plurality of hidden layers for analyzing the semantic and structural features for the plurality of word objects. The hidden feature for each word object in the corresponding hidden layer may be determined based on the context information of each word object.
The feature information may be represented in a form of matrix. For example, a hidden feature sequence may be generated according to the hidden features for the plurality of word objects. The hidden feature sequence is used as the feature information.
For example, the encoding process may be represented by using the following equation (2):
For example, the hidden feature for the word object may be determined by using the following equation (3):
For example, the feature matrix H may be determined by using the following equation (4):

H=[h1, . . . , hnw]  (4)

where hi is the hidden feature for the i-th word object in the sentence S.
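The encoding described above (a word vector concatenated with a character-level feature, followed by a BiLSTM that produces the hidden feature sequence H) may be illustrated with the following sketch. The module names, layer sizes, and the use of PyTorch's nn.Embedding and nn.LSTM are illustrative assumptions rather than the exact configuration of the disclosure.

```python
import torch
import torch.nn as nn

class SentenceEncoder(nn.Module):
    """Illustrative encoder: word vector + character-level LSTM feature, then a BiLSTM."""

    def __init__(self, vocab_size, char_vocab_size, word_dim=300, char_dim=30, hidden_dim=200):
        super().__init__()
        # e_i^w: word vectors (in the disclosure, initialized from pre-trained GloVe 840B vectors)
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # e_i^c: character-level morphological feature, produced by an LSTM over the characters
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_dim, batch_first=True)
        # BiLSTM over the concatenated word features e_i = e_i^w || e_i^c
        self.bilstm = nn.LSTM(word_dim + char_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, word_ids, char_ids):
        # word_ids: (n_w,), char_ids: (n_w, max_chars)
        e_w = self.word_emb(word_ids)                      # (n_w, word_dim)
        _, (h_c, _) = self.char_lstm(self.char_emb(char_ids))
        e_c = h_c[-1]                                      # (n_w, char_dim), final hidden state per word
        e = torch.cat([e_w, e_c], dim=-1)                  # e_i = e_i^w || e_i^c
        # H: hidden feature sequence [h_1, ..., h_{n_w}] used as the feature information
        H, _ = self.bilstm(e.unsqueeze(0))                 # (1, n_w, 2 * hidden_dim)
        return H.squeeze(0)
```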
In some embodiments, identifying a plurality of entity information from the text, based on the feature information includes: determining, from a plurality of preset label sequences, a target label sequence corresponding to the feature information; annotating an entity type for each of a plurality of word objects in the text, according to the target label sequence; determining an entity scope according to the entity types of the plurality of word objects, wherein the entity scope indicates a number of word objects included in an entity and position information of the word objects included in the entity in the text; and determining the plurality of entity information of the text according to the entity scope.
In an embodiment of the present disclosure, the plurality of preset label sequences may be predefined. The label sequence indicates the entity type for each word object in the sentence S. For example, the entity type includes an entity beginning (begin, B), an entity interior (inside, I), and invalid (outside, O). For example, the entity type for each word object in the sentence S may be exhaustively listed by arranging and combining, so as to obtain a label sequence set Ys. The label sequence set Ys may include all possible label sequences in the sentence S.
In an embodiment of the present disclosure, the target label sequence may be determined from the label sequence set Ys by using the deep learning model. For example, the determination may include: determining an evaluation value matrix related to labels according to the feature information; determining a plurality of evaluation value functions for the plurality of preset label sequences respectively, according to evaluation value matrices of a plurality of preset labels included in the plurality of preset label sequences; determining a probability of each preset label sequence among the plurality of preset label sequences being selected as the target label sequence, according to the plurality of evaluation value functions; and determining the target label sequence from the plurality of preset label sequences according to the probability.
In an embodiment of the present disclosure, an evaluation value function for the label sequence is constructed based on the feature information of the sentence S, so as to determine the correlation between the entity type in the label sequence and the feature for the word object in the sentence S.
In an embodiment of the present disclosure, an evaluation value matrix V is determined from the feature matrix H by using a multi-layer perceptron (MLP).
The MLP may merge the features for the plurality of hidden layers included in the feature matrix H. The last layer of the MLP is treated as a linear model. The feature matrix H may be transformed into the evaluation value matrix V through the MLP, so as to acquire more accurate features in the sentence S.
According to a conditional random field, in a case that only the feature relation between adjacent word objects in the sentence S is considered, the evaluation value function β(Ŷ) for the label sequence may be determined by using the following equation (5):
In an embodiment of the present disclosure, determining a probability of each preset label sequence among the plurality of preset label sequences being selected as the target label sequence, according to the plurality of evaluation value functions includes: determining an expected value of the plurality of preset label sequences related to the text, according to the plurality of evaluation value functions; and determining the probability according to a ratio between each of the plurality of evaluation value functions and the expected value.
For example, the probability of each label sequence being selected as the target label sequence may be determined according to the evaluation value of each label sequence and an average evaluation value of all label sequences.
For example, the probability P(Ŷ|S) of the label sequence being selected may be determined by using the following equation (6):
where {tilde over (Y)} is a label sequence in the label sequence set Ys, and Z(S) is the expected value.
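A plausible form of equation (6), stated here as an assumption based on the standard conditional random field normalization and on the ratio between the evaluation value function and the expected value described above, is:

$$P(\hat{Y}\mid S)=\frac{\exp\big(\beta(\hat{Y})\big)}{Z(S)},\qquad Z(S)=\sum_{\tilde{Y}\in Y_s}\exp\big(\beta(\tilde{Y})\big)$$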
In an embodiment of the present disclosure, according to the probability of the label sequence being selected, the label sequence with the highest probability may be determined to be the most reliable and may be selected as the target label sequence.
In an embodiment of the present disclosure, in a case that the deep learning model is used to determine the target label sequence from the label sequence set Ys, a negative logarithmic likelihood loss function may be used to train the deep learning model. During the training process, the deep learning model is trained with the goal of minimizing the loss.
For example, the deep learning model is trained by using the following equation (7) as the loss function:
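A plausible form of the negative logarithmic likelihood loss, stated here as an assumption consistent with the description above (with Y denoting the gold label sequence of the sentence S), is:

$$\mathcal{L}_{e}=-\log P(Y\mid S)=-\beta(Y)+\log Z(S)$$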
In an embodiment of the present disclosure, in a case that the target label sequence is determined, instance identification may be performed on the sentence S according to the entity type included in the target label sequence.
For example, the word object w1 of the sentence S is annotated by using the first entity type included in the target label sequence. If it is determined that the entity type for the word object w1 is B, the word object w1 may be added to an entity et1. The entity et1 represents the first entity identified from the sentence S. The word object w2 of the sentence S is annotated by using the second entity type included in the target label sequence. If it is determined that the entity type for the word object w2 is O or B, the entity et1 is added to an entity set E. If it is determined that the entity type for the word object w2 is I, the word object w2 may be added to the entity et1.
For example, the entity scope may be determined when annotating the entity types for consecutive word objects.
For example, in a case where the word object "Mary", the word object "lives", and the word object "in" in the sentence S shown in FIG. 2 are annotated with B, O, and O respectively, such an entity scope may be determined that the first entity includes only the first word object "Mary".
For example, in a case where the word object “New”, the word object “York”, and the word object “,” are annotated with B, I, and O respectively, such an entity scope may be determined that the second entity includes the fourth word object “New” to the fifth word object “York”.
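The entity identification procedure described in the preceding examples (scanning the target label sequence and collecting B/I runs into entity scopes) may be sketched as follows; the function name, the 0-based positions, and the plain string tags are illustrative assumptions.

```python
def decode_entities(words, labels):
    """Collect entity scopes from a B/I/O label sequence.

    words:  list of word objects, e.g. ["Mary", "lives", "in", "New", "York", ",", "American", "."]
    labels: list of entity types from the target label sequence, e.g. ["B", "O", "O", "B", "I", "O", "B", "O"]
    Returns a list of (start, end, text) tuples (positions are 0-based, end inclusive).
    """
    entities, start = [], None
    for i, tag in enumerate(labels):
        if tag == "B":                      # a new entity begins; close any open one
            if start is not None:
                entities.append((start, i - 1, " ".join(words[start:i])))
            start = i
        elif tag == "O":                    # outside: close any open entity
            if start is not None:
                entities.append((start, i - 1, " ".join(words[start:i])))
                start = None
        # tag == "I": the word object is appended to the currently open entity
    if start is not None:
        entities.append((start, len(labels) - 1, " ".join(words[start:])))
    return entities

# For the sentence S above this yields spans for "Mary", "New York", and "American".
```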
In some embodiments, generating a word relation tensor based on the feature information includes: determining a word feature matrix according to the feature information; and generating the word relation tensor according to the word feature matrix.
In an embodiment of the present disclosure, the feature matrix H may be transformed into two word feature matrices Ms=MLPs(H) and Mo=MLPo(H). Ms=MLPs(H) represents a word feature matrix when all entities in the sentence S are head entities. Mo=MLPo(H) represents a word feature matrix when all entities in the sentence S are tail entities. The feature matrix H is transformed into the word feature matrix Ms for head entity and the word feature matrix Mo for tail entity through MLP, so that the word feature matrix Ms for head entity and the word feature matrix Mo for tail entity may be used to more accurately describe the features for the head and tail entities in the entity pair.
In an embodiment of the present disclosure, a three-dimensional word relation tensor may be generated by using the word feature matrix Ms for head entity and the word feature matrix Mo for tail entity. The relation between the head entity and the tail entity may be more accurately represented by using the word relation tensor.
For example, determining the word relation tensor according to the word feature matrices Ms and Mo may include: constructing a relation core tensor, wherein the relation core tensor includes a plurality of relational basis matrices; generating a relation feature matrix according to a modular product between a preset relational weight matrix and the relation core tensor; and generating the word relation tensor according to the relation feature matrix and the word feature matrix.
In an embodiment of the present disclosure, the relation core tensor G may include a plurality of predefined relations. The relation core tensor G is obtained by superimposing m learnable relational basis matrices, each of which may have a size of dw×dw. The relational weight matrix Mr represents the weights of the plurality of relations included in the relation core tensor G, the weights being related to the entity pairs, and Mr may have a size of nr×m, where dw, m and nr are positive integers, dw is a dimension of the word feature matrix, and nr is the number of predefined relations included in the relation core tensor G.
Both the parameters of the relational weight matrix Mr and the parameters of the relation core tensor G are learnable. The parameters of the relational weight matrix Mr and the relation core tensor G may be optimized by deep learning.
For example, the word relation tensor may be output from the deep learning model after inputting the word feature matrix into the deep learning model.
For example, an activation function is constructed by using the following equation (8) to determine the relation feature matrix and the word relation tensor {circumflex over (X)}:
In an embodiment of the present disclosure, the feature matrix for each relation included in the relation feature matrix is the weighted sum of the relational basis matrices in the relation core tensor G.
In an embodiment of the present disclosure, the word relation tensor {circumflex over (X)} is represented in the form of Tucker decomposition, i.e., the word relation tensor {circumflex over (X)} is represented as the modular product between the core tensor and a plurality of matrices. In this way, in the deep learning process of the tensor, connections between all relations may be established by sharing some model parameters (the core tensor) among all relation types, so as to learn the correlation between the relations in a tensor learning task.
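The construction of the word relation tensor described above may be sketched as follows, assuming a sigmoid activation for equation (8), an nw×nr×nw layout (matching the indexing used below), and illustrative PyTorch module and parameter names.

```python
import torch
import torch.nn as nn

class WordRelationTensor(nn.Module):
    """Illustrative Tucker-style word relation tensor built from M_s, M_o, M_r and a core tensor G."""

    def __init__(self, hidden_dim, d_w, n_r, m):
        super().__init__()
        self.mlp_s = nn.Linear(hidden_dim, d_w)        # M_s = MLP_s(H): head-entity word features
        self.mlp_o = nn.Linear(hidden_dim, d_w)        # M_o = MLP_o(H): tail-entity word features
        # Core tensor G stored as m learnable relational basis matrices of size d_w x d_w
        self.G = nn.Parameter(torch.randn(m, d_w, d_w) * 0.02)
        # Relational weight matrix M_r: weights mixing the m bases into n_r relation feature matrices
        self.M_r = nn.Parameter(torch.randn(n_r, m) * 0.02)

    def forward(self, H):
        # H: (n_w, hidden_dim) feature matrix of the sentence
        M_s = self.mlp_s(H)                                   # (n_w, d_w)
        M_o = self.mlp_o(H)                                   # (n_w, d_w)
        # Each relation feature matrix is a weighted sum of the basis matrices in G
        R = torch.einsum("rm,mab->rab", self.M_r, self.G)     # (n_r, d_w, d_w)
        # Word relation tensor X[i, k, j] = sigmoid(M_s[i] . R_k . M_o[j])
        X = torch.einsum("ia,rab,jb->irj", M_s, R, M_o)       # (n_w, n_r, n_w)
        return torch.sigmoid(X)
```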
In some embodiments, determining a relation between the plurality of entity information according to the word relation tensor, so as to obtain a plurality of relational triplets includes: generating a plurality of entity pairs according to the plurality of entity information, wherein the entity pair includes two entities indicated by any two entity information among the plurality of entity information; and generating the plurality of relational triplets according to the plurality of entity pairs and corresponding relations, in a case that a correlation between the plurality of entity pairs and a plurality of relations included in the word relation tensor meets a preset condition.
In an embodiment of the present disclosure, the entity pair is generated by any two entities among the plurality of entities without duplicate selection. For example, 3 entity pairs may be generated for 3 entities.
In an embodiment of the present disclosure, the word relation tensor is traversed according to a plurality of positions where the word objects included in the head and tail entities of the entity pair are located in the sentence S, so as to obtain a plurality of relations corresponding to the plurality of positions in the word relation tensor. The target relation that satisfies the correlation condition with the entity pair is determined from the plurality of relations.
For example, the word object included in the head entity is located at the first position in the sentence S, while the word objects included in the tail entity are located at the fourth and fifth positions in the sentence S. The three-dimensional word relation tensor {circumflex over (X)} may have a size of nw×nr×nw. The tensor elements located at (1, 1, 4), . . . , (1, k, 4), . . . , (1, nr, 4), as well as (1, 1, 5), . . . , (1, k, 5), . . . , (1, nr, 5) in the three-dimensional word relation tensor are acquired, where k=1, . . . , nr. The 2nr tensor elements represent the nr relations. For example, the tensor elements located at (1, k, 4) and (1, k, 5) are used to represent the k-th relation corresponding to the entity pair.
A relation having a correlation to the entity pair that satisfies the preset condition is determined from nr relations corresponding to the entity pair.
For example, generating the plurality of relational triplets according to the plurality of entity pairs and corresponding relations, in a case that a correlation between the plurality of entity pairs and a plurality of relations included in the word relation tensor meets a preset condition, includes: for each entity pair, acquiring two entity length information of two entities in the entity pair; determining a plurality of correlation values between the entity pair and the plurality of relations, according to the two entity length information and the word relation tensor; and generating at least one relational triplet according to at least one relation corresponding to the at least one correlation value and the entity pair, in a case that at least one correlation value among the plurality of correlation values is greater than or equal to a preset threshold.
For example, in a case that tensor elements for the plurality of relations corresponding to the entity pair are obtained, the correlation between the entity pair and the plurality of relations may be calculated according to the length of the entity pair and the sum of tensor elements of each relation.
For example, for an entity pair (eti, etj), the head entity eti may include the word objects from the i1-th position to the i2-th position in the sentence S, and the tail entity etj may include the word objects from the j1-th position to the j2-th position in the sentence S.
For example, the correlation between the entity pair and nr relations may be determined by using the following equation (9):
For example, the entity length information may be determined by using the following equation (10):
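A plausible form of equations (9) and (10), stated here as an assumption consistent with the worked example below, is:

$$\mathrm{corr}\big((et_i,et_j),\,r_k\big)=\frac{\sum_{u=i_1}^{i_2}\sum_{v=j_1}^{j_2}\hat{X}_{ukv}}{|et_i|\cdot|et_j|}\ \ge\ \delta,\qquad |et_i|=i_2-i_1+1,\quad |et_j|=j_2-j_1+1$$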
For example, as shown in FIG. 2, the entity pair may include the head entity "Mary" and the tail entity "New York".
For example, for the head entity "Mary" and the tail entity "New York", a sum A of the tensor element {circumflex over (X)}1k4 located at (1, k, 4) and the tensor element {circumflex over (X)}1k5 located at (1, k, 5), as well as a product B of the length of the head entity "Mary" and the length of the tail entity "New York", are determined. The correlation between the k-th relation rk and the entity pair "Mary-New York" is determined by comparing the ratio of A to B with a threshold δ. For example, when it is determined that the ratio of A to B is greater than or equal to the threshold δ, the k-th relation rk may be determined as the relation information of the entity pair "Mary-New York", and a relational triplet (Mary, rk, New York) is generated. For example, when it is determined that the fifth relation r5 also satisfies the above condition, a relational triplet (Mary, r5, New York) may be generated.
The relational triplet (Mary, rk, New York) and the relational triplet (Mary, r5, New York) may be recorded in the relational triplet set Ts, so that the relational triplets may be applied to tasks such as knowledge graphs, question answering systems, and dialogue systems.
For example, the relational triplets extracted from the sentence S are shown in FIG. 2.
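The scoring of candidate entity pairs against the word relation tensor, as described above, may be sketched as follows; the function name, the ordered iteration over (head, tail) combinations, and the default threshold value are illustrative assumptions.

```python
def extract_triplets(X, entities, relations, delta=0.5):
    """Score every candidate entity pair against every relation in the word relation tensor X.

    X:         word relation tensor of shape (n_w, n_r, n_w), e.g. a numpy array or torch tensor
    entities:  list of (start, end, text) spans, end inclusive, as produced by the decoding sketch above
    relations: list of n_r relation names
    delta:     correlation threshold (the value 0.5 is only an illustrative default)
    """
    triplets = []
    for hs, he, head in entities:
        for ts, te, tail in entities:
            if (hs, he) == (ts, te):
                continue                                    # skip identical spans
            head_len = he - hs + 1
            tail_len = te - ts + 1
            for k, rel in enumerate(relations):
                # sum of the tensor elements covering the two spans for relation k,
                # normalized by the product of the entity lengths
                score = X[hs:he + 1, k, ts:te + 1].sum() / (head_len * tail_len)
                if score >= delta:                          # correlation meets the preset condition
                    triplets.append((head, rel, tail))
    return triplets
```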
Through the embodiment of the present disclosure, the word relation tensor is constructed by using the plurality of predetermined relations and the word feature matrix corresponding to the text. The word relation tensor may represent the correlation between relations. The relation between the entity pair is determined by using the word relation tensor, which may fully utilize the correlation between relations, thereby improving the accuracy of relation extraction.
The present disclosure further provides a method of training a deep learning model, which will be described in detail below in conjunction with FIG. 3.
As shown in FIG. 3, the method of training a deep learning model includes operation S310 and operation S320.
In operation S310, a word feature matrix corresponding to a sample feature information is input into the deep learning model, so as to obtain a sample word relation tensor.
In operation S320, a parameter of the deep learning model is adjusted according to a difference between the sample word relation tensor and a real word relation tensor, to obtain a trained deep learning model.
According to an embodiment of the present disclosure, the sample text is encoded to obtain the sample feature information. The sample feature information is input into the deep learning model in the form of a matrix.
The deep learning model is trained, by using the sample word relation tensor as a predicted tensor and the real word relation tensor as a real tensor (also referred to as a gold tensor), to optimize the parameter of the deep learning model, thereby reducing the difference between the sample word relation tensor and the real word relation tensor.
In some embodiments, in operation S310, inputting a word feature matrix corresponding to a sample feature information into the deep learning model, so as to obtain a sample word relation tensor may include: constructing a sample relation core tensor according to a plurality of preset sample relations, wherein the sample relation core tensor includes a plurality of sample relational basis matrices corresponding to a plurality of sample relations; generating a sample relation feature matrix according to a modular product between a preset sample relational weight matrix and the sample relation core tensor; and determining the sample word relation tensor according to the sample relation feature matrix and a sample word feature matrix.
In an embodiment of the present disclosure, the preset sample relation may be a pre-defined sample relation. The relation core tensor is obtained by superimposing the plurality of learnable relational basis matrices. For example, the relation core tensor may include the plurality of relation feature matrices. The relational weight matrix represents the weights of the plurality of relations included in the relation core tensor, the weight being related to the entity pair. The parameters of the relational weight matrix and the parameters of the relation core tensor are both learnable. The parameters of the relational weight matrix and the relation core tensor may be continuously adjusted by training the deep learning model.
For example, the sample word relation tensor may be determined by the equation (8) above.
In some embodiments, in operation S320, adjusting a parameter of the deep learning model according to a difference between the sample word relation tensor and a real word relation tensor, to obtain a trained deep learning model may include: determining a focus loss function according to the sample word relation tensor; determining an exponential function according to the real word relation tensor; determining a model loss function according to the focus loss function and the exponential function; and adjusting the parameter of the deep learning model according to the model loss function, to obtain the trained deep learning model.
In an embodiment of the present disclosure, in a case of an imbalanced distribution of positive and negative samples included in the sample word relation tensor, it is difficult for the deep learning model to fit the positive samples, which reduces the accuracy of predicting the positive samples. The model training method provided in the present disclosure improves the learning ability of the deep learning model on positive samples by introducing an exponential function and a focus loss function, so as to solve the above problem.
For example, the exponential function is used to assign different weights to positive and negative samples in the real word relation tensor, so as to achieve a balanced distribution of positive and negative samples.
For example, the focus loss function is used to focus on positive samples in the sample word relation tensor, so that the training direction for the deep learning model is more focused on improving the predictive ability on positive samples.
In some embodiments, determining an exponential function according to the real word relation tensor includes: acquiring a real positive sample and a real negative sample from the real word relation tensor, wherein each of the real positive sample and the real negative sample indicates a real relation; assigning weights to the real positive sample and the real negative sample respectively, according to a number of real positive samples and a number of real negative samples; and determining the exponential function according to the weights.
In an embodiment of the present disclosure, due to the uneven distribution between real positive samples and real negative samples, different weights may be assigned to real positive samples and real negative samples respectively according to the number of real positive samples and real negative samples.
During the training process, due to the small number of real positive samples, a larger weight may be assigned to the real positive samples and a smaller weight may be assigned to the real negative samples, so as to balance the number of real positive samples and real negative samples.
For example, the exponential function ϕ(x) is determined by using the following equation (11):
In an embodiment of the present disclosure, positive samples may be considered as samples with rare labels, while negative samples may be considered as samples with common labels. The difficulty of predicting positive samples is usually greater than that of predicting negative samples. By introducing the focus loss function, the training of the model is focused on optimizing the training of positive samples.
For example, the focus loss function lf(x) is determined by using the following equation (12):
In an embodiment of the present disclosure, the parameter of the deep learning model may be optimized by using the focus loss function lf(x) and the exponential function ϕ(x) to minimize the loss function of the model.
For example, the model loss function is determined by using the following equation (13):
wherein {circumflex over (X)}ukv is the tensor element of the sample word relation tensor {circumflex over (X)}.
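A plausible implementation of the weighted focal loss over the word relation tensor is sketched below; the specific weighting scheme, the focusing parameter gamma, and the reduction by summation are assumptions rather than the exact forms of equations (11) to (13).

```python
import torch

def tensor_loss(pred, gold, gamma=2.0):
    """Illustrative weighted focal loss over the word relation tensor.

    pred: sample word relation tensor, elements in (0, 1), shape (n_w, n_r, n_w)
    gold: real (gold) word relation tensor, float elements in {0.0, 1.0}, same shape
    """
    eps = 1e-8
    n_pos = gold.sum().clamp(min=1.0)
    n_neg = (1.0 - gold).sum().clamp(min=1.0)
    # Exponential-style weighting phi(x): on binary labels this selects the larger
    # weight for the rare positive samples and the smaller weight for negative samples.
    w_pos = n_neg / (n_pos + n_neg)
    w_neg = n_pos / (n_pos + n_neg)
    phi = (w_pos ** gold) * (w_neg ** (1.0 - gold))
    # Focal term: down-weight well-classified elements so that training focuses on
    # the hard (typically positive) tensor elements.
    p_t = gold * pred + (1.0 - gold) * (1.0 - pred)
    focal = (1.0 - p_t).pow(gamma) * -(p_t + eps).log()
    return (phi * focal).sum()
```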
In some embodiments, the total loss function is determined by using the following equation (14):
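A plausible form of equation (14), stated here as an assumption that simply combines the two losses described in this disclosure, is:

$$\mathcal{L}=\mathcal{L}_{e}+\mathcal{L}_{t}$$

where Le is the entity identification loss and Lt is the model loss function of equation (13).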
In an embodiment of the present disclosure, the losses of the deep learning models for entity identification and tensor learning are minimized by training the two deep learning models, so as to improve the ability of entity identification and tensor learning, thereby improving the efficiency and accuracy of relation extraction.
Through the method of training the deep learning model provided in the present disclosure, the process of extracting relations of the entity pair is transformed into a tensor learning task. By training the deep learning model, the deep learning model may output a word relation tensor that is close to the real word relation tensor, thereby improving the accuracy of relation extraction.
Based on the method of processing a text provided in the present disclosure, the present disclosure further provides a method of generating a knowledge graph, which will be described in detail in conjunction with FIG. 4.
As shown in FIG. 4, the method of generating a knowledge graph includes operation S410.
In operation S410, the knowledge graph is generated by using a plurality of relational triplets.
According to an embodiment of the present disclosure, the plurality of relational triplets are generated by using the method of processing the text described above.
In an embodiment of the present disclosure, since the plurality of relational triplets are obtained by the method of processing the text provided in the present disclosure, more accurate relational triplets may be acquired by fully utilizing the correlation between relations of the entity pair, so as to generate a knowledge graph that may include more complete information.
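A minimal sketch of assembling a knowledge graph from the extracted relational triplets is shown below; the use of the third-party networkx library and the MultiDiGraph structure is an illustrative assumption, not part of the disclosure.

```python
import networkx as nx

def build_knowledge_graph(triplets):
    """Build a directed multigraph in which each relational triplet becomes one labeled edge."""
    graph = nx.MultiDiGraph()
    for head, relation, tail in triplets:
        graph.add_edge(head, tail, relation=relation)   # head entity -> tail entity, labeled with the relation
    return graph

# e.g. build_knowledge_graph([("Mary", "rk", "New York"), ("Mary", "r5", "New York")])
```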
Based on the method of training a deep learning model provided in the present disclosure, the present disclosure further provides a method of generating a knowledge graph, which will be described in detail in conjunction with FIG. 5.
As shown in FIG. 5, the method of generating a knowledge graph includes operation S510 to operation S550.
In operation S510, a text to be processed is encoded to obtain a feature information.
Then, in operation S520, a plurality of entity information are identified from the text, based on the feature information.
Then, in operation S530, a word feature matrix corresponding to the feature information is input into a deep learning model, so as to obtain a word relation tensor.
Then, in operation S540, a plurality of relational triplets are generated according to the word relation tensor and the plurality of entity information.
Then, in operation S550, the knowledge graph is generated by using the plurality of relational triplets.
According to an embodiment of the present disclosure, the deep learning model is trained through the method of training the deep learning model described above.
In an embodiment of the present disclosure, the method of training the deep learning model is similar to the method shown in FIG. 3, which will not be repeated here.
As shown in FIG. 6, the apparatus 600 of processing a text includes an encoding module 610, an identifying module 620, a first generation module 630, and a determination module 640.
The encoding module 610 is used to encode a text to be processed to obtain a feature information. In one embodiment, the encoding module 610 may be used to implement the operation S110 described above, which will not be repeated here.
The identifying module 620 is used to identify a plurality of entity information from the text, based on the feature information. In one embodiment, the identifying module 620 may be used to implement the operation S120 described above, which will not be repeated here.
The first generation module 630 is used to generate a word relation tensor based on the feature information. In one embodiment, the first generation module 630 may be used to implement the operation S130 described above, which will not be repeated here.
The determination module 640 is used to determine a relation between the plurality of entity information by using the word relation tensor, so as to generate a plurality of relational triplets. In one embodiment, the determination module 640 may be used to implement the operation S140 described above, which will not be repeated here.
According to an embodiment of the present disclosure, the encoding module 610 is further used to extract a plurality of word objects from the text; determine a plurality of word features for the plurality of word objects respectively; and encode the plurality of word features to obtain the feature information.
According to an embodiment of the present disclosure, the encoding module 610 is further used to encode the plurality of word features to obtain a plurality of context information of the plurality of word objects; determine a plurality of hidden features for the plurality of word objects respectively, according to the plurality of context information; and generate the feature information according to the plurality of hidden features.
According to an embodiment of the present disclosure, the identifying module 620 is further used to determine, from a plurality of preset label sequences, a target label sequence corresponding to the feature information; annotate an entity type for each of a plurality of word objects in the text, according to the target label sequence; determine an entity scope according to the entity types of the plurality of word objects, wherein the entity scope indicates a number of word objects included in an entity and position information of the word objects included in the entity in the text; and determine the plurality of entity information of the text according to the entity scope.
According to an embodiment of the present disclosure, the identifying module 620 is further used to determine an evaluation value matrix related to labels according to the feature information; determine a plurality of evaluation value functions for the plurality of preset label sequences respectively, according to evaluation value matrices of a plurality of preset labels included in the plurality of preset label sequences; determine a probability of each preset label sequence among the plurality of preset label sequences being selected as the target label sequence, according to the plurality of evaluation value functions; and determine the target label sequence from the plurality of preset label sequences according to the probability.
According to an embodiment of the present disclosure, the identifying module 620 is further used to determine an expected value of the plurality of preset label sequences related to the text, according to the plurality of evaluation value functions; and determine the probability according to a ratio between each of the plurality of evaluation value functions and the expected value.
According to an embodiment of the present disclosure, the first generation module 630 is further used to determine a word feature matrix according to the feature information; and generate the word relation tensor according to the word feature matrix.
According to an embodiment of the present disclosure, the first generation module 630 is further used to construct a relation core tensor, wherein the relation core tensor includes a plurality of relational basis matrices; generate a relation feature matrix according to a modular product between a preset relational weight matrix and the relation core tensor; and generate the word relation tensor according to the relation feature matrix and the word feature matrix.
According to an embodiment of the present disclosure, the determination module 640 is further used to generate a plurality of entity pairs according to the plurality of entity information, wherein the entity pair includes two entities indicated by any two entity information among the plurality of entity information; generate, in a case that a correlation between the plurality of entity pairs and a plurality of relations included in the word relation tensor meets a preset condition, the plurality of relational triplets according to the plurality of entity pairs and corresponding relations.
According to an embodiment of the present disclosure, the determination module 640 is further used to: for each entity pair, acquire two entity length information of two entities in the entity pair; determine a plurality of correlation values between the entity pair and the plurality of relations, according to the two entity length information and the word relation tensor; and generate, in a case that at least one correlation value among the plurality of correlation values is greater than or equal to a preset threshold, at least one relational triplet according to at least one relation corresponding to the at least one correlation value and the entity pair.
Based on the method of training a deep learning model provided in the present disclosure, the present disclosure further provides an apparatus of training a deep learning model, which will be described in detail in conjunction with FIG. 7.
As shown in FIG. 7, the apparatus 700 of training a deep learning model includes a first inputting module 710 and an adjusting module 720.
The first inputting module 710 is used to input a word feature matrix corresponding to a sample feature information into the deep learning model, so as to obtain a sample word relation tensor. In one embodiment, the first inputting module 710 may be used to implement the operation S310 described above, which will not be repeated here.
The adjusting module 720 is used to adjust a parameter of the deep learning model according to a difference between the sample word relation tensor and a real word relation tensor, to obtain a trained deep learning model. In one embodiment, the adjusting module 720 may be used to implement the operation S320 described above, which will not be repeated here.
According to an embodiment of the present disclosure, the first inputting module 710 is further used to construct a sample relation core tensor according to a plurality of sample relations, wherein the sample relation core tensor includes a plurality of sample relational basis matrices corresponding to a plurality of sample relations; generate a sample relation feature matrix according to a modular product between a preset sample relational weight matrix and the sample relation core tensor; and determine the sample word relation tensor according to the sample relation feature matrix and a sample word feature matrix.
According to an embodiment of the present disclosure, the adjusting module 720 is further used to determine a focus loss function according to the sample word relation tensor; determine an exponential function according to the real word relation tensor; determine a model loss function according to the focus loss function and the exponential function; and adjust the parameter of the deep learning model according to the model loss function, to obtain the trained deep learning model.
According to an embodiment of the present disclosure, the adjusting module 720 is further used to acquire a real positive sample and a real negative sample from the real word relation tensor, wherein each of the real positive sample and the real negative sample indicates a real relation; assign weights to the real positive sample and the real negative sample respectively, according to a number of real positive samples and a number of real negative samples; and determine the exponential function according to the weights.
Based on the method of generating a knowledge graph provided in the present disclosure, the present disclosure further provides an apparatus of generating a knowledge graph, which will be described in detail in conjunction with FIG. 8.
As shown in FIG. 8, the apparatus 800 of generating a knowledge graph includes a second generation module 810.
The second generation module 810 is used to generate the knowledge graph by using a plurality of relational triplets; wherein the plurality of relational triplets are generated by the apparatus of processing a text provided by the embodiments of the present disclosure. In one embodiment, the second generation module 810 may be used to implement the operation S410 described above, which will not be repeated here.
Based on the method of generating a knowledge graph provided in the present disclosure, the present disclosure further provides an apparatus of generating a knowledge graph, which will be described in detail in conjunction with FIG. 9.
As shown in FIG. 9, the apparatus 900 of generating a knowledge graph includes an encoding module 910, an identifying module 920, a second inputting module 930, a first generation module 940, and a second generation module 950.
The encoding module 910 is used to encode a text to be processed to obtain a feature information. In one embodiment, the encoding module 910 may be used to implement the operation S510 described above, which will not be repeated here.
The identifying module 920 is used to identify a plurality of entity information from the text, based on the feature information. In one embodiment, the identifying module 920 may be used to implement the operation S520 described above, which will not be repeated here.
The second inputting module 930 is used to input a word feature matrix corresponding to the feature information into a deep learning model, so as to obtain a word relation tensor. In one embodiment, the second inputting module 930 may be used to implement the operation S530 described above, which will not be repeated here. The deep learning model is trained through the apparatus of training the deep learning model provided in the embodiment of the present disclosure.
The first generation module 940 is used to generate a plurality of relational triplets according to the word relation tensor and the plurality of entity information. In one embodiment, the first generation module 940 may be used to implement the operation S540 described above, which will not be repeated here.
The second generation module 950 is used to generate the knowledge graph by using the plurality of relational triplets. In one embodiment, the second generation module 950 may be used to implement the operation S550 described above, which will not be repeated here.
It should be noted that, in technical solutions of the present disclosure, a collection, a storage, a use, a processing, a transmission, a provision, a disclosure, an application and other processing of user personal information involved comply with provisions of relevant laws and regulations, and necessary confidentiality measures have been taken, and do not violate public order and good custom. In technical solutions of the present disclosure, the authorization or consent of the user is acquired before the user personal information is acquired or collected.
According to an embodiment of the present disclosure, the present disclosure further provides an electronic device, a readable storage medium, and a computer program product.
As shown in FIG. 10, the electronic device 1000 includes a computing unit 1001, a read-only memory (ROM) 1002, a random access memory (RAM) 1003, and an input/output (I/O) interface 1005.
Various components in the device 1000, including an input unit 1006 such as a keyboard, a mouse, etc., an output unit 1007 such as various types of displays, speakers, etc., a storage unit 1008 such as a magnetic disk, an optical disk, etc., and a communication unit 1009 such as a network card, a modem, a wireless communication transceiver, etc., are connected to the I/O interface 1005. The communication unit 1009 allows the device 1000 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunication networks.
The computing unit 1001 may be various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 1001 include but are not limited to a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, and so on. The computing unit 1001 may perform the various methods and processes described above, such as at least one of the method of processing a text, the method of training a deep learning model, and the method of generating a knowledge graph. For example, in some embodiments, at least one of the method of processing a text, the method of training a deep learning model, and the method of generating a knowledge graph may be implemented as a computer software program that is tangibly contained on a machine-readable medium, such as a storage unit 1008. In some embodiments, part or all of a computer program may be loaded and/or installed on the device 1000 via the ROM 1002 and/or the communication unit 1009. When the computer program is loaded into the RAM 1003 and executed by the computing unit 1001, one or more steps of at least one of the method of processing a text, the method of training a deep learning model, and the method of generating a knowledge graph described above may be performed. Alternatively, in other embodiments, the computing unit 1001 may be used to perform at least one of the method of processing a text, the method of training a deep learning model, and the method of generating a knowledge graph in any other appropriate way (for example, by means of firmware).
Various embodiments of the systems and technologies described herein may be implemented in a digital electronic circuit system, an integrated circuit system, a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), an application specific standard product (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD), a computer hardware, firmware, software, and/or combinations thereof. These various embodiments may be implemented by one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor. The programmable processor may be a dedicated or general-purpose programmable processor, which may receive data and instructions from the storage system, the at least one input device and the at least one output device, and may transmit the data and instructions to the storage system, the at least one input device, and the at least one output device.
Program codes for implementing the method of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or a controller of a general-purpose computer, a special-purpose computer, or other programmable data processing devices such as at least one of the apparatus of training a deep learning model, the apparatus of determining heat exchange characteristic data, the control apparatus based on heat exchange characteristic data, and the annealing apparatus, so that when the program codes are executed by the processor or the controller, the functions/operations specified in the flowchart and/or block diagram may be implemented. The program codes may be executed completely on the machine, partly on the machine, partly on the machine and partly on the remote machine as an independent software package, or completely on the remote machine or the server.
In the context of the present disclosure, the machine readable medium may be a tangible medium that may contain or store programs for use by or in combination with an instruction execution system, device or apparatus. The machine readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine readable medium may include, but not be limited to, electronic, magnetic, optical, electromagnetic, infrared or semiconductor systems, devices or apparatuses, or any suitable combination of the above. More specific examples of the machine readable storage medium may include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, convenient compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
In order to provide interaction with users, the systems and techniques described here may be implemented on a computer including a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user may provide the input to the computer. Other types of devices may also be used to provide interaction with users. For example, a feedback provided to the user may be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and the input from the user may be received in any form (including acoustic input, voice input or tactile input).
The systems and technologies described herein may be implemented in a computing system including back-end components (for example, a data server), or a computing system including middleware components (for example, an application server), or a computing system including front-end components (for example, a user computer having a graphical user interface or web browser through which the user may interact with the implementation of the system and technology described herein), or a computing system including any combination of such back-end components, middleware components or front-end components. The components of the system may be connected to each other by digital data communication (for example, a communication network) in any form or through any medium. Examples of the communication network include a local area network (LAN), a wide area network (WAN), and Internet.
The computer system may include a client and a server. The client and the server are generally far away from each other and usually interact through a communication network. The relation between the client and the server is generated through computer programs running on the corresponding computers and having a client-server relation with each other. The server may be the cloud server, also referred to as the cloud computing server or the cloud host, which is the host product in the cloud computing service system to solve shortcomings of difficult management and weak business scalability in a conventional physical host and a VPS (Virtual Private Server) service. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that steps of the processes illustrated above may be reordered, added or deleted in various manners. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as a desired result of the technical solution of the present disclosure may be achieved. This is not limited in the present disclosure.
The above-mentioned specific embodiments do not constitute a limitation on the scope of protection of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may be made according to design requirements and other factors. Any modifications, equivalent replacements and improvements made within the spirit and principles of the present disclosure shall be contained in the scope of protection of the present disclosure.
Number | Date | Country | Kind
---|---|---|---
202310802724.7 | Jun. 30, 2023 | CN | national