This application claims priority from Chinese Patent Application No. 202110747494.X, filed on Jul. 1, 2021, the contents of which are incorporated herein by reference in their entirety.
The present disclosure generally relates to entity disambiguation technology, and more particularly, to a method and device for presenting prompt information regarding an entity in text to a user by utilizing the entity disambiguation technology, and a storage medium.
In a practical language environment, an entity name often corresponds to multiple concepts. For example, the entity name “apple” in text may refer to a kind of fruit or to Apple Inc. Entity disambiguation technology has been proposed and developed to resolve this kind of ambiguity caused by a shared entity name.
The entity disambiguation technology can link an entity mention (entity) in text to an appropriate concept in a knowledge graph, which plays a fundamental role in many domains such as question answering, semantic search, and information extraction. A concept denotes a thing that is distinguishable and exists independently. The knowledge graph contains a large number of concepts, and different concepts may be related to one another. For example, there are two real people with similar names: Michael Jeffrey Jordan, a basketball star, and Michael Owen Jordan, an expert in the field of machine learning. Accordingly, the knowledge graph may contain two concepts: “a basketball star Michael Jeffrey Jordan” and “a machine-learning expert Michael Owen Jordan”. In addition, the knowledge graph may contain a number of sports concepts associated with “a basketball star Michael Jeffrey Jordan” and a number of computer science concepts associated with “a machine-learning expert Michael Owen Jordan”. When the entity “Michael Jordan” appears in text, it must be determined from the context whether the entity refers to “a basketball star Michael Jeffrey Jordan” or “a machine-learning expert Michael Owen Jordan” in the knowledge graph.
When the entity disambiguation technology determines that an entity appearing in the text corresponds to a specific concept, prompt information based on the determined concept may be presented to a user who is reading the text, so that the user can immediately understand the entity correctly. For example, for the entity “Apple” in the text “Apple released a new mobile phone today . . . ”, the prompt “Apple Inc.: an American high-tech company whose products include iPhones . . . ” may be provided to the user.
However, due to high complexity of natural language, the entity disambiguation technology faces a challenge of how to identify the correct meaning of an entity in a context and associate the entity with the correct concept in the knowledge graph.
At present, there are mainly two entity disambiguation methods. One is to model an entity disambiguation problem using a ranking model, and the other is to model an entity disambiguation problem using a classification model.
The method using the ranking model includes a step of generating candidate concepts and a step of sorting the candidate concepts. Simple rules are generally used in the step of generating candidate concepts. However, this often leads to failure in determining proper candidate concepts, and in turn generates cascading errors in the step of sorting.
The method using the classification model treats the entity disambiguation problem as a single-label text classification task. For example, Elena Tutubalina et al., “Medical concept normalization in social media posts with recurrent neural networks”, Journal of Biomedical Informatics, June 2018, describes a model consisting of a neural network part and an auxiliary feature part. In the neural network part, a Gated Recurrent Unit (GRU) network and an attention mechanism network are used to encode entities. In the auxiliary feature part, the model is enhanced with TF-IDF similarity and Word2Vec similarity features.
However, the conventional entity disambiguation methods have the following problems:
the Word2Vec similarity features do not have sufficient semantic information, and therefore it is difficult to correctly determine a semantic-level similarity between an entity and a concept;
information of context of the entity is not used effectively;
an entity usually belongs to a category (such as disease, crop, etc.), but the conventional methods do not utilize category information of the entity.
To solve one or more of the problems of the conventional methods, the present disclosure provides a new entity disambiguation method using a classification model, which adopts an advanced BERT model, utilizes context information of an entity, and additionally utilizes information about the category to which the entity belongs through multi-task learning. Therefore, the meaning of the entity can be identified more accurately.
According to an aspect of the present disclosure, a computer-implemented method of presenting prompt information to a user who is viewing an electronic text by utilizing a neural network is provided. The electronic text includes an entity and context of the entity, and the entity and the context are each composed of one or more characters. The method includes: generating a mask vector for the entity, wherein the mask vector is used to identify a position of the entity in a statement composed of the entity and the context; generating a first vector and a second vector based on the entity and the context by a BERT layer in the neural network; generating a third vector based on the mask vector and the second vector by an entity average layer in the neural network; concatenating the first vector and the third vector by a concatenation layer in the neural network to generate a fourth vector; predicting which concept of a plurality of predefined concepts the entity corresponds to, based on the fourth vector by a first classifier in the neural network; predicting which type of a plurality of predefined types the entity corresponds to, based on the fourth vector by a second classifier in the neural network; jointly training the first classifier and the second classifier; and determining a concept to which the entity corresponds based on a prediction result of the trained first classifier, and generating the prompt information for presentation to the user based on the determined concept.
According to another aspect of the present disclosure, a device for presenting prompt information to a user who is viewing an electronic text by utilizing a neural network is provided. The electronic text includes an entity and context of the entity, and the entity and the context are each composed of one or more characters. The device includes a memory and one or more processors. The memory stores a computer program. The processors are configured to execute the computer program to perform operations of: generating a mask vector for the entity, wherein the mask vector is used to identify a position of the entity in a statement composed of the entity and the context; generating a first vector and a second vector based on the entity and the context by a BERT layer in the neural network; generating a third vector based on the mask vector and the second vector by an entity average layer in the neural network; concatenating the first vector and the third vector by a concatenation layer in the neural network to generate a fourth vector; predicting which concept of a plurality of predefined concepts the entity corresponds to, based on the fourth vector by a first classifier in the neural network; predicting which type of a plurality of predefined types the entity corresponds to, based on the fourth vector by a second classifier in the neural network; jointly training the first classifier and the second classifier; and determining a concept to which the entity corresponds based on a prediction result of the trained first classifier, and generating the prompt information for presentation to the user based on the determined concept.
According to yet another aspect of the present disclosure, a device for presenting prompt information to a user who is viewing an electronic text by utilizing a neural network is provided. The electronic text includes an entity and context of the entity, and the entity and the context are each composed of one or more characters. The device includes a mask vector generation module, a BERT module, an entity average module, a concatenation module, a concept classification module, a type classification module, and a presentation module. The mask vector generation module is configured to generate a mask vector for the entity, wherein the mask vector is used to identify a position of the entity in a statement composed of the entity and the context. The BERT module is configured to generate a first vector and a second vector based on the entity and the context. The entity average module is configured to generate a third vector based on the mask vector and the second vector. The concatenation module is configured to concatenate the first vector and the third vector to generate a fourth vector. The concept classification module is configured to predict which concept of a plurality of predefined concepts the entity corresponds to, based on the fourth vector. The type classification module is configured to predict which type of a plurality of predefined types the entity corresponds to, based on the fourth vector. The concept classification module and the type classification module are jointly trained. The presentation module is configured to determine a concept to which the entity corresponds based on a prediction result of the trained concept classification module, and generate the prompt information for presentation to the user based on the determined concept.
According to yet another aspect of the present disclosure, a storage medium storing a computer program is provided. The computer program, when executed by a computer, causes the computer to perform the method as described above.
The entity and the context of the entity are inputted to the embedding layer 110.
The One-Hot encoding of the entity and the context shown in the first row in
BERT is an acronym for “Bidirectional Encoder Representations from Transformers”. The BERT layer can generate a semantic representation of the text that contains rich semantic information. In particular, the BERT layer can generate, for each word in the text, a vector representation that corresponds to the word and incorporates semantic information of the full text. For a specific natural language processing task, a pre-trained BERT layer may be fine-tuned so that the semantic representation it produces adapts to that task.
In the present disclosure, the BERT layer 120 generates a first vector V1 and a second vector V2 based on the input vector V0. The first vector V1 represents overall semantic information of the statement composed of the entity and the context, with a dimension of [batch_size, bert_dim]. The second vector V2 includes hidden vectors for respective words (the entity is one of the words) in the statement, with a dimension of [batch_size, sequence_length, bert_dim], where bert_dim represents a hidden layer dimension (i.e., output dimension) of the BERT layer 120.
The second vector V2 generated by the BERT layer 120 is inputted to the entity average layer 130. In addition, a mask vector Vm for the entity is inputted to the entity average layer 130. Generation of the mask vector Vm will be described below with reference to
As shown in the first row in
In particular, during the mapping process, when the predetermined length of the mask vector Vm is greater than the total length of the characters of the entity, the marker characters and the context, that is, when a position in the mask vector Vm does not correspond to any character of the entity, the marker characters or the context, the element at that position is set to 0. On the other hand, when the predetermined length of the mask vector Vm is less than the total length of the characters of the entity, the marker characters and the context, the context may be truncated so that the total length equals the length of the mask vector Vm, which enables the mapping process.
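As an illustration only, the marker insertion and the padding and truncation rules described above can be sketched in Python at the character level. The marker character "#", the function name build_entity_mask, and the policy of truncating the following context before the preceding context are assumptions made for this sketch; the disclosure only requires that the context be truncated so that the total length equals the mask length. The sketch also assumes one mask element per character and ignores any special tokens that a BERT tokenizer may add.

```python
from typing import List, Tuple


def build_entity_mask(left: str, entity: str, right: str,
                      max_len: int, marker: str = "#") -> Tuple[str, List[int]]:
    """Insert markers around the entity and build a fixed-length 0/1 mask.

    Hypothetical helper: '#' as the marker and right-first truncation are
    assumptions of this sketch, not details taken from the disclosure.
    """
    core_len = len(marker) + len(entity) + len(marker)
    if core_len > max_len:
        raise ValueError("entity plus marker characters exceed max_len")

    # Characters still available for context once the marked entity is placed.
    budget = max_len - core_len
    # Truncate the following context first, then the preceding context,
    # keeping the characters closest to the entity.
    right = right[:max(0, budget - len(left))]
    left = left[len(left) - min(len(left), budget):]

    text = left + marker + entity + marker + right
    # 1 for the entity and marker characters, 0 for the context.
    mask = [0] * len(left) + [1] * core_len + [0] * len(right)
    # Positions that correspond to no character at all are padded with 0.
    mask += [0] * (max_len - len(mask))
    return text, mask
```

For example, build_entity_mask("ab", "cd", "ef", 10) yields the text "ab#cd#ef" and the mask [0, 0, 1, 1, 1, 1, 0, 0, 0, 0], where the last two zeros are padding.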
Referring to
The hidden vector for the entity includes hidden vectors for respective characters in the entity. The entity average layer 130 further averages the hidden vectors for the respective characters along the character dimension to obtain a third vector V3, which is inputted to the concatenation layer 140. The vector inputted to the entity average layer 130 has a dimension of [batch_size, sequence_length, bert_dim], and the output vector V3 has a dimension of [batch_size, bert_dim].
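The masking and averaging performed by the entity average layer 130 can be illustrated with a minimal pure-Python sketch, assuming nested lists of shape [batch_size, sequence_length, bert_dim] for V2 and [batch_size, sequence_length] for Vm; a practical implementation would use tensor operations instead. The function name entity_average is hypothetical. The sketch averages over every position whose mask element is 1, so with a mask in which the marker characters are also set to 1, the marker positions are included in the average.

```python
from typing import List

Matrix = List[List[float]]


def entity_average(hidden: List[Matrix], mask: List[List[int]]) -> Matrix:
    """Masked mean: [batch, seq_len, dim] and [batch, seq_len] -> [batch, dim]."""
    out: Matrix = []
    for h_seq, m_seq in zip(hidden, mask):
        total = [0.0] * len(h_seq[0])
        count = 0
        for h_vec, m in zip(h_seq, m_seq):
            # Multiplying by the 0/1 mask zeroes out context and padding positions.
            total = [t + m * h for t, h in zip(total, h_vec)]
            count += m
        # Divide by the number of masked positions; guard against an all-zero mask.
        out.append([t / max(count, 1) for t in total])
    return out
```

For instance, with V2 = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]] and Vm = [[0, 1, 1]], the result is [[4.0, 5.0]], the mean of the last two hidden vectors.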
The concatenation layer 140 receives the first vector V1 from the BERT layer 120 and the third vector V3 from the entity average layer 130. Since the first vector V1 and the third vector V3 each have the dimension of [batch_size, bert_dim], the concatenation layer 140 may concatenate the first vector V1 and the third vector V3 to generate a fourth vector V4. The fourth vector V4 has a dimension of [batch_size, bert_dim*2], and is inputted to the main classifier 150 and the auxiliary classifier 160.
The main classifier 150 predicts which concept in the knowledge graph the entity corresponds to, based on the fourth vector V4, and outputs an identifier (ID) of the predicted concept. The output of the main classifier 150 has a dimension of [batch_size, class_dim], where class_dim represents the number of concept IDs, that is, one score per concept.
The auxiliary classifier 160 predicts which type of a plurality of predetermined types the entity belongs to, based on the fourth vector V4, and outputs an ID of the predicted type. The output of the auxiliary classifier 160 has a dimension of [batch_size, type_dim], where type_dim represents the number of type IDs, that is, one score per type.
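As an illustration, the concatenation layer 140 and the two classifiers can be sketched for a single sample in pure Python. The disclosure does not specify the internal structure of the classifiers; this sketch assumes that each is one linear layer followed by a softmax, and the names linear, softmax and two_heads, as well as any weights passed in, are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # [out_dim][in_dim]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Dense layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(logits: Vector) -> Vector:
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def two_heads(v1: Vector, v3: Vector, concept_w: Matrix, concept_b: Vector,
              type_w: Matrix, type_b: Vector) -> Tuple[Vector, Vector, Vector]:
    """Concatenate V1 and V3 into V4, then score concepts and types from V4."""
    v4 = v1 + v3  # list concatenation: length bert_dim * 2
    concept_probs = softmax(linear(v4, concept_w, concept_b))  # length class_dim
    type_probs = softmax(linear(v4, type_w, type_b))           # length type_dim
    return v4, concept_probs, type_probs
```

Both heads read the same V4, which is what allows the type task to shape the shared representation during joint training. With all-zero concept weights, every concept receives probability 1/class_dim; in practice the weights are learned.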
Similar to pre-establishing a knowledge graph containing a large number of concepts, a number of types may be defined in advance, such as “drug side effects”, “chemical raw materials”, and the like. A type may cover a wider range than a concept. For example, an entity “Yao Ming” may correspond to a concept of “a basketball player Yao Ming” and a type of “athlete”.
In the present disclosure, the main classifier 150 and the auxiliary classifier 160 may be jointly trained. A training process will be described in detail below.
In the training process, a training dataset is inputted to the neural network as shown in
In addition, a first loss function F1 is established for the main classifier 150, and a second loss function F2 is established for the auxiliary classifier 160. A weighted sum of the first loss function and the second loss function is calculated to obtain a final loss function F. The main classifier 150 and the auxiliary classifier 160 are jointly trained based on the loss function F. In a preferred embodiment, both the first loss function F1 and the second loss function F2 are cross-entropy loss functions. Furthermore, since predicting concepts is the main classification task in the present disclosure, it is preferable to apply a greater weight to the first loss function F1 than to the second loss function F2 when calculating the weighted sum.
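The weighted loss F = w1 * F1 + w2 * F2 can be sketched in pure Python with a cross-entropy term for each classifier. The weights w1 = 0.7 and w2 = 0.3 are hypothetical values chosen only to satisfy the preferred condition that F1 is weighted more heavily than F2; the sketch takes raw classifier scores (logits) and averages each loss over the batch.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """-log softmax(logits)[label], computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_loss(concept_logits: Sequence[Sequence[float]], concept_labels: Sequence[int],
               type_logits: Sequence[Sequence[float]], type_labels: Sequence[int],
               w1: float = 0.7, w2: float = 0.3) -> float:
    """F = w1 * F1 + w2 * F2, with each term averaged over the batch."""
    n = len(concept_labels)
    f1 = sum(cross_entropy(z, y) for z, y in zip(concept_logits, concept_labels)) / n
    f2 = sum(cross_entropy(z, y) for z, y in zip(type_logits, type_labels)) / n
    return w1 * f1 + w2 * f2
```

With uniform logits over two concepts and three types, F1 = log 2 and F2 = log 3, so F = 0.7 log 2 + 0.3 log 3. Because a single scalar F is minimized, its gradient updates both classifiers and the shared layers together.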
During the training process, the task of the auxiliary classifier 160 is associated with the task of the main classifier 150. The prediction result of the auxiliary classifier 160 is a type, which has a coarser granularity, while the prediction result of the main classifier 150 is a concept, which has a finer granularity. There is a hierarchical relationship between the prediction results of the two classifiers. For example, for the entity “Yao Ming”, the auxiliary classifier 160 may output the type “athlete”, and the main classifier 150 may output the concept “a basketball player Yao Ming”. Since jointly learning multiple related tasks can achieve better results than learning a single task, providing the auxiliary classifier 160 may help improve the learning of the main classifier 150.
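One way to obtain paired training targets for the two classifiers, assuming that every concept maps to exactly one type, is to derive the type label from the concept label through a concept-to-type table, which mirrors the hierarchical relationship described above. The table below, including the type "scientist", is hypothetical, as are the function names.

```python
from typing import Dict, Tuple

# Hypothetical concept-to-type table; each concept belongs to exactly one type.
CONCEPT_TO_TYPE: Dict[str, str] = {
    "a basketball player Yao Ming": "athlete",
    "a basketball star Michael Jeffrey Jordan": "athlete",
    "a machine-learning expert Michael Owen Jordan": "scientist",
}


def build_label_ids(table: Dict[str, str]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Assign integer IDs to concepts and to types (sorted for reproducibility)."""
    concept_ids = {c: i for i, c in enumerate(sorted(table))}
    type_ids = {t: i for i, t in enumerate(sorted(set(table.values())))}
    return concept_ids, type_ids


def make_targets(concept: str, table: Dict[str, str],
                 concept_ids: Dict[str, int], type_ids: Dict[str, int]) -> Tuple[int, int]:
    """Return (concept_id, type_id) targets for one annotated entity mention."""
    return concept_ids[concept], type_ids[table[concept]]
```

Two mentions of different athletes receive different concept targets but the same type target, so the auxiliary task groups them at the coarser level.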
After training is completed, the main classifier 150 can be used in practical applications to predict the concept to which an entity in the text corresponds; prompt information is then generated for the user based on that concept, helping the user who is reading the text understand the entity correctly. The prompt information may be provided in various manners (for example, visually or audibly). For example, when the user is viewing a document, the meaning of the entity may be presented to the user by a hyperlink or a pop-up window, or audibly.
In particular, in practical applications an entity may not correspond to any of the concepts in the knowledge graph, that is, none of the concepts is suitable to explain the entity. In this case, a threshold may be set. For each of the concepts in the knowledge graph, the trained main classifier 150 may output a probability that the entity corresponds to the concept. When the largest of the predicted probabilities is greater than the threshold, it is determined that the concept associated with the largest probability corresponds to the entity, and the prompt information for the user may then be generated based on the determined concept. On the other hand, if all the predicted probabilities are less than the threshold, the entity is not suitable to be classified into any of the concepts. In this case, no prompt information is generated for the entity.
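The threshold rule can be sketched as follows. The threshold values, the example concept descriptions, and the function names are hypothetical. The disclosure does not state what happens when the largest probability exactly equals the threshold; this sketch generates no prompt in that case.

```python
from typing import Dict, Optional, Sequence


def select_concept(probs: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the most probable concept, or None if it is not above the threshold."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] > threshold else None


def make_prompt(probs: Sequence[float], descriptions: Dict[int, str],
                threshold: float) -> Optional[str]:
    """Build prompt text for the selected concept; None means no prompt is shown."""
    idx = select_concept(probs, threshold)
    if idx is None:
        return None  # no concept explains the entity well enough
    return descriptions[idx]


# Hypothetical concept descriptions indexed by concept ID.
DESCRIPTIONS = {
    0: "apple: a kind of fruit",
    1: "Apple Inc.: an American high-tech company whose products include iPhones",
}
```

For example, make_prompt([0.1, 0.9], DESCRIPTIONS, 0.5) returns the description of Apple Inc., while make_prompt([0.45, 0.55], DESCRIPTIONS, 0.6) returns None because no probability exceeds 0.6.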
Referring to
In step S410, a mask vector Vm (as shown in
In step S420, a first vector V1 and a second vector V2 are generated by the BERT module 330 based on the vector V0 received from the embedding module 320. The first vector V1 represents overall semantic information of a statement composed of the entity and the context, and the second vector V2 includes hidden vectors for respective words in the statement.
In step S430, a third vector V3 is generated by the entity average module 340 based on the mask vector Vm generated by the mask vector generation module 310 and the second vector V2 generated by the BERT module 330. Specifically, the entity average module 340 multiplies the second vector V2 and the mask vector Vm to obtain a hidden vector for the entity, and the hidden vector for the entity includes hidden vectors for respective characters in the entity. Then, the entity average module 340 calculates an average vector of the hidden vectors for respective characters, as the third vector V3.
In step S440, the first vector V1 generated by the BERT module 330 and the third vector V3 generated by the entity average module 340 are concatenated by the concatenation module 350 to generate a fourth vector V4.
In step S450, the concept classification module 360 performs the concept classification task based on the fourth vector V4 generated by the concatenation module 350. For each of a number of predefined concepts, the concept classification module 360 outputs a probability that the entity corresponds to the concept. It is possible to predict which concept the entity corresponds to, based on the output probabilities.
In step S460, the type classification module 370 performs the type classification task based on the fourth vector V4 generated by the concatenation module 350. For each of a number of predefined types, the type classification module 370 outputs a probability that the entity corresponds to the type. It is possible to predict which type the entity corresponds to, based on the output probabilities.
In step S470, the concept classification module 360 and the type classification module 370 are jointly trained based on the loss function F. Specifically, the loss function F is a weighted sum of a first loss function F1 for the concept classification module 360 and a second loss function F2 for the type classification module 370. In a preferred example, a weight applied to the first loss function F1 may be greater than a weight applied to the second loss function F2.
In step S480, a concept to which the entity corresponds is predicted by using the trained concept classification module 360. The presentation module 380 generates prompt information for presentation to a user based on the predicted concept, to help the user understand the entity correctly. The presentation module 380 may present the prompt information visually and/or audibly.
It should be noted that the method according to the present disclosure does not have to be performed in the order shown in
The neural network model and the method of the present disclosure have been described in detail with reference to the embodiments. The present disclosure provides a new entity disambiguation scheme, in which the advanced BERT model is adopted, context information of an entity is utilized, and information on a type to which the entity belongs is effectively utilized through multi-task learning. As such, performance of the neural network model may be improved, a possibility of correctly identifying the meaning of the entity is increased, and more accurate prompt information may be provided to the user.
The method described in the embodiments may be implemented by software, hardware, or a combination of software and hardware. Programs included in the software may be stored in advance in a storage medium provided inside or outside the device. As an example, during execution, these programs are written to a random access memory (RAM) and executed by a processor (such as a CPU), implementing the methods and processes described herein.
As shown in
An input/output interface 505 is further connected to the bus 504. The input/output interface 505 is connected with the following components: an input unit 506 configured with a keyboard, a mouse, a microphone, and the like; an output unit 507 configured with a display, a speaker, and the like; a storage unit 508 configured with a hard disk, a non-volatile memory, and the like; a communication unit 509 formed by a network interface card, such as a local area network (LAN) card, a modem, and the like; and a drive 510 that drives a removable medium 511, where the removable medium 511 is, for example, a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory.
In the computer having such structure, the CPU 501 loads a program stored in the storage unit 508 into the RAM 503 via the input/output interface 505 and the bus 504, and executes the program so as to execute the above-described method.
The program to be executed by the computer (CPU 501) may be recorded on the removable medium 511. The removable medium 511 serves as a package medium formed by, for example, a magnetic disk (including a floppy disk), an optical disc (including a compact disc read-only memory (CD-ROM)), a digital versatile disc (DVD), a magneto-optical disc, or a semiconductor memory. Furthermore, the program to be executed by the computer (CPU 501) can be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
When the removable medium 511 is installed in the drive 510, the program may be installed in the storage unit 508 via the input/output interface 505. In addition, the program may be received by the communication unit 509 via a wired or wireless transmission medium and may be installed in the storage unit 508. Alternatively, the program may be pre-installed in the ROM 502 or the storage unit 508.
The program executed by the computer may be a program that performs processes according to the order described in this specification, or may be a program that performs processes in parallel or when necessary (for example, when invoked).
The units or means described herein are only logical and do not strictly correspond to physical devices or physical entities. For example, the functions of a unit described herein may be implemented by multiple physical entities, or the functions of multiple units described herein may be implemented by a single physical entity. Furthermore, features, components, elements, steps, and the like described in one embodiment are not limited to that embodiment, but may also be applied to other embodiments, for example, to replace certain features, components, elements, steps, and the like in the other embodiments, or to be combined with them.
The scope of the present disclosure is not limited to the specific embodiments described herein. It should be understood by those of ordinary skill in the art that, depending upon design requirements and other factors, various modifications or variations of the embodiments herein may be made without departing from the principles and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims and their equivalents.
(1) A computer-implemented method of presenting prompt information to a user who is viewing an electronic text by utilizing a neural network, wherein the electronic text includes an entity and context of the entity, and the entity and the context are each composed of one or more characters,
the method comprising:
generating a mask vector (Vm) for the entity, the mask vector being used to identify a position of the entity in a statement composed of the entity and the context;
generating a first vector (V1) and a second vector (V2) based on the entity and the context, by a BERT layer in the neural network;
generating a third vector (V3) based on the mask vector (Vm) and the second vector (V2), by an entity average layer in the neural network;
concatenating the first vector (V1) and the third vector (V3) by a concatenation layer in the neural network to generate a fourth vector (V4);
predicting which concept of a plurality of predefined concepts the entity corresponds to, based on the fourth vector (V4), by a first classifier in the neural network;
predicting which type of a plurality of predefined types the entity corresponds to, based on the fourth vector (V4), by a second classifier in the neural network;
jointly training the first classifier and the second classifier; and
determining a concept to which the entity corresponds based on a prediction result of the trained first classifier, and generating the prompt information for presentation to the user based on the determined concept.
(2) The method according to (1), wherein generating the mask vector (Vm) for the entity further comprises:
adding a marker character between a first character of the entity and the preceding text, and adding the marker character between a last character of the entity and the following text;
mapping the entity, the marker characters and the context to a vector with a fixed length on a character basis, to generate the mask vector (Vm),
wherein in the mask vector (Vm), elements at positions corresponding to the characters of the entity and the marker characters are set to 1, and elements at positions corresponding to the characters of the context are set to 0.
(3) The method according to (2), wherein when the length of the vector is greater than a total length of characters of the entity, the marker characters and the context, elements at positions not corresponding to the entity, the marker characters and the context are set to 0, and
wherein when the length of the vector is less than the total length of characters of the entity, the marker characters and the context, the context is truncated such that the total length is equal to the length of the vector.
(4) The method according to (1), wherein the first vector (V1) represents overall semantic information of the statement composed of the entity and the context, and the second vector (V2) includes hidden vectors for respective words in the statement.
(5) The method according to (4), wherein generating the third vector (V3) further comprises:
multiplying the second vector (V2) and the mask vector (Vm) to obtain a hidden vector for the entity, wherein the hidden vector for the entity includes hidden vectors for respective characters in the entity;
calculating, on the character dimension, an average vector of the hidden vectors for the respective characters in the entity, as the third vector (V3).
(6) The method according to (1), wherein jointly training the first classifier and the second classifier further comprises:
establishing a first loss function for the first classifier and a second loss function for the second classifier;
performing a weighted addition to the first loss function and the second loss function, wherein a weight for the first loss function is greater than a weight for the second loss function; and
training the first classifier and the second classifier in such a manner that the loss function resulting from the weighted addition is minimized.
(7) The method according to (6), wherein each of the first loss function and the second loss function is a cross-entropy loss function.
(8) The method according to (1), further comprising:
for each of the plurality of concepts, predicting a probability that the entity corresponds to the concept by the trained first classifier;
if a maximum probability among the predicted probabilities is greater than a predetermined threshold, determining that a concept associated with the maximum probability corresponds to the entity; and
if all the predicted probabilities are less than the predetermined threshold, generating no prompt information for the entity.
(9) The method according to (1), wherein the prompt information is presented to the user in at least one of a visual manner and an auditory manner.
(10) A device for presenting prompt information to a user who is viewing an electronic text by utilizing a neural network, wherein the electronic text includes an entity and context of the entity, and the entity and the context are each composed of one or more characters,
the device comprising:
a memory storing a computer program; and
one or more processors executing the computer program to perform operations of:
generating a mask vector (Vm) for the entity, the mask vector being used to identify a position of the entity in a statement composed of the entity and the context;
generating a first vector (V1) and a second vector (V2) based on the entity and the context, by a BERT layer in the neural network;
generating a third vector (V3) based on the mask vector (Vm) and the second vector (V2), by an entity average layer in the neural network;
concatenating the first vector (V1) and the third vector (V3) by a concatenation layer in the neural network to generate a fourth vector (V4);
predicting which concept of a plurality of predefined concepts the entity corresponds to, based on the fourth vector (V4), by a first classifier in the neural network;
predicting which type of a plurality of predefined types the entity corresponds to, based on the fourth vector (V4), by a second classifier in the neural network;
jointly training the first classifier and the second classifier; and
determining a concept to which the entity corresponds based on a prediction result of the trained first classifier, and generating the prompt information for presentation to the user based on the determined concept.
(11) A storage medium storing a computer program that, when executed by a computer, causes the computer to perform the method of presenting prompt information to a user according to any one of (1) to (9).
Number | Date | Country | Kind |
---|---|---|---|
202110747494.X | Jul 2021 | CN | national |