This application is based upon and claims priority to Chinese Patent Application No. 202111202116.X, filed on Oct. 15, 2021, the entire contents of which are incorporated herein by reference.
The present disclosure relates to a method for extracting a dam emergency event based on a dual attention mechanism and belongs to the technical field of natural language processing (NLP).
In the field of water conservancy engineering, dams that integrate flood control, water storage, power generation, and other functions encounter many natural hazards during long-term operation, such as earthquakes, floods, and rainstorms. After such natural hazards, comprehensive special inspections are required and are an important measure for dam maintenance. In addition, daily inspection and maintenance are also important measures to ensure dam safety. Over the years, the safety operation records of dams in emergency working states have produced numerous lengthy special inspection reports and daily inspection reports. Mining useful information from such massive unstructured text data remains a difficult problem, and information extraction research has emerged in this context. Event extraction is one of the most challenging tasks in information extraction research. In the information age, relying solely on manual labor to identify and organize event arguments into structured data is time-consuming and labor-intensive. Therefore, the automatic extraction of dam emergency events is of great importance.
In the study of event extraction, it is found that existing deep learning networks (DLNs), such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), generate low-dimensional vectors to automatically represent textual semantic information and then extract event arguments from these semantic vectors. Although DLNs can automatically learn low-level features, they do not fully utilize syntactic relations and are prone to missing argument roles, because event information is often scattered across multiple sentences of a document.
Objective of the present disclosure: To overcome the problems existing in the prior art, the present disclosure introduces dependency information and proposes a method for extracting a dam emergency event based on a dual attention mechanism. The present disclosure stores and represents the information in the special inspection reports and daily inspection reports of a dam in a structured form for users to query and researchers to analyze, and greatly improves the efficiency of mining important information from unstructured data.
Technical Solutions: A method for extracting a dam emergency event based on a dual attention mechanism, which can mine a syntactic relation based on a graph transformer attention network (GTAN) and an attention network, and extract and fill an event argument role based on a dam emergency corpus, includes the following steps:
(1) performing data preprocessing: labeling the dam emergency corpus, and encoding sentences and a document with information on a dam emergency event based on four embedding vectors;
(2) building a dependency graph: introducing a dependency, and building a dependency graph based on a sentence structure and a semantic structure;
(3) building the dual attention network: generating a new dependency arc based on the GTAN, aggregating node information, fusing features extracted by a GTAN layer and an attention network layer according to a set ratio, and extracting a sentence-level event argument; and
(4) filling a document-level argument: detecting a sentence with a key event in a dam emergency document, and filling an argument role with the highest similarity through a twin neural network.
Further, a dam emergency refers to a working state of a dam in case of a natural disaster.
Further, the dam emergency corpus includes special inspection reports and daily inspection reports of a dam over the years.
Further, in step (1), the performing data preprocessing specifically includes: labeling data of a special inspection report and a daily inspection report of a dam in the BIO (begin-inside-outside) mode; taking the 312-dimensional vector of the last layer of an ALBERT model as a word embedding vector and concatenating it with an event type embedding vector, an entity type embedding vector, and a part-of-speech tag embedding vector; and mining the concatenated embedding vectors through a bidirectional long short-term memory (BiLSTM) network to acquire hidden vectors H = h1, . . . , hn. The event type embedding vector is a mathematical vector corresponding to typical events, such as earthquakes, heavy rain, flood discharge, pre-flood safety inspections, comprehensive special inspections, daily maintenance, and daily inspections. The entity type embedding vector is a mathematical vector corresponding to a person's name, organization, location, time, date, numerical value, percentage, etc. The part-of-speech tag embedding vector is a mathematical vector corresponding to a noun, verb, adjective, quantifier, pronoun, etc.
Further, in step (2), the building a dependency graph specifically includes: building an adjacency matrix Ad of a dependency tree and a dependency label score matrix Ãdl according to the word relations in the dam emergency corpus; calculating a score between the hidden vectors hi and hj acquired in step (1) to acquire a semantic score matrix As; and concatenating Ad, Ãdl, and As to acquire a dependency graph matrix A = [Ad, Ãdl, As].
Further, in step (3), the building the dual attention network specifically includes: proposing the GTAN, which is an improvement of a graph transformer network (GTN) in which the graph convolutional network (GCN) is replaced by a graph attention network (GAN) to perform a reasonable weight distribution (it is reasonable to give a higher weight to a trigger and to the arc of a key argument in the dependency, which gives full play to the effect of the dependency); performing, by the GTAN, a 1×1 convolution on the adjacency matrix set A through a graph transformer layer, and generating a new meta-path graph Al (a new dependency arc) through matrix multiplication; applying, by a graph attention layer, the GAN to each channel of the meta-path graph Al, and concatenating the multiple node representations as a vector Z; calculating a weight matrix αa of the attention network layer, point-multiplying αa with the hidden vector H to generate a vector H̃, and connecting, by a hyperparameter λ, the vector Z generated by the GTAN layer and the vector H̃ generated by the attention network layer to acquire a fused vector W̃:
W̃ = σ(λ·Z + (1−λ)·H̃)
where σ is a sigmoid function. Finally, the event extraction is carried out by sequence labeling: the feature-fused vector W̃ is mined by a conditional random field (CRF) to predict the label of each character. The problem of unbalanced samples caused by redundant useless information (the O label) is addressed by a Focal loss function and an Adam optimizer.
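As a concrete illustration, a minimal PyTorch sketch of a focal loss for the character-level labels is given below; the γ and α values are illustrative assumptions rather than values fixed by the present disclosure, and the optimizer would simply be torch.optim.Adam.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Focal loss for character-level sequence labeling.

    Down-weights easy, abundant examples (e.g., the O label) so that
    training focuses on the rare argument labels.
    logits:  (batch * seq_len, num_labels) unnormalized scores
    targets: (batch * seq_len,) gold label indices
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    # Probability the model assigns to the gold label of each character.
    pt = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    log_pt = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # (1 - pt)^gamma shrinks the loss of well-classified characters.
    loss = -alpha * (1.0 - pt) ** gamma * log_pt
    return loss.mean()
```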
Further, in step (4), the filling a document-level argument specifically includes: concatenating four embedding vectors of a special inspection report and a daily inspection report of a dam, namely, argument label, entity type, sentence information, and document information; building a text convolutional neural network (textCNN), taking the concatenated vectors as the input vector, detecting a key sentence regarding an event, and determining a key event; and calculating, by a twin neural network based on a Manhattan LSTM network, the semantic similarity between sentences, and filling the argument role.
A system for extracting a dam emergency event based on a dual attention mechanism includes:
a data preprocessing module configured to label a dam emergency corpus and encode sentences and a document with information on a dam emergency event based on four embedding vectors;
a dependency graph building module configured to introduce a dependency and build a dependency graph based on a sentence structure and a semantic structure;
a dual attention network building module configured to generate a new dependency arc based on the GTAN, aggregate node information, fuse features extracted by a GTAN layer and an attention network layer according to a set ratio, and extract a sentence-level event argument; and
a document-level argument filling module configured to detect a sentence with a key event in a dam emergency document and fill an argument role with the highest similarity through a twin neural network.
The specific implementation of the system is the same as that of the method.
A computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method for extracting a dam emergency event based on a dual attention mechanism.
A computer-readable storage medium stores a computer program for implementing the method for extracting a dam emergency event based on a dual attention mechanism.
Beneficial Effects: Compared with the prior art, the present disclosure has the following advantages. The present disclosure extracts useful events from unstructured information, such as the special inspection reports and daily inspection reports of a dam, and presents them in a structured form, which improves information retrieval and saves labor costs. The present disclosure extracts features from the text data of a dam emergency, builds the word embedding vector, and captures important contextual information through the BiLSTM network, thereby improving the model's prediction capability. The present disclosure generates a new dependency arc through the GTAN, aggregates node information to capture long-range dependencies and potential interactions, fuses the result with the attention network output by weighting, captures the key semantic information in the sentence, and extracts the sentence-level event argument, which improves the performance of event argument role extraction. The present disclosure incorporates event type information and identifies the arguments in sentences with multiple events by event type, which solves the problems of overlapping roles and missing arguments and improves the accuracy of argument classification.
The present disclosure will be further explained in conjunction with the specific embodiments. It should be understood that these embodiments are intended to illustrate the present disclosure rather than limit the scope of the present disclosure. Various equivalent modifications to the present disclosure made by those skilled in the art after reading the specification should fall within the scope defined by the appended claims.
As shown in the accompanying drawing, the method for extracting a dam emergency event based on a dual attention mechanism includes the following steps:
(1) Data preprocessing is performed: A dam emergency corpus is labeled, and sentences and a document with information on a dam emergency event are encoded based on four embedding vectors.
(1.1) The data of a special inspection report and a daily inspection report of a dam are labeled in the BIO (begin-inside-outside) mode, that is, each element is labeled as B-X, I-X, or O. B-X denotes the beginning part of a key argument belonging to type X, I-X denotes an intermediate part of the key argument belonging to type X, and O denotes the other words in the sentence apart from the key arguments.
For example, the Chinese sentence "On Aug. 13, 2018, an M5.0 earthquake occurred in Tonghai County, Yuxi City, Yunnan, with a focal depth of 7 kilometers, and the straight-line distance from the epicenter of the earthquake to the dam of the Manwan Hydropower Station is about 231 kilometers." is labeled in the BIO mode as follows: On/O August/B-Time 13/I-Time ,/O 2/I-Time 0/I-Time 1/I-Time 8/I-Time ,/O an/O M/B-Magnitude 5/I-Magnitude ./I-Magnitude 0/I-Magnitude earthquake/O occurred/O in/O Tong/B-Place hai/I-Place County/I-Place ,/O Yu/I-Place xi/I-Place City/I-Place ,/O Yun/I-Place nan/I-Place ,/O with/O a/O focal/O depth/O of/O 7/B-Depth kilometers/I-Depth ,/O and/O the/O straight-line/O distance/O from/O the/O epicenter/O of/O the/O earthquake/O to/O the/O dam/B-Place of/I-Place the/I-Place Man/I-Place wan/I-Place Hydropower/I-Place Station/I-Place is/O about/O 231/B-Range kilometers/I-Range ./O
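A minimal sketch of how such character-level BIO tags can be produced from labeled argument spans; the (start, end, type) span format is an illustrative assumption:

```python
def spans_to_bio(text, spans):
    """Convert argument spans to character-level BIO tags.

    text:  the raw sentence
    spans: list of (start, end, type) character spans
    """
    tags = ["O"] * len(text)
    for start, end, arg_type in spans:
        tags[start] = f"B-{arg_type}"          # beginning of the argument
        for i in range(start + 1, end):
            tags[i] = f"I-{arg_type}"          # intermediate characters
    return list(zip(text, tags))

# Example: tag "M5.0" as a Magnitude argument.
pairs = spans_to_bio("an M5.0 earthquake", [(3, 7, "Magnitude")])
# [('a','O'), ('n','O'), (' ','O'), ('M','B-Magnitude'), ('5','I-Magnitude'), ...]
```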
(1.2) A sentence of length n is given, that is, W = w1, w2, . . . , wn.
(1.3) A 312-dimensional vector of the last layer of an ALBERT model is taken as a word embedding vector, where an event type embedding vector, an entity type embedding vector, and a part-of-speech tag embedding vector are generated through a trainable lookup table.
(1.4) The word embedding vector, the event type embedding vector, the entity type embedding vector, and the part-of-speech tag embedding vector are concatenated. The concatenated embedding vectors are mined through the BiLSTM network to capture important contextual information, and a sequence of hidden vectors H = h1, . . . , hn is acquired for use in the next step.
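A minimal PyTorch sketch of step (1) is given below, assuming the last-layer ALBERT vectors are precomputed; the vocabulary sizes and the 50-dimensional tag embeddings are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EventEncoder(nn.Module):
    """Concatenate the four embeddings and mine them with a BiLSTM."""

    def __init__(self, n_event_types=7, n_entity_types=8, n_pos_tags=30,
                 albert_dim=312, tag_dim=50, hidden=256):
        super().__init__()
        # Trainable lookup tables for the three tag embeddings.
        self.event_emb = nn.Embedding(n_event_types, tag_dim)
        self.entity_emb = nn.Embedding(n_entity_types, tag_dim)
        self.pos_emb = nn.Embedding(n_pos_tags, tag_dim)
        self.bilstm = nn.LSTM(albert_dim + 3 * tag_dim, hidden,
                              batch_first=True, bidirectional=True)

    def forward(self, albert_vecs, event_ids, entity_ids, pos_ids):
        # albert_vecs: (batch, n, 312) last-layer ALBERT word embeddings.
        x = torch.cat([albert_vecs,
                       self.event_emb(event_ids),
                       self.entity_emb(entity_ids),
                       self.pos_emb(pos_ids)], dim=-1)
        h, _ = self.bilstm(x)      # H = h1, ..., hn  (batch, n, 2*hidden)
        return h
```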
(2) A dependency graph is built: A dependency is introduced, and the dependency graph is built based on a sentence structure and a semantic structure to identify and classify all arguments of the dam emergency event.
(2.1) An adjacency matrix Ad of a dependency tree is taken as one of the syntactic structures for event extraction, where Ad is an N×N binary matrix. If the words wi and wj are linked in the dependency tree, Ad(i,j) has a value of 1; otherwise, it has a value of 0.
(2.2) A matrix Adl is initialized according to the dependency labels. If there is a dependency edge with label r between the words wi and wj, Adl(i,j) is initialized with the P-dimensional embedding vector of r found from a trainable embedding lookup table; otherwise, Adl(i,j) is initialized with a P-dimensional all-zero vector.
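A minimal sketch of (2.1)-(2.2), assuming the dependency parse is given as (i, j, label) triples and treating the arcs as undirected, which is an assumption:

```python
import torch
import torch.nn as nn

def build_dependency_matrices(edges, n, label_emb):
    """Build Ad (N x N binary) and Adl (N x N x P label embeddings).

    edges:     list of (i, j, label_id) dependency arcs between words
    n:         sentence length N
    label_emb: nn.Embedding over dependency labels, dimension P
    """
    P = label_emb.embedding_dim
    A_d = torch.zeros(n, n)
    A_dl = torch.zeros(n, n, P)
    for i, j, label_id in edges:
        A_d[i, j] = A_d[j, i] = 1.0              # words linked in the tree
        vec = label_emb(torch.tensor(label_id))  # P-dim embedding of label r
        A_dl[i, j] = A_dl[j, i] = vec
    return A_d, A_dl                             # unlinked entries stay zero
```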
(2.3) The dependency label matrix Adl is transformed into a dependency label score matrix Ãdl, with each entry scored as

Ãdl(i,j) = U·Adl(i,j),

where U is a trainable weight matrix.
(2.4) A score between the hidden vectors hi and hj is calculated to acquire a semantic score matrix As. Key and query vectors are first computed as

ki = Ukhi, qi = Uqhi,

where Uk and Uq are trainable weight matrices, and the semantic score As(i,j) is then acquired from the dot product of the query vector qi and the key vector kj.
(2.5) A dependency graph matrix A=[Ad, Ãdl, As] is acquired, where Ad is the adjacency matrix of the dependency tree, Ãdl is the dependency label score matrix, and As is the semantic score matrix.
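A minimal sketch of (2.3)-(2.5); scoring the label embeddings with a linear layer and taking a scaled, softmax-normalized dot product of the query and key vectors are assumptions consistent with the definitions above:

```python
import torch
import torch.nn as nn

class DependencyGraphBuilder(nn.Module):
    def __init__(self, hidden_dim, label_dim):
        super().__init__()
        self.U = nn.Linear(label_dim, 1, bias=False)    # label scorer U
        self.U_k = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.U_q = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, A_d, A_dl, H):
        # (2.3) Dependency label score matrix from the label embeddings.
        A_dl_score = self.U(A_dl).squeeze(-1)           # (n, n)
        # (2.4) Semantic score between hidden vectors hi and hj.
        k, q = self.U_k(H), self.U_q(H)                 # (n, d)
        A_s = torch.softmax(q @ k.transpose(0, 1) / k.size(-1) ** 0.5, dim=-1)
        # (2.5) Stack the three views into the dependency graph matrix A.
        return torch.stack([A_d, A_dl_score, A_s], dim=0)   # (3, n, n)
```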
(3) The dual attention network is built: A new dependency arc is generated based on the GTAN, and node information is aggregated to capture a long-range dependency and a potential interaction. The attention network is introduced. Features extracted by a GTAN layer and an attention network layer are fused according to a set ratio. Key semantic information in the sentence is captured, and a sentence-level event argument is extracted.
(3.1) The GTAN is proposed: the graph convolutional network (GCN) in a graph transformer network (GTN) is replaced by a graph attention network (GAN) to perform a reasonable weight distribution. The vector generated by the attention layer passes through a Dropout layer to prevent the model from overfitting. It is reasonable to give a higher weight to a trigger and to the arc of a key argument in the dependency, which gives full play to the effect of the dependency.
(3.2) The GTAN is formed by two parts: a graph transformer layer and a graph attention layer. A 1×1 convolution is applied to the adjacency matrix set A by the graph transformer layer. Two intermediate adjacency matrices Q1 and Q2 are softly selected from the channels after the 1×1 convolution, and the matrices Q1 and Q2 are multiplied to generate a new meta-path graph Al.
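A minimal sketch of the graph transformer layer of (3.2), simplified to a single output channel; softly selecting each intermediate matrix with a 1×1 convolution over the channels of A follows the GTN construction that the disclosure builds on:

```python
import torch
import torch.nn as nn

class GraphTransformerLayer(nn.Module):
    """Softly select two adjacency matrices and compose a meta-path graph."""

    def __init__(self, in_channels=3):
        super().__init__()
        # Each 1x1 convolution produces a weighted combination of the
        # input channels, i.e., a soft selection of one adjacency matrix.
        self.select1 = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.select2 = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, A):
        # A: (in_channels, n, n) stacked adjacency matrices [Ad, Adl~, As].
        A = A.unsqueeze(0)                            # (1, C, n, n)
        Q1 = self.select1(A).squeeze(0).squeeze(0)    # (n, n)
        Q2 = self.select2(A).squeeze(0).squeeze(0)    # (n, n)
        # Matrix multiplication composes the two relations into a new
        # dependency arc, i.e., the meta-path graph A^l.
        return Q1 @ Q2
```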
(3.3) A graph attention network (GAN) is applied to each channel of the meta-path graph Al by the graph attention layer, and the multiple node representations are concatenated as Z:

Z = ∥_{i=1}^{C} σ(D̃i^{-1} Ãi^{(l)} X V)

where ∥ is the join (concatenation) operator over the C channels, Ãi(l) (Ãi(l) = Ai(l) + I) is the adjacency matrix of the i-th channel of Al, D̃i is the degree matrix of Ãi(l), V is a trainable weight matrix shared across channels, X is the feature matrix, and I is an identity matrix.
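A minimal sketch of (3.3), implementing the per-channel aggregation and concatenation in the formula above; it uses the normalized-adjacency propagation written there, while the attention variant would replace D̃i⁻¹Ãi(l) with learned attention coefficients:

```python
import torch
import torch.nn as nn

class GraphChannelAggregator(nn.Module):
    """Aggregate node features over each meta-path channel, then concatenate."""

    def __init__(self, feat_dim, out_dim):
        super().__init__()
        self.V = nn.Linear(feat_dim, out_dim, bias=False)  # shared across channels

    def forward(self, A_l, X):
        # A_l: (C, n, n) meta-path graphs; X: (n, feat_dim) node features.
        outs = []
        for A_i in A_l:                                    # one channel at a time
            A_tilde = A_i + torch.eye(A_i.size(0))         # add self-loops (+ I)
            D_inv = torch.diag(1.0 / A_tilde.sum(dim=1))   # inverse degree matrix
            outs.append(torch.sigmoid(D_inv @ A_tilde @ self.V(X)))
        return torch.cat(outs, dim=-1)                     # Z: (n, C * out_dim)
```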
(3.4) The weights αak of the attention network layer are calculated according to the following formula:

αak = softmax(tanh(Waᵀhk + bk))

where hk is the k-th vector in the hidden vector sequence H generated through the BiLSTM network, Wa is a trainable weight matrix, and bk is a bias.
(3.5) The weight matrix αa of the attention network layer is point-multiplied with the hidden vector H to generate a vector H̃, and the vector Z generated by the GTAN layer and the vector H̃ generated by the attention network layer are connected by a hyperparameter λ to acquire a fused vector W̃:

W̃ = σ(λ·Z + (1−λ)·H̃)
where σ is a sigmoid function.
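A minimal sketch of (3.4)-(3.5); it assumes Z has been projected to the same dimension as H so that the weighted sum is well defined:

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, hidden_dim, lam=0.5):
        super().__init__()
        self.W_a = nn.Linear(hidden_dim, 1)   # folds Wa^T hk + bk into one layer
        self.lam = lam                        # hyperparameter lambda

    def forward(self, H, Z):
        # H: (n, hidden_dim) BiLSTM hidden vectors; Z: (n, hidden_dim) GTAN output.
        # (3.4) alpha_k = softmax(tanh(Wa^T hk + bk)) over the sentence.
        alpha = torch.softmax(torch.tanh(self.W_a(H)), dim=0)   # (n, 1)
        # (3.5) Point-multiply the weights with H, then fuse with Z.
        H_tilde = alpha * H
        return torch.sigmoid(self.lam * Z + (1.0 - self.lam) * H_tilde)
```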
(3.6) The feature-fused vector W̃ is mined by a conditional random field (CRF) to predict the label of each character.
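A minimal sketch of (3.6) using the pytorch-crf package, which is one possible CRF implementation (the disclosure does not name a library):

```python
import torch
import torch.nn as nn
from torchcrf import CRF   # pytorch-crf package

class SequenceLabeler(nn.Module):
    """Project the fused vectors W~ to label scores and decode with a CRF."""

    def __init__(self, fused_dim, num_labels):
        super().__init__()
        self.proj = nn.Linear(fused_dim, num_labels)
        self.crf = CRF(num_labels, batch_first=True)

    def loss(self, W_fused, labels, mask):
        emissions = self.proj(W_fused)       # (batch, n, num_labels)
        return -self.crf(emissions, labels, mask=mask)   # negative log-likelihood

    def decode(self, W_fused, mask):
        # Viterbi decoding: predict the BIO label of each character.
        return self.crf.decode(self.proj(W_fused), mask=mask)
```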
(4) A document-level argument is filled: A sentence with a key event in the dam emergency document is detected, and the argument role with the highest similarity from the surrounding sentences is filled into the missing part of the key event through a twin neural network.
(4.1) The initial vector of the event argument label is set as a one-hot label formed by 1s and 0s, where 1 denotes a key argument position and 0 denotes the other positions. The randomly generated initial vector is trained into a 128-dimensional embedding vector by Word2vec.
(4.2) An entity type is generated by looking up a randomly initialized embedding table, and the embedding vector is set to be 128-dimensional.
(4.3) The sentence information and document information are transformed into 312-dimensional embedding vectors respectively through ALBERT.
(4.4) The four embedding vectors (i.e., argument label, entity type, sentence information, and document information) are concatenated to generate an 880-dimensional new vector.
(4.5) A text convolutional neural network (textCNN) is established, as shown in the accompanying drawing. The concatenated vectors are taken as the input vector, a key sentence regarding an event is detected, and the key event is determined.
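A minimal textCNN sketch for the key-sentence classifier of (4.5); the kernel sizes and filter count are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """Binary classifier: does this sentence describe a key event?"""

    def __init__(self, in_dim=880, n_filters=100, kernel_sizes=(2, 3, 4)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(in_dim, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), 2)

    def forward(self, x):
        # x: (batch, seq_len, 880) concatenated embedding vectors.
        x = x.transpose(1, 2)                         # (batch, 880, seq_len)
        # Convolve with each kernel size and max-pool over time.
        pooled = [torch.relu(c(x)).max(dim=-1).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=-1))     # key-sentence logits
```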
(4.6) A <key sentence, adjacent sentence> pair is processed by the twin neural network based on a Manhattan LSTM network, as shown in the accompanying drawing. The semantic similarity between the sentences is calculated, and the argument role with the highest similarity is filled into the missing part of the key event.
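A minimal sketch of the twin network of (4.6); both sentences share one encoder, and the Manhattan-distance kernel exp(−‖h1 − h2‖1) is the standard MaLSTM similarity, assumed here:

```python
import torch
import torch.nn as nn

class MaLSTMSimilarity(nn.Module):
    """Twin network: shared LSTM encoder + Manhattan similarity."""

    def __init__(self, in_dim=880, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(in_dim, hidden, batch_first=True)

    def forward(self, sent_a, sent_b):
        # Both members of the <key sentence, adjacent sentence> pair
        # go through the SAME encoder (shared weights).
        _, (h_a, _) = self.encoder(sent_a)
        _, (h_b, _) = self.encoder(sent_b)
        # Manhattan-distance kernel: similarity in (0, 1].
        return torch.exp(-torch.sum(torch.abs(h_a[-1] - h_b[-1]), dim=-1))
```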
To verify the validity of the model of the present disclosure, an experiment was carried out on the dam emergency corpus. An example case from the corpus is shown in Table 1, and the event types and corresponding event arguments are shown in Table 2. The evaluation criteria used in the experiment are P, R, and F1, where P denotes the precision rate, R denotes the recall rate, and F1 is a comprehensive criterion for evaluating general classification problems. The event extraction models involved in the comparison experiment include a dynamic multi-pooling convolutional neural network (DMCNN) model, a convolutional bidirectional long short-term memory (C-BiLSTM) model, a joint recurrent neural network (JRNN) model, a hierarchical modular event argument extraction (HMEAE) model, and a joint multiple Chinese event extractor (JMCEE) model. The DMCNN model uses dynamic multi-pooling layers for event extraction based on event triggers and arguments. The C-BiLSTM model performs Chinese event extraction under the character-level sequence labeling paradigm. The JRNN model performs joint event extraction via a recurrent neural network. The HMEAE model designs a neural module network at the concept level for each basic unit and forms a role-oriented module network through logical operations to classify specific argument roles. The JMCEE model jointly predicts event triggers and event arguments based on the shared feature representation of a pre-trained language model.
Table 3 shows the comparison results between the model of the embodiment of the present disclosure and the five models DMCNN, C-BiLSTM, JRNN, HMEAE, and JMCEE. The results show that the model of the embodiment of the present disclosure makes full use of the syntactic relation and semantic structure and has a better event extraction effect based on the dam emergency corpus than the five models.
A system for extracting a dam emergency event based on a dual attention mechanism includes:
a data preprocessing module configured to label a dam emergency corpus and encode sentences and a document with information on a dam emergency event based on four embedding vectors;
a dependency graph building module configured to introduce a dependency and build a dependency graph based on a sentence structure and a semantic structure;
a dual attention network building module configured to generate a new dependency arc based on the GTAN, aggregate node information, fuse features extracted by a GTAN layer and an attention network layer according to a set ratio, and extract a sentence-level event argument; and
a document-level argument filling module configured to detect a sentence with a key event in a dam emergency document and fill an argument role with the highest similarity through a twin neural network.
The specific implementation of the system is the same as that of the method.
Obviously, a person skilled in the art should understand that the steps or modules of the embodiments of the present disclosure may be implemented by a general-purpose computing apparatus. These modules or steps may be concentrated on a single computing apparatus or distributed across a network consisting of a plurality of computing apparatuses, and may optionally be implemented by program code executable by the computing apparatuses. These modules or steps may be stored in a storage apparatus for execution by the computing apparatuses and may be implemented, in some cases, by performing the shown or described steps in sequences different from those described herein, by making the steps into integrated circuit modules respectively, or by making multiple modules or steps therein into a single integrated circuit module. In this case, the embodiments of the present disclosure are not limited to any specific combination of hardware and software.