This application claims the priority of the Chinese patent application filed on May 28, 2013 with the application No. 201310203987.2 and the title “Grammar compiling methods, semantic parsing methods and corresponding devices”.
The present disclosure relates to the field of computer technology, in particular to grammar compiling methods, semantic parsing methods, devices, computer storage media and apparatuses.
Speech recognition and semantic parsing of spoken language are two important techniques for speech interaction products. Speech recognition converts speech into words. Semantic parsing of spoken language extracts the information carried by speech. The accuracy of both has a direct effect on user experience. The main technique adopted to improve this accuracy is to build the recognition space with one of the two context-free grammar protocols of the W3C (World Wide Web Consortium), namely a grammar in BNF form or a grammar based on XML (Extensible Markup Language).
People speak in natural spoken language, whose behavior and wording typically differ significantly from text input. For example, spoken language may be characterized by loose grammar and inverted word order. Mainstream large-vocabulary continuous speech recognition devices use the BNF grammar recommended by W3C or the grammar in XML format. Nevertheless, because BNF and XML have deep parsing levels, semantic mapping and syntactic understanding using these two grammars are extremely complicated, readability and maintainability are poor, and the relevant grammar compiling and semantic parsing are difficult.
In view of this, the present disclosure provides embodiments of methods for grammar compiling, methods for semantic parsing and corresponding devices, so as to facilitate readability and maintainability.
In an embodiment, a grammar compiling method defines (or pre-defines) a corresponding grammar description file and a corresponding word category description file based on a logical grammar by manifest language LGML according to a common sentence expression of a semantic meaning, in the grammar description file, the description for a common sentence is composed by operators, word categories, and functions, the word category description file is used to describe specific values for the word categories;
According to an embodiment, the word category description file comprises word items, or comprises, besides word items, at least one of the operators and functions to describe relationships between word items.
According to an embodiment, the operators include at least one of the following:
an operator +, signifying two or more operands put in series;
an operator |, signifying a parallel relationship between two or more operands;
an operator ( ), indicating operands forming a non-negligible combination;
an operator [ ], indicating operands forming a negligible combination;
an operator ;, indicating the end of a sentence;
an operator : indicating explanation of a word category in the word category description file;
an operator “ ”, indicating a reference to an external dictionary.
According to an embodiment, the functions in the grammar description file include at least one of the following:
a function &repeat (EXP, min, max), signifying repeating a grammar portion EXP at least min times, and at most max times;
a function &repeat (EXP, n), signifying repeating EXP n times;
a function &perm (EXP1, EXP2, . . . ), signifying making a full permutation of EXP1, EXP2, . . . ;
a function &grammar (grammar_name), indicating a grammar description file with the name of “grammar_name”;
a function &magic (EXP, key, default, display) or a function &magic (EXP, key, default), mapping EXP to the semantic tag “key”, wherein for the function &magic (EXP, key, default) in a grammar matching process, when EXP is successfully matched with a section T of a text, the “key” has a value of T, otherwise, the “key” has the value of “default”, and for the function &magic (EXP, key, default, display) in a grammar matching process, when EXP is successfully matched with a section T of text, the value of the “key” is the value of “display”, otherwise the “key” has a value of “default”.
According to an embodiment, the functions in the word category description file include at least one of the following:
a function &repeat (EXP, min, max), signifying repeating a grammar portion EXP at least min times, and at most max times;
a function &repeat (EXP, n), signifying repeating EXP n times.
a function &perm (EXP1, EXP2, . . . ), signifying making a full permutation of EXP1, EXP2, . . . .
According to an embodiment, on the grammar tree of the semantic meaning, leaf nodes are word categories or referenced external dictionary, non-leaf nodes are operators or function names, the operands of non-leaf nodes are the contents of respective sub-trees of the non-leaf nodes.
According to an embodiment, the function &magic (EXP, key, default, display) or the function &magic (EXP, key, default) in the grammar tree of the semantic meaning are marked as non-leaf nodes, and there exist corresponding mapping tables for the function &magic (EXP, key, default, display) or the function &magic (EXP, key, default).
In an embodiment, a semantic parsing method, comprises:
Matching a text to be parsed according to a defined sequence (which may be pre-defined) onto a grammar tree obtained with the aforementioned grammar compiling methods, wherein if the text to be parsed is completely matched with the grammar tree, it is determined that the corresponding semantic meaning of the grammar tree is the semantic meaning of the text to be parsed.
According to an embodiment, during the matching process, if a section of the text to be parsed is matched with a sub-tree marked by the function &magic (EXP, key, default, display) or the function &magic (EXP, key, default), in the parsing result, the value of the “key” in the corresponding mapping table of the function &magic (EXP, key, default, display) or the function &magic (EXP, key, default) is the section of the text or the value of the “display” in the mapping table.
In an embodiment, a semantic parsing method comprises:
carrying out a forward maximum matching on a text to be parsed according to a defined sequence (which may be pre-defined) onto a grammar tree obtained with the aforementioned grammar compiling method, wherein, if the text to be parsed has a section that matches a sub-tree marked by a function &magic (EXP, key, default, display) or a function &magic (EXP, key, default) in the grammar tree, it is determined that the corresponding semantic meaning of the grammar tree is the semantic meaning of the text to be parsed.
According to an embodiment, in the parsing result, the value of the “key” in the corresponding mapping table of the matched function &magic (EXP, key, default, display) or function &magic (EXP, key, default) is the section of the text or the value of “display” in the mapping table.
According to an embodiment, the corresponding semantic meaning of the grammar tree is determined to be the semantic meaning of the text to be parsed only if a section of the text is matched with a sub-tree of a critical function &magic (EXP, key, default, display) or a critical function &magic (EXP, key, default) defined (or pre-defined) on the grammar tree.
In an embodiment, a semantic parsing method comprises:
S1, matching a text to be parsed onto a grammar tree obtained with the aforementioned grammar compiling method according to a defined sequence (which may be pre-defined), wherein if the text to be parsed is completely matched with the grammar tree, it is determined that the corresponding semantic meaning of the grammar tree is the semantic meaning of the text to be parsed, otherwise, step S2 is executed;
S2, carrying out a forward maximum matching on the text to be parsed onto the grammar tree according to the defined sequence, wherein, if the text to be parsed has a section that matches a sub-tree marked by a function &magic (EXP, key, default, display) or a function &magic (EXP, key, default) in the grammar tree, it is determined that the corresponding semantic meaning of grammar tree is the semantic meaning of the text to be parsed.
According to an embodiment, in the parsing result, the value of the “key” in the corresponding mapping table of the matched function &magic (EXP, key, default, display) or function &magic (EXP, key, default) is the section of the text or the value of “display” in the mapping table.
In an embodiment, a grammar compiling device comprises:
a file storage unit, used to store a corresponding grammar description file and a corresponding word category description file for a semantic meaning, wherein the grammar description file and the corresponding word category description file are defined (or pre-defined) based on a logical grammar by manifest language LGML according to a common sentence expression of the semantic meaning, in the grammar description file, the description for a common sentence is composed by operators, word categories, and functions, the word category description file is used to describe specific values for the word categories;
and a grammar tree generation unit, for respectively generating the grammar description file and the word category file into a grammar tree for the grammar description file and word category trees for the word category file with a reduction method of a defined sequence (which may be pre-defined), and grafting the word category trees to the positions of the corresponding word categories on the grammar tree, forming a grammar tree for the semantic meaning.
According to an embodiment, the word category description file comprises word items, or comprises, besides word items, at least one of the operators and functions to describe relationships between word items.
According to an embodiment, the operators include at least one of the following:
an operator +, signifying two or more operands put in series;
an operator |, signifying a parallel relationship between two or more operands;
an operator ( ), indicating operands forming a non-negligible combination;
an operator [ ], indicating operands forming a negligible combination;
an operator ;, indicating the end of a sentence;
an operator : indicating explanation of a word category in the word category description file;
an operator “ ”, indicating a reference to an external dictionary.
According to an embodiment, the functions in the grammar description file include at least one of the following:
a function &repeat (EXP, min, max), signifying repeating a grammar portion EXP at least min times, and at most max times;
a function &repeat (EXP, n), signifying repeating EXP n times;
a function &perm (EXP1, EXP2, . . . ), signifying making a full permutation of EXP1, EXP2, . . . ;
a function &grammar (grammar_name), indicating a grammar description file with the name of “grammar_name”;
a function &magic (EXP, key, default, display) or a function &magic (EXP, key, default), mapping EXP to the semantic tag “key”, wherein for the function &magic (EXP, key, default) in a grammar matching process, when EXP is successfully matched with a section T of a text, the “key” has a value of T, otherwise, the “key” has the value of “default”, and for the function &magic (EXP, key, default, display) in the grammar matching process, when EXP is successfully matched with a section T of text, the value of the “key” is the value of “display”, otherwise the “key” has a value of “default”.
According to an embodiment, the functions in a word category description file include at least one of the following:
a function &repeat (EXP, min, max), signifying repeating a grammar portion EXP at least min times, and at most max times;
a function &repeat (EXP, n), signifying repeating EXP n times.
a function &perm (EXP1, EXP2, . . . ), signifying making a full permutation of EXP1, EXP2, . . . .
According to an embodiment, on the grammar tree of the semantic meaning, leaf nodes are word categories or referenced external dictionary, non-leaf nodes are operators or function names, the operands of non-leaf nodes are the contents of respective sub-trees of the non-leaf nodes.
According to an embodiment, the function &magic (EXP, key, default, display) and the function &magic (EXP, key, default) are marked as non-leaf nodes, and there are corresponding mapping tables for the function &magic (EXP, key, default, display) and the function &magic (EXP, key, default) in the file storage unit.
In an embodiment, a semantic parsing device comprises:
a whole-sentence matching unit, used to parse a text to be parsed according to a defined sequence (which may be pre-defined) onto a grammar tree obtained with the aforementioned grammar compiling devices, wherein, if it is determined that the text to be parsed completely matches the grammar tree, the matching result is sent to a result determination unit;
a result determination unit, wherein, when receiving the matching result, it determines that the corresponding semantic meaning of the grammar tree is the semantic meaning of the text to be parsed.
According to an embodiment, in the case where the grammar tree is obtained with the aforementioned grammar compiling device, during the matching process, if a section of the text to be parsed is matched with a sub-tree marked by the function &magic (EXP, key, default, display) or the function &magic (EXP, key, default), in the parsing result, the value of the “key” in the corresponding mapping table of the function &magic (EXP, key, default, display) or the function &magic (EXP, key, default) is the section of the text or the value of the “display” in the mapping table.
In an embodiment, a semantic parsing device comprises:
a semantic mapping matching unit, used to carry out a forward maximum matching on the text to be parsed according to a defined sequence (which may be pre-defined) onto a grammar tree obtained with the aforementioned grammar compiling device, wherein, if the text to be parsed has a section that matches a sub-tree marked by a function &magic (EXP, key, default, display) or a function &magic (EXP, key, default) in the grammar tree, the matching result is sent to a result determination unit;
a result determining unit, wherein, when receiving the matching result, it determines that the corresponding semantic meaning of the grammar tree is the semantic meaning of the text to be parsed.
According to an embodiment, wherein in the parsing result obtained by the result determination unit, the value of the “key” in the corresponding mapping table of the function &magic (EXP, key, default, display) or the function &magic (EXP, key, default) is the section of the text or the value of the “display” in the mapping table.
According to an embodiment, the semantic mapping matching unit sends the matching result to the result determination unit only if there exists a section of the text to be parsed successfully matched with a sub-tree marked by a critical function &magic (EXP, key, default, display) or a critical function &magic (EXP, key, default) defined (or pre-defined) on the grammar tree.
In an embodiment, a semantic parsing device comprises:
a whole-sentence matching unit, used to parse a text to be parsed according to the defined sequence (which may be pre-defined) onto a grammar tree obtained by the aforementioned grammar compiling device, wherein if it is determined that the text to be parsed completely matches the grammar tree, the matching result is sent to a result determination unit, otherwise a semantic mapping matching unit is triggered;
a semantic mapping matching unit, wherein, after being triggered, the semantic mapping matching unit carries out a forward maximum matching on the text to be parsed according to the defined sequence onto the grammar tree, and if the text to be parsed has a section that matches a sub-tree marked by a function &magic (EXP, key, default, display) or a function &magic (EXP, key, default) in the grammar tree, the matching result is sent to a result determination unit;
and a result determining unit, wherein, when receiving the matching result, it determines that the corresponding semantic meaning of the grammar tree is the semantic meaning of the text to be parsed.
According to an embodiment, if the result determination unit receives result from the semantic mapping matching unit, the value of the “key” in the corresponding mapping table of the matched function &magic (EXP, key, default, display) or function &magic (EXP, key, default) is the section of the text or the value of the “display” in the mapping table.
In an embodiment, the grammar compiling methods and semantic parsing methods provided are based on grammar description files and word category description files written in the LGML, which is more similar to human spoken language and has a lower parsing level than BNF and XML, and they thus improve readability and maintainability.
A grammar compiling method of an embodiment is based on a new grammar description language, namely, a logical grammar by manifest language (LGML), which differs from the complex structure of the existing BNF or XML and is closer to the sequence and description style of natural spoken language. For common sentence expressions of respective semantic meanings, the LGML is used to define corresponding grammars (which may be pre-defined); a grammar is composed of two parts: a grammar description file and a word category description file.
In a grammar description file, a sentence description is composed by operators, word categories, and functions. The grammar description file typically uses an exhaustive list to define a variety of sentence descriptions. Word categories and functions in the grammar description file are usually used as operands for operators. Besides being used as separate operating objects, word categories can be used in the grammar description file as parameters of functions. A word category is a unified marking for a certain number of word items (words or phrases) with the same meaning. Specific word items belonging to a word category are defined in the corresponding word category description file of the word category; in other words, a word category description file is used to describe specific values of word categories. Operators and/or functions can also be included in the word category description to describe relationships between word items. A word category description file can be defined manually, or can be the results of machine mining.
Next, the structure of the LGML according to various example embodiments will be described in detail. First, operators in grammar description files and word category description files may include, but are not limited to, the following:
The operator + signifies two or more operands put in series; “put in series” means describing a sentence in sequence.
The operator | signifies a parallel relationship between two or more operands; the so-called “parallel relationship” means one of the operands is selected for the same meaning.
The operator ( ) indicates operands forming a non-negligible combination.
The operator [ ] indicates operands forming a negligible combination.
The operator ; indicates the end of a sentence.
The operator : indicates explanation of a word category in the word category description file.
The operator “ ” indicates a reference to an external dictionary; the words in the dictionary have a parallel relationship.
Functions in a word category description file may include, but are not limited to, the following:
The function &repeat (EXP, min, max) signifies repeating a grammar portion EXP at least min times, and at most max times.
The function &repeat (EXP, n) signifies repeating EXP n times.
The function &perm (EXP1, EXP2, . . . ) signifies making a full permutation of EXP1, EXP2, . . . . For example, &perm (EXP1, EXP2, EXP3) is equivalent to putting six items in parallel:
(EXP1+EXP2+EXP3)|(EXP1+EXP3+EXP2)|(EXP2+EXP3+EXP1)|(EXP2+EXP1+EXP3)|(EXP3+EXP1+EXP2)|(EXP3+EXP2+EXP1)
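The rewriting of &repeat and &perm into plain operators can be sketched in Python as follows. This is only an illustration: the helper names `expand_perm` and `expand_repeat` are hypothetical, expressions are handled as plain strings, and expanding &repeat (EXP, min, max) into a parallel list of series (wrapped in [ ] when min is 0) is one possible reading of the function, not a form stated in the text.

```python
from itertools import permutations


def expand_perm(*exps):
    """Rewrite &perm(EXP1, EXP2, ...) as a parallel (|) list of series (+)."""
    return "|".join("(" + "+".join(p) + ")" for p in permutations(exps))


def expand_repeat(exp, lo, hi=None):
    """Rewrite &repeat(EXP, min, max) or &repeat(EXP, n) using + and |."""
    hi = lo if hi is None else hi
    # One series per allowed repetition count (a count of 0 has no series).
    series = ["+".join([exp] * k) for k in range(lo, hi + 1) if k > 0]
    body = "|".join("(" + s + ")" for s in series)
    # Assumption: zero repetitions are modelled by a negligible [ ] group.
    return "[" + body + "]" if lo == 0 else body


print(expand_perm("EXP1", "EXP2", "EXP3"))
print(expand_repeat("EXP", 1, 2))
print(expand_repeat("EXP", 2))
```

The six alternatives produced for three operands are the same six listed above, although `itertools.permutations` may emit them in a different order.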
In addition to the aforementioned function &repeat (EXP, min, max), function &repeat (EXP, n), and function &perm (EXP1, EXP2, . . . ), a grammar description file can also include, but is not limited to, the following functions:
The function &grammar (grammar_name) is usually written at the beginning of the grammar file, indicating the name of the grammar for a sentence expression is “grammar_name”, which marks the grammar description file for the sentence expression.
The function &magic (EXP, key, default, display) and the function &magic (EXP, key, default) map EXP to the semantic tag “key”.
For the function &magic (EXP, key, default) in a grammar matching process, when EXP is successfully matched with a section T of a text, the “key” has a value of T. Otherwise, the “key” has the default value of “default”.
For the function &magic (EXP, key, default, display) in a grammar matching process, when EXP is successfully matched with a section T of text, the value of the “key” is the displayed value “display”, otherwise the “key” has a value of “default”.
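The two forms of &magic differ only in which value the “key” receives on a successful match. A minimal Python sketch of that rule follows; the function name `magic_value` is hypothetical, and treating an empty matched section (for instance a skipped negligible group) like a failed match is an assumption of this sketch.

```python
def magic_value(matched, default, display=None):
    """Value given to the semantic tag `key` by one &magic call.

    matched: the text section T matched by EXP, or None if EXP did not match.
    display: None for the three-argument form &magic(EXP, key, default).
    """
    if not matched:
        # No match (or, by assumption, an empty match): use "default".
        return default
    # Three-argument form keeps T; four-argument form reports "display".
    return matched if display is None else display


# Three-argument form, e.g. key "date" with default "today".
print(magic_value("tomorrow", "today"))
print(magic_value(None, "today"))
# Four-argument form, e.g. key "weather", default "error", display "weather".
print(magic_value("temperature", "error", "weather"))
print(magic_value(None, "error", "weather"))
```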
Here is an example. Assume a sentence for a weather query has a defined grammar description file and a word category description file. The grammar description file can be defined as:
[intention + query] + &magic([time], date, today), &magic([place], place, LBS) + &magic(weather index, weather, error, weather);
The word category description file can be defined as:
intention: I + (want to | would like to);
query: know | query;
time: today | tomorrow;
place: &repeat(“place.dic” + [province | city], 1, 2);
weather index: temperature | humidity;
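As an illustration of how such a word category description might be read by a program, the following Python sketch splits it into one entry per word category, using the operators ; and : as described above. The function name `read_word_categories` is hypothetical, and the sketch assumes that neither ; nor : occurs inside an expression.

```python
def read_word_categories(text):
    """Split a word category description into {category name: expression}."""
    categories = {}
    for entry in text.split(";"):          # ';' ends each description
        if ":" not in entry:
            continue                       # skip the empty tail after the last ';'
        name, expression = entry.split(":", 1)   # ':' introduces the explanation
        # Normalise whitespace so line breaks in the source do not matter.
        categories[" ".join(name.split())] = " ".join(expression.split())
    return categories


SOURCE = """
intention: I + (want to | would like to);
query: know | query;
time: today | tomorrow;
place: &repeat("place.dic" + [province | city], 1, 2);
weather index: temperature | humidity;
"""

for name, expression in read_word_categories(SOURCE).items():
    print(f"{name!r} -> {expression}")
```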
The grammar compiling according to an embodiment compiles a semantic meaning into a grammar tree based on the LGML. During compiling, a reduction method of a determined sequence (from left to right in the present example embodiments) is carried out on the aforementioned grammar description file and word category description file, and the grammar tree is generated according to the relationships expressed by the functions and operators. Specifically, the grammar description file is reduced from left to right into a grammar tree, and the word category description file is reduced from left to right into word category trees. The word category trees are then grafted at the positions of the corresponding word categories in the grammar tree, that is, onto the leaf nodes of the grammar tree.
When the grammar description file is generated into a grammar tree, leaf nodes are word categories and non-leaf nodes are operators; the operands of an operator at a non-leaf node are the contents of the respective sub-trees of that node. The functions &repeat (EXP, min, max), &repeat (EXP, n), and &perm (EXP1, EXP2, . . . ) can all be expressed as combinations of the grammar section EXP and operators. The functions &magic (EXP, key, default, display) and &magic (EXP, key, default), by contrast, implement mapping relationships. A function &magic is marked as follows: it appears on the grammar tree as a non-leaf node, and there is a corresponding mapping table for it; for example, there may be a location pointer between the marking of the function &magic and the corresponding mapping table.
Below are some simple examples. Assume the grammar description file of a semantic meaning has the following content:
[A + B] | C
A reduction is made from left to right. The resulting grammar tree of the grammar description file is shown in the drawings: the leaf nodes are the word categories A, B, and C, and the non-leaf nodes are operators. The operands of the non-leaf nodes “+” and “[ ]” are A and B. The operands of the non-leaf node “|” are the contents of its respective sub-trees: the sub-tree content of the left branch is [A + B], and the sub-tree content of the right branch is C.
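A left-to-right reduction of an operator expression such as [A+B]|C into a tree can be sketched in Python with a small recursive-descent parser. Several points are assumptions of this sketch and are not stated in the text: the function name `parse`, the tuple representation `(operator, operands)`, the choice that + binds more tightly than |, and the choice that ( ) only groups operands without creating a node. Functions and malformed input are not handled.

```python
import re


def parse(expression):
    """Reduce an expression over +, |, ( ) and [ ] into a nested tuple tree."""
    tokens = re.findall(r"[()\[\]|+]|[^()\[\]|+]+", expression)
    tokens = [t.strip() for t in tokens if t.strip()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def operand():
        token = take()
        if token == "(":                   # non-negligible combination
            node = parallel()
            take()                         # consume ")"
            return node
        if token == "[":                   # negligible combination
            node = ("[]", [parallel()])
            take()                         # consume "]"
            return node
        return token                       # leaf: word category or word item

    def chain(operator, parse_part):
        parts = [parse_part()]
        while peek() == operator:
            take()
            parts.append(parse_part())
        return parts[0] if len(parts) == 1 else (operator, parts)

    def series():
        return chain("+", operand)

    def parallel():
        return chain("|", series)

    return parallel()


print(parse("[A+B]|C"))
print(parse("I + (want to | would like to)"))
```

For [A+B]|C the root is the operator |, its left sub-tree is the negligible combination of A+B, and its right sub-tree is the leaf C, which mirrors the tree described above.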
If the word category description file of the semantic meaning is as follows:
A: a+b;
B: [c+d]+e;
C: f|g;
A reduction is carried out on the word categories from left to right; the word category trees for the word categories A, B, and C are shown in the drawings.
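The grafting step for this example can be sketched in Python as follows. The tuple representation `(operator, operands)` with plain strings as leaves and the function name `graft` are assumptions of this sketch; the trees are written out by hand rather than produced by a compiler.

```python
def graft(tree, category_trees):
    """Replace each word-category leaf with its word category tree."""
    if isinstance(tree, str):
        # Leaves that are not category names are word items and stay as-is.
        return category_trees.get(tree, tree)
    operator, operands = tree
    return (operator, [graft(child, category_trees) for child in operands])


# Grammar tree for [A+B]|C.
grammar_tree = ("|", [("[]", [("+", ["A", "B"])]), "C"])

# Word category trees for A: a+b;  B: [c+d]+e;  C: f|g;
category_trees = {
    "A": ("+", ["a", "b"]),
    "B": ("+", [("[]", [("+", ["c", "d"])]), "e"]),
    "C": ("|", ["f", "g"]),
}

full_tree = graft(grammar_tree, category_trees)
print(full_tree)
```

After grafting, every leaf of the resulting tree is a word item and every non-leaf node is an operator.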
Assume the grammar description file for a certain semantic meaning has the following content:
&magic((X | Y), key, default, display)
A reduction is made from left to right. The resulting grammar tree of the grammar description file is shown in the drawings.
Assume the word category description file of the semantic meaning is as follows:
X: [a+b]+c;
Y: [a+b]+d;
A reduction is carried out on the word categories from left to right; the word category trees for the word categories X and Y are shown in the drawings.
In other words, in the eventually formed grammar tree for the semantic meaning, the leaf nodes are word items in the word category description file or referenced external dictionary, the non-leaf nodes are operators or function names, operands of the non-leaf nodes are contents represented by the sub-trees of the non-leaf nodes.
If there are two or more functions &magic in the grammar description file, one can merge all the mapping tables of the functions &magic in the grammar description file into one mapping table to facilitate storage and query.
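Merging the mapping tables can be sketched in Python as follows. The text does not specify the layout of a mapping table, so the dictionary fields `key`, `default`, and `display`, the function name `merge_mapping_tables`, and the decision to reject a semantic tag that is defined twice are all assumptions of this sketch.

```python
def merge_mapping_tables(tables):
    """Merge per-&magic mapping tables into one table keyed by semantic tag."""
    merged = {}
    for table in tables:
        key = table["key"]
        if key in merged:
            # Design choice of this sketch: duplicate tags are an error.
            raise ValueError(f"semantic tag {key!r} is defined twice")
        merged[key] = {
            "default": table["default"],
            "display": table.get("display"),   # None for the 3-argument form
        }
    return merged


tables = [
    {"key": "date", "default": "today"},
    {"key": "place", "default": "LBS"},
    {"key": "weather", "default": "error", "display": "weather"},
]
merged = merge_mapping_tables(tables)
print(merged["weather"])
```

A single merged table lets a matcher look up the default and display values of any &magic node by its semantic tag.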
After completion of the aforementioned grammar compiling, semantic parsing can be carried out based on the grammar tree formed through grammar compiling. The text to be parsed can be, for example, the text resulting from recognition of a user's speech, which is semantically parsed to obtain its semantic meaning. It can also be text input into a search engine by a user, which is semantically parsed to obtain its semantic meaning. The aforementioned are only non-exhaustive examples.
During semantic parsing, the text to be parsed is matched with respective grammar trees. The matching is done from left to right, the corresponding semantic meaning of the matched grammar tree is determined to be the semantic meaning of the text to be parsed. In the matching process, a way of whole-sentence matching can be used, or a way of semantic mapping matching can be used; or the combination of whole-sentence matching and semantic mapping matching can be used, e.g., whole-sentence matching is carried out first, if it gives no result, a semantic mapping matching is carried out. The so-called whole-sentence matching is the case where the text to be parsed can be completely matched with the grammar tree of LGML of a certain semantic meaning. This matching is a versatile and accurate way of matching, regardless of what functions are used in the LGML. In a semantic mapping matching, a section of the text to be parsed can be matched with a section of the grammar defined by the functions &magic, e.g., if the whole or part of the text to be parsed can be matched with a sub-tree marked by the functions &magic on the grammar tree, it is then determined that the semantic meaning of the text includes the semantic meaning mapped from the function &magic.
During whole-sentence matching, the way the text to be parsed is matched with the grammar tree is substantially the same as existing grammar tree matching. The difference is that the present matching is carried out from left to right, and a successful match is found only if the text to be parsed can be completely matched onto the grammar tree. In particular, if a match with a sub-tree marked by a function &magic is found during matching, the mapping result obtained from the mapping table indicated by the function &magic, based on the matching result of that sub-tree, is taken as the parsing result. That is, if a certain section of the text to be parsed is completely matched with a sub-tree marked by the function &magic, then in the parsing result the value of the “key” in the mapping table indicated by the function &magic is the section of the text or the value of “display” in the mapping table.
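Whole-sentence matching over a grafted grammar tree can be sketched in Python with a small backtracking matcher. Everything below is illustrative and rests on stated assumptions: the tuple tree layout, the names `matches` and `whole_sentence_match`, whitespace tokenisation of the text, a two-entry stand-in for the external dictionary “place.dic”, and reading the weather-query grammar as a plain series of its parts.

```python
def matches(node, words, i):
    """Yield (end, tags) for every way `node` can match words[i:end]."""
    if isinstance(node, str):                       # leaf: one word item
        item = node.split()
        if words[i:i + len(item)] == item:
            yield i + len(item), {}
        return
    kind, operands = node[0], node[1]
    if kind == "|":                                 # parallel: any one operand
        for child in operands:
            yield from matches(child, words, i)
    elif kind == "[]":                              # negligible: match or skip
        yield from matches(operands[0], words, i)
        yield i, {}
    elif kind == "+":                               # series: operands in order
        def chain(k, start, tags):
            if k == len(operands):
                yield start, tags
                return
            for end, found in matches(operands[k], words, start):
                yield from chain(k + 1, end, {**tags, **found})
        yield from chain(0, i, {})
    elif kind == "magic":                           # record the semantic tag
        table = node[2]
        for end, found in matches(operands[0], words, i):
            section = " ".join(words[i:end])
            # Empty section (skipped group): default; else display or T.
            value = (table.get("display") or section) if section else table["default"]
            yield end, {**found, table["key"]: value}


def whole_sentence_match(tree, text):
    """Return the tag values if the whole text matches the tree, else None."""
    words = text.split()
    for end, tags in matches(tree, words, 0):
        if end == len(words):
            return tags
    return None


INTENTION = ("+", ["I", ("|", ["want to", "would like to"])])
QUERY = ("|", ["know", "query"])
TIME = ("|", ["today", "tomorrow"])
PLACE = ("|", ["Beijing", "Shanghai"])   # stand-in for the "place.dic" dictionary
INDEX = ("|", ["temperature", "humidity"])
WEATHER = ("+", [
    ("[]", [("+", [INTENTION, QUERY])]),
    ("magic", [("[]", [TIME])], {"key": "date", "default": "today"}),
    ("magic", [("[]", [PLACE])], {"key": "place", "default": "LBS"}),
    ("magic", [INDEX], {"key": "weather", "default": "error", "display": "weather"}),
])

print(whole_sentence_match(WEATHER, "I want to know Beijing temperature"))
print(whole_sentence_match(WEATHER, "Tell me temperature"))
```

The first sentence matches completely and yields values for the date, place, and weather tags; the second returns `None` because “Tell me” matches no part of the tree.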
The grammar tree shown in
For the special function of &magic, a semantic mapping matching can be used on its marked sub-tree, e.g., a forward maximum matching is carried out on the sub-tree marked by the function &magic and the text to be parsed. If the text to be parsed has a section that matches the sub-tree marked by the function &magic, it then can be determined that the semantic meaning of the text to be parsed is that of the grammar tree; in the parsing result, the value of the “key” in the mapping table indicated by the function &magic is the section of the text or the value of “display” in the mapping table.
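Semantic mapping matching can be sketched in Python as a left-to-right scan that, at each position, takes the longest section matched by any &magic sub-tree. This is one possible reading of “forward maximum matching”, not a definitive implementation. The names `ends` and `semantic_mapping_match`, the tuple tree layout, the representation of each &magic as `(sub_tree, key, default, display)`, and the small stand-in word lists are all assumptions of this sketch.

```python
def ends(node, words, i):
    """Return every position where `node` can stop matching from words[i]."""
    if isinstance(node, str):                        # leaf: one word item
        item = node.split()
        return {i + len(item)} if words[i:i + len(item)] == item else set()
    kind, operands = node
    if kind == "|":                                  # parallel
        return set().union(*(ends(c, words, i) for c in operands))
    if kind == "[]":                                 # negligible: may be skipped
        return ends(operands[0], words, i) | {i}
    current = {i}                                    # kind == "+": series
    for child in operands:
        current = set().union(*(ends(child, words, j) for j in current))
    return current


def semantic_mapping_match(magics, text):
    """Scan the text from left to right for sections matching &magic sub-trees."""
    words = text.split()
    result = {key: default for _, key, default, _ in magics}
    matched, i = False, 0
    while i < len(words):
        best = None
        for sub_tree, key, _, display in magics:
            longest = max(ends(sub_tree, words, i), default=i)
            if longest > i and (best is None or longest > best[0]):
                best = (longest, key, display)       # keep the longest section
        if best is None:
            i += 1                                   # no sub-tree starts here
            continue
        end, key, display = best
        result[key] = display or " ".join(words[i:end])
        matched, i = True, end
    return result if matched else None


MAGICS = [
    (("[]", [("|", ["today", "tomorrow"])]), "date", "today", None),
    (("[]", [("|", ["Beijing", "Shanghai"])]), "place", "LBS", None),
    (("|", ["temperature", "humidity"]), "weather", "error", "weather"),
]

print(semantic_mapping_match(MAGICS, "Tell me temperature"))
print(semantic_mapping_match(MAGICS, "Hello there"))
```

Tags whose sub-trees match nothing keep their default values, so a text that only mentions a weather index still yields a complete parsing result.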
Similarly, the grammar tree shown in
In view of the accuracy of the whole-sentence matching and the high coverage of the semantic mapping matching, to integrate the advantages of the two ways of matching, embodiments may employ the whole-sentence matching followed by the semantic mapping matching, e.g., if the whole-sentence matching fails, a semantic mapping matching is carried out.
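Below is a specific example. Assume the grammar tree for the semantic meaning of a query about weather is shown in the drawings, and let magic1, magic2, and magic3 denote its &magic functions for the date, the place, and the weather, respectively.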
If the text to be parsed is “I want to know Beijing temperature”, whole-sentence matching succeeds: magic1 is mapped to “today”, magic2 is mapped to “Beijing”, and magic3 is mapped to “weather”. The semantic meaning of the text to be parsed is therefore to query the weather. In the parsing result, the date has a value of “today”, the place has a value of “Beijing”, and the weather has a value of “weather”. Although the user did not say “today”, the operand of magic1 is enclosed in the operator [ ] and is therefore negligible, so the date takes the default value “today”.
If the text to be parsed is “Tell me temperature”, the whole-sentence matching fails because the grammar defines no grammar section for “Tell me”. A semantic mapping matching is then carried out: the text section “temperature” is successfully matched with the sub-tree of magic3, so the semantic mapping matching succeeds and the text to be parsed can still be deemed to have the semantic meaning of “weather query”. Here magic1 is mapped to “today”, magic2 is mapped to “LBS”, and magic3 is mapped to “weather”. Accordingly, in the parsing result, the date has a value of “today”, the place has a value of “LBS”, and the weather has a value of “weather”.
Also, during semantic mapping matching, a critical magic function can be set, so that the semantic mapping matching is deemed successful only if the sub-tree of the critical magic function is successfully matched. For example, the function magic3 on the grammar tree shown in
It is possible to have the following situation: during semantic mapping matching onto a grammar tree of a semantic meaning, the sub-trees of a plurality of &magic functions are able to be matched with the text to be parsed. To resolve the conflict in this case, one can use defined priority levels of the sub-trees, or the maximum constraint for the matched word items.
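The two conflict-resolution options can be sketched in Python as follows. The candidate fields `key`, `items`, and `priority`, the convention that a larger priority number wins, the function name `resolve_conflict`, and the example tag “station” are all hypothetical and serve only to illustrate the choice between the two strategies.

```python
def resolve_conflict(candidates, by="priority"):
    """Pick one candidate when several &magic sub-trees match the text.

    Each candidate is a dict with the hypothetical fields 'key' (semantic
    tag), 'items' (matched word items) and 'priority' (larger is preferred).
    """
    if by == "priority":
        # Option 1: defined priority levels of the sub-trees.
        return max(candidates, key=lambda c: c["priority"])
    if by == "max_items":
        # Option 2: prefer the sub-tree that matched the most word items.
        return max(candidates, key=lambda c: len(c["items"]))
    raise ValueError(f"unknown strategy: {by}")


candidates = [
    {"key": "place", "items": ["Beijing"], "priority": 1},
    {"key": "station", "items": ["Beijing", "West", "Station"], "priority": 0},
]
print(resolve_conflict(candidates, by="priority")["key"])
print(resolve_conflict(candidates, by="max_items")["key"])
```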
The above is a description of an embodiment of a method. An embodiment of a device is described below.
The file storage unit 41 stores the grammar description file and the corresponding word category description file for the semantic meaning. The grammar description file and the corresponding word category description file are defined based on the LGML according to a common sentence expression of the semantic meaning. In the grammar description file, the description of a common sentence is composed of word categories, operators, and functions. The word category file is used to describe specific values for word categories.
The word category description file comprises word items, or comprises, besides word items, at least one of operators or functions to describe relationships between word items.
The operators in the grammar description file and the word category description file may include, but are not limited to, the following:
The operator + signifies two or more operands put in series.
The operator | signifies a parallel relationship between two or more operands.
The operator ( ) indicates operands forming a non-negligible combination.
The operator [ ] indicates operands forming a negligible combination.
The operator ; indicates the end of a sentence.
The operator : indicates explanation of a word category in the word category description file.
The operator “ ” indicates reference to an external dictionary.
Functions in a word category description file may include, but are not limited to, the following:
The function &repeat (EXP, min, max) signifies repeating a grammar portion EXP at least min times, and at most max times.
The function &repeat (EXP, n) signifies repeating EXP n times.
The function &perm (EXP1, EXP2, . . . ) signifies making a full permutation of EXP1, EXP2, . . . . For example, &perm (EXP1, EXP2, EXP3) is equivalent to putting the following six items in parallel:
(EXP1+EXP2+EXP3)|(EXP1+EXP3+EXP2)|(EXP2+EXP3+EXP1)|(EXP2+EXP1+EXP3)|(EXP3+EXP1+EXP2)|(EXP3+EXP2+EXP1)
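The expansion of &repeat and &perm into operands joined by the operators + and | can be sketched as follows. This is an illustration only: the helper names `expand_repeat`, `expand_perm`, and `render` are hypothetical, and operands are treated as plain strings rather than parsed grammar sections.

```python
from itertools import permutations


def expand_repeat(exp, min_count, max_count=None):
    """Return the alternatives for &repeat as lists of operands in series."""
    if max_count is None:          # &repeat (EXP, n): exactly n times
        max_count = min_count
    if min_count < 0 or max_count < min_count:
        raise ValueError("need 0 <= min <= max")
    return [[exp] * k for k in range(min_count, max_count + 1)]


def expand_perm(*exps):
    """Return every ordering of the operands for &perm."""
    return [list(p) for p in permutations(exps)]


def render(alternatives):
    """Write alternatives with the + (series) and | (parallel) operators."""
    return "|".join("(" + "+".join(alt) + ")" for alt in alternatives)


print(render(expand_perm("EXP1", "EXP2", "EXP3")))
print(render(expand_repeat("EXP", 1, 3)))
```

The six alternatives produced for three operands are the same six orderings listed above, although `itertools.permutations` emits them in a different order.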
In addition to the aforementioned functions &repeat (EXP, min, max), &repeat (EXP, n), and &perm (EXP1, EXP2, . . . ), a grammar description file can also include, but is not limited to, the following functions:
The function &grammar (grammar_name) is usually written at the beginning of the grammar file, indicating the name of the grammar for a sentence is “grammar_name”, which marks the grammar description file for the sentence.
The function &magic (EXP, key, default, display) and the function &magic (EXP, key, default) map EXP to the semantic tag “key”.
For the function &magic (EXP, key, default) in a grammar matching process, when EXP is successfully matched with a section T of a text, the “key” has a value of T. Otherwise, the “key” has the value of default.
For the function &magic (EXP, key, default, display) in a grammar matching process, when EXP is successfully matched with a section T of text, the value of the “key” is “display”, otherwise the value of the “key” is default.
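The value rule for the two forms of &magic can be summarized in a few lines. In this sketch the helper name `magic_value` is hypothetical, `None` is used as an assumed signal for "EXP did not match" and for "no display argument", and the sample values are chosen only for illustration.

```python
def magic_value(matched_section, default, display=None):
    """Return the value assigned to "key" by &magic.

    matched_section: the text section T matched by EXP, or None if no match.
    display: the fourth argument of &magic, or None for the 3-argument form.
    """
    if matched_section is None:
        return default           # EXP was not matched
    if display is None:
        return matched_section   # &magic (EXP, key, default)
    return display               # &magic (EXP, key, default, display)


print(magic_value("Beijing", "LBS"))
print(magic_value(None, "LBS"))
print(magic_value("temperature", "weather", "weather"))
```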
The grammar tree generation unit 42 converts the grammar description file into a grammar tree and the word category description file into word category trees, using a reduction method that follows a defined sequence (for example, from left to right), and grafts the word category trees onto the positions of the corresponding word categories on the grammar tree.
When the grammar description file is converted into a grammar tree, leaf nodes are word categories and non-leaf nodes are operators; the operands of an operator at a non-leaf node are the contents of the respective sub-trees of that node. The functions &repeat (EXP, min, max), &repeat (EXP, n), and &perm (EXP1, EXP2, . . . ) can all be expressed as combinations of the grammar section EXP and operators. The functions &magic (EXP, key, default, display) and &magic (EXP, key, default), by contrast, implement mapping relationships, so the function &magic is marked in the following way: it appears on the grammar tree as a non-leaf node, and a corresponding semantic mapping table is kept for it. For example, there may be a location pointer between the marking of the function &magic and the corresponding mapping table.
In the eventually formed grammar tree for the semantic meaning, the leaf nodes are word items in the word category description file or in a referenced external dictionary, the non-leaf nodes are operators or function names, and the operands of a non-leaf node are the contents represented by the sub-trees of that node.
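The grafting step can be illustrated with a small tree structure. The `Node` class, the `graft` and `leaves` helpers, and the sample word categories are assumptions made for this sketch; it handles only one level of word categories and does not model external dictionaries or the reduction-based construction of the trees.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                      # operator, function name, category, or word item
    children: list = field(default_factory=list)


def graft(tree, category_trees):
    """Replace each word-category leaf with the tree of that category."""
    if not tree.children:
        return category_trees.get(tree.label, tree)
    return Node(tree.label, [graft(c, category_trees) for c in tree.children])


def leaves(tree):
    """List the leaf labels from left to right."""
    if not tree.children:
        return [tree.label]
    return [leaf for child in tree.children for leaf in leaves(child)]


# Grammar tree whose leaves are word categories.
grammar_tree = Node("+", [Node("date"), Node("weather_word")])
# Word category trees whose leaves are word items.
category_trees = {
    "date": Node("|", [Node("today"), Node("tomorrow")]),
    "weather_word": Node("|", [Node("weather"), Node("temperature")]),
}
final_tree = graft(grammar_tree, category_trees)
print(leaves(final_tree))
```

After grafting, every leaf of the resulting tree is a word item, while the non-leaf nodes remain operators.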
If there are two or more functions &magic in the grammar description file, one can merge all the mapping tables of the functions &magic in the grammar description file into one mapping table to facilitate the storage and query.
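One possible layout for such a merged mapping table is sketched below, assuming that each &magic function contributes one entry and that the location pointer is simply the index of that entry in the merged table. The function name, the entry fields, and the sample values are hypothetical.

```python
def merge_mapping_tables(tables):
    """Merge per-function tables into one list and return it with pointers."""
    merged = []
    pointers = {}
    for magic_name, entry in tables.items():
        pointers[magic_name] = len(merged)  # index acts as the location pointer
        merged.append(entry)
    return merged, pointers


tables = {
    "magic1": {"key": "date", "default": "today", "display": None},
    "magic2": {"key": "place", "default": "LBS", "display": None},
    "magic3": {"key": "weather", "default": "weather", "display": "weather"},
}
merged, pointers = merge_mapping_tables(tables)
print(merged[pointers["magic3"]]["key"])
```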
Based on the grammar tree of the semantic meaning obtained with the grammar compiling device shown in
The whole-sentence matching unit 51 matches the text to be parsed on the above grammar tree of the semantic meaning according to a defined sequence. If it is determined that the text to be parsed completely matches the grammar tree, the matching result is sent to the result determination unit 52.
When the result determination unit 52 receives the matching result, it is determined that the corresponding semantic meaning of the grammar tree is the semantic meaning of the text to be parsed.
As can be seen, the whole-sentence matching is the case where the text to be parsed can be completely matched with the grammar tree of LGML of a certain semantic meaning. This matching is a versatile and accurate way of matching, regardless of what functions are used in the LGML.
In particular, sub-trees marked by the &magic function may be matched during whole-sentence matching. If a section of the text to be parsed is matched with a sub-tree marked by the function &magic (EXP, key, default, display) or the function &magic (EXP, key, default), then in the parsing result obtained by the result determination unit 52, the value of the “key” in the corresponding mapping table is the value of the “display” in the mapping table or the matched section of the text, respectively.
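Whole-sentence matching with &magic sub-trees can be sketched with a small backtracking matcher. This is an illustration under stated assumptions: the tuple-based tree encoding, the whitespace tokenization, the helper names, and the sample “weather query” grammar are all hypothetical, and the matcher covers only the operators +, | and [ ] together with &magic.

```python
def match(node, tokens, pos):
    """Yield (next_pos, bindings) for each way node matches tokens[pos:]."""
    kind = node[0]
    if kind == "word":
        if pos < len(tokens) and tokens[pos] == node[1]:
            yield pos + 1, {}
    elif kind == "alt":                      # operator |
        for child in node[1]:
            yield from match(child, tokens, pos)
    elif kind == "opt":                      # operator [ ]
        yield pos, {}
        yield from match(node[1], tokens, pos)
    elif kind == "seq":                      # operator +
        yield from match_seq(node[1], tokens, pos, {})
    elif kind == "magic":                    # &magic (EXP, key, default[, display])
        _, exp, key, _default, display = node
        for end, found in match(exp, tokens, pos):
            section = " ".join(tokens[pos:end])
            yield end, {**found, key: section if display is None else display}


def match_seq(children, tokens, pos, found):
    """Match the children one after another, backtracking on failure."""
    if not children:
        yield pos, found
        return
    for end, more in match(children[0], tokens, pos):
        yield from match_seq(children[1:], tokens, end, {**found, **more})


def defaults(node):
    """Collect key -> default for every &magic node in the tree."""
    kind = node[0]
    if kind == "word":
        return {}
    if kind == "magic":
        return {node[2]: node[3], **defaults(node[1])}
    children = node[1] if kind in ("alt", "seq") else [node[1]]
    out = {}
    for child in children:
        out.update(defaults(child))
    return out


def whole_sentence_match(tree, text):
    """Return the key values if the whole text matches the tree, else None."""
    tokens = text.split()
    for end, found in match(tree, tokens, 0):
        if end == len(tokens):               # the whole text was consumed
            return {**defaults(tree), **found}
    return None


def words(*items):
    return ("alt", [("word", w) for w in items])


WEATHER_QUERY = ("seq", [
    ("opt", ("magic", words("today", "tomorrow"), "date", "today", None)),
    ("opt", ("magic", words("Beijing", "Shanghai"), "place", "LBS", None)),
    ("magic", words("weather", "temperature"), "weather", "weather", "weather"),
])

print(whole_sentence_match(WEATHER_QUERY, "tomorrow Shanghai temperature"))
print(whole_sentence_match(WEATHER_QUERY, "Tell me temperature"))
```

With this toy grammar the second call returns `None`, because “Tell me” is not covered by any grammar section; that is the situation in which semantic mapping matching becomes useful.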
The semantic mapping matching unit 61 carries out a forward maximum matching on the text to be parsed according to a defined sequence onto the grammar tree of a semantic meaning. If the text to be parsed has a section that matches a sub-tree marked by the function &magic (EXP, key, default, display) or the function &magic (EXP, key, default), the matching result is sent to the result determination unit 62.
When the result determination unit 62 receives the matching results, it is determined that the corresponding semantic meaning of the grammar tree is the semantic meaning of the text to be parsed. Further, in the parsing result obtained by the result determination unit 62, the value of the “key” in the corresponding mapping table of the function &magic (EXP, key, default, display) or the function &magic (EXP, key, default) is the section of the text or the value of the “display” in the mapping table.
Also, during semantic mapping matching, a critical magic function can be set, so that the semantic mapping matching is deemed successful only if the sub-tree of the critical magic function is successfully matched. That is, the semantic mapping matching unit 61 sends the matching result to the result determination unit 62 only if there exists a section of the text to be parsed successfully matched with a sub-tree marked by the critical function &magic (EXP, key, default, display) or the critical function &magic (EXP, key, default) defined on the grammar tree.
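Semantic mapping matching with an optional critical &magic function can be sketched as follows. As a simplifying assumption, each &magic sub-tree is flattened into a list of phrases, the text is tokenized on whitespace, and forward maximum matching takes the longest phrase that starts at the current position. The function name, the `critical` parameter, and the sample data are hypothetical.

```python
MAGIC_SUBTREES = {
    "date": {"phrases": ["today", "tomorrow"], "default": "today", "display": None},
    "place": {"phrases": ["New York", "New York City"], "default": "LBS", "display": None},
    "weather": {"phrases": ["weather", "temperature"], "default": "weather", "display": "weather"},
}


def semantic_mapping_match(text, subtrees, critical=()):
    """Forward maximum matching of text sections onto &magic sub-trees."""
    tokens = text.split()
    result = {key: spec["default"] for key, spec in subtrees.items()}
    matched = set()
    pos = 0
    while pos < len(tokens):
        best = None                          # (length in tokens, key)
        for key, spec in subtrees.items():
            for phrase in spec["phrases"]:
                phrase_tokens = phrase.split()
                n = len(phrase_tokens)
                if tokens[pos:pos + n] == phrase_tokens and (best is None or n > best[0]):
                    best = (n, key)
        if best is None:
            pos += 1                         # skip a token no sub-tree covers
            continue
        n, key = best
        section = " ".join(tokens[pos:pos + n])
        display = subtrees[key]["display"]
        result[key] = section if display is None else display
        matched.add(key)
        pos += n
    # Succeed only if something matched and every critical sub-tree matched.
    if not matched or any(key not in matched for key in critical):
        return None
    return result


print(semantic_mapping_match("Tell me temperature", MAGIC_SUBTREES, critical=("weather",)))
print(semantic_mapping_match("Tell me today", MAGIC_SUBTREES, critical=("weather",)))
```

In this sketch “Tell me today” is rejected when the weather sub-tree is declared critical, whereas without a critical sub-tree the same text would be accepted on the strength of the date match alone.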
The whole-sentence matching unit 71 matches the text to be parsed according to a defined sequence onto the grammar tree. If it is determined that the text to be parsed completely matches the grammar tree, the matching result is sent to the result determination unit 73; otherwise, the semantic mapping matching unit 72 is triggered.
After being triggered, the semantic mapping matching unit 72 carries out a forward maximum matching on the text to be parsed according to a defined sequence onto the grammar tree. If the text to be parsed has a section that matches a sub-tree marked by the function &magic (EXP, key, default, display) or the function &magic (EXP, key, default), the matching result is sent to the result determination unit 73.
Regardless of what the result determination unit 73 receives, either the result from the whole-sentence matching unit 71 or that from the semantic mapping matching unit 72, the unit 73 can determine that the corresponding semantic meaning of the grammar tree is the semantic meaning of the text to be parsed.
Further, if the result determination unit 73 receives the result from the semantic mapping matching unit 72, the value of the “key” in the corresponding mapping table of the matched function &magic (EXP, key, default, display) or the function &magic (EXP, key, default) is the value of the “display” in the mapping table or the matched section of the text, respectively.
Similarly, during semantic mapping matching, a critical magic function can be set, so that the semantic mapping matching is deemed successful only if the sub-tree of the critical magic function is successfully matched. That is, the semantic mapping matching unit 72 sends the matching result to the result determination unit 73 only if there exists a section of the text to be parsed successfully matched with the sub-trees marked by the critical function &magic (EXP, key, default, display) or critical function &magic (EXP, key, default) defined on the grammar tree.
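The cooperation of the units 71, 72 and 73 amounts to a fallback: whole-sentence matching is tried first, and semantic mapping matching is triggered only when it fails. A minimal sketch is given below; the `parse` function and the two toy matchers are stand-ins invented for illustration and do not implement real grammar matching.

```python
def parse(text, whole_sentence_match, semantic_mapping_match):
    """Return (stage, result), or None if both matching stages fail."""
    result = whole_sentence_match(text)       # role of unit 71
    if result is not None:
        return "whole-sentence", result       # passed on to unit 73
    result = semantic_mapping_match(text)     # role of unit 72, triggered on failure
    if result is not None:
        return "semantic-mapping", result     # passed on to unit 73
    return None


# Toy matchers standing in for matching against a real grammar tree.
def toy_whole_sentence(text):
    return {"weather": "weather"} if text == "today temperature" else None


def toy_semantic_mapping(text):
    return {"weather": "weather"} if "temperature" in text.split() else None


print(parse("today temperature", toy_whole_sentence, toy_semantic_mapping))
print(parse("Tell me temperature", toy_whole_sentence, toy_semantic_mapping))
print(parse("Tell me a joke", toy_whole_sentence, toy_semantic_mapping))
```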
The following should be particularly noted: during semantic parsing, the grammar trees of the respective semantic meanings are usually matched one by one to determine the corresponding semantic meaning of the text to be parsed. The description of the example embodiments is based on matching on one grammar tree; the matching processes for different grammar trees are similar. In addition, in some applications, multi-level semantic parsing is used; in such a case, the semantic parsing method and semantic parsing device according to an embodiment may be used during semantic parsing at each level.
It should be understood that the devices and methods disclosed can be implemented in other ways. For example, the embodiments of the devices are exemplary: the division into units is merely a logical one, and in practice the units can be divided in other ways.
The units described as separate parts may or may not be physically separated, and the parts shown as units may or may not be physical units; that is, they can be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
Further, in the examples, the functional units can be integrated in one processing unit, or they can exist physically as separate units, or two or more units can be integrated in one unit. The integrated unit described above can be realized as hardware, or as a combination of hardware and software functional units.
The aforementioned integrated unit, when implemented in the form of a software functional unit, may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that instruct a computer device (a personal computer, server, network equipment, etc.) or a processor to perform some steps of the methods described in the various embodiments. The aforementioned storage medium may include a USB flash drive (U disk), a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk storing program code.
Number | Date | Country | Kind |
---|---|---|---|
201310203987.2 | May 2013 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2014/078596 | 5/28/2014 | WO | 00 |