The disclosed technology relates to a reading disambiguation apparatus, a reading disambiguation method, and a reading disambiguation program.
In a speech synthesis system required for reading text aloud and the like, correctly estimating the reading of a word is one of the important factors for improving the accuracy of the system. Disambiguation of a reading of a word is the task of estimating, for a word that has different readings under the same notation, the correct reading in an input sentence, such as “” in “ (kata) (I've received things from many people)” and “ (hou) (I am from the west)”.
As a related-art study on disambiguation of a reading of a word, a disambiguation technique that uses a morpheme notation and an n-gram of parts-of-speech as features has been proposed (Ryuichi Yoneda, “Disambiguation of Reading Output by Morphemic Analyzer”, Nara Institute of Science and Technology, Master Thesis, NAIST-IS-MT0151124, 2003).
Further, as a related study, a technique for estimating a reading that uses an n-gram of characters as a feature has also been proposed (Tetsuro Sasada, Shinsuke Mori, and Tatsuya Kawahara, “Improved estimation accuracy of reading with vocabulary acquisition from speech and text”, NLP2008, p. 420-p. 243, 2008).
Disambiguation of a reading involves two cases. In case (1), words appearing around a word of interest offer a clue. In case (2), the topic discussed in the sentence in which the word of interest appears (e.g., “baseball”, “Japanese chess”, and the like) offers a clue. For case (1), the reading of a word of interest can be captured using an n-gram as in the related art. However, in morpheme notation and literation used in the related-art technique, for example, “ (tsuno) (antler of a deer)” and “ (tsuno) (horn of a buffalo)” are different n-grams. Thus, even if the learning data contains “”, if it does not contain “”, it is impossible to correctly estimate “ (tsuno)” for the latter, which leads to a problem in that the morphemic notation and the literation cannot cover variations.
In addition, for case (2), it is theoretically possible to cover variations by setting n to a large value, but there is a problem in that the 3-grams and 5-grams used in practice cannot capture such variations. For example, in a sentence of “1 7 /4 0 / (“kyojin” is the name of an organization, and the accent is on “kyo”1/” (Tani of Kyojin, who is in his 17th year as a professional baseball player and 40 years old, plays in the game as the first batter for the first time in this season) (“/” is a boundary of morphemes), if only the three to five morphemes before and after “” are examined, it is difficult to distinguish between the general word “” and the organization “”.
The disclosed technology has been made in view of the aforementioned circumstances, and an object thereof is to provide a reading disambiguation apparatus, a reading disambiguation method, and a reading disambiguation program that are capable of accurately estimating a reading of each morpheme in a morpheme array.
A first aspect of the present disclosure is a reading disambiguation apparatus, including: an input unit configured to receive a morpheme array and parts-of-speech of morphemes of the morpheme array; an ambiguous word candidate acquisition unit configured to acquire, for each morpheme of the morpheme array, based on a notation and a part-of-speech of the morpheme, reading candidates of the morpheme from reading candidates of the morpheme defined in advance for each combination of a notation and a part-of-speech of the morpheme; and a disambiguation unit configured to determine a reading of the morpheme from the acquired reading candidates of the morpheme using a disambiguation rule in which a reading of the morpheme is defined in advance correspondingly to appearance positions of other morphemes and notations, parts-of-speech, or character types of the other morphemes.
A second aspect of the present disclosure is a reading disambiguation method, including: receiving, by an input unit, a morpheme array and parts-of-speech of morphemes of the morpheme array; acquiring, by an ambiguous word candidate acquisition unit, for each morpheme of the morpheme array, based on a notation and a part-of-speech of the morpheme, reading candidates of the morpheme from reading candidates of the morpheme defined in advance for each combination of a notation and a part-of-speech of the morpheme; and determining, by a disambiguation unit, a reading of the morpheme from the acquired reading candidates of the morpheme using a disambiguation rule in which a reading of the morpheme is defined in advance correspondingly to appearance positions of other morphemes and notations, parts-of-speech, or character types of the other morphemes.
A third aspect of the present disclosure is a reading disambiguation program for causing a computer to execute: receiving a morpheme array and parts-of-speech of morphemes of the morpheme array; acquiring, for each morpheme of the morpheme array, based on a notation and a part-of-speech of the morpheme, reading candidates of the morpheme from reading candidates of the morpheme defined in advance for each combination of a notation and a part-of-speech of the morpheme; and determining a reading of the morpheme from the acquired reading candidates of the morpheme using a disambiguation rule in which a reading of the morpheme is defined in advance correspondingly to appearance positions of other morphemes and notations, parts-of-speech, or character types of the other morphemes.
According to the disclosed technology, it is possible to accurately estimate a reading of each morpheme of a morpheme array.
Hereinafter, an example of an embodiment of the disclosed technology will be described with reference to the drawings. Note that the same reference numerals are assigned to the same or equivalent components and portions in the drawings. In addition, dimensional proportions in the drawings are exaggerated for convenience of description and may differ from the actual proportions.
As illustrated in
The CPU 11 is a central processing unit that executes various programs and controls each unit. That is, the CPU 11 reads out a program from the ROM 12 or the storage 14 and executes the program using the RAM 13 as a workspace. The CPU 11 performs control and various pieces of processing for each of the above components in accordance with the program stored in the ROM 12 or the storage 14. In the present embodiment, the ROM 12 or the storage 14 stores a reading disambiguation program for disambiguating a reading of an input sentence.
The ROM 12 stores various programs and various data. The RAM 13 temporarily stores programs or data as a workspace. The storage 14 is constituted by a hard disk drive (HDD) or a solid state drive (SSD) and stores various programs including an operating system and various data.
The input unit 15 includes a keyboard and a pointing device such as a mouse, and is used to perform various inputs.
The input in the present embodiment is a morphemic analysis result obtained by analyzing a “sentence” or a “sentence set” that is a morpheme array using a related-art morphemic analyzer, as illustrated in
The example in
The display unit 16 is, for example, a liquid crystal display, and displays various pieces of information. The display unit 16 may function as the input unit 15 using a touch panel method.
The communication interface 17 is an interface for communicating with other devices and uses a standard such as Ethernet (trade name), FDDI, or Wi-Fi (trade name).
Next, a functional configuration of the reading disambiguation apparatus 10 will be described.
As illustrated in
The category dictionary 20 is a dictionary containing category information for a notation of each morpheme and, for example, a “Japanese vocabulary system” can be used.
The category information imparting unit 22 uses the category dictionary 20 to impart, to each morpheme of a morpheme array, category information of a word corresponding to the morpheme. Specifically, the category information imparting unit 22 references the category dictionary 20 to output a morphemic analysis result with category information in which category information corresponding to a notation of each morpheme in the input morphemic analysis result is imparted (see
The reading candidate list 24 stores a reading (pronunciation notation) for each of combinations of a notation and a main part-of-speech of each morpheme, as illustrated in
Note that, as illustrated in
The ambiguous word candidate acquisition unit 26 references the reading candidate list 24 for each morpheme of the input morphemic analysis result, based on a notation and a part-of-speech of the morpheme, to acquire reading candidates of the morpheme.
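As a minimal sketch of this lookup, the reading candidate list can be modeled as a table keyed by the pair of notation and main part-of-speech. The names `READING_CANDIDATES`, `main_pos`, and `acquire_candidates`, the sample entry, and the assumption that sub-categories of a part-of-speech follow a hyphen are all hypothetical and are not taken from the figures.

```python
from typing import Dict, List, Tuple

# Hypothetical reading candidate list keyed by (notation, main part-of-speech).
# The entry below is illustrative data chosen for this sketch only.
READING_CANDIDATES: Dict[Tuple[str, str], List[str]] = {
    ("方", "noun"): ["kata", "hou"],
}


def main_pos(pos: str) -> str:
    """Keep only the main part-of-speech.

    Assumption: sub-categories are appended after a hyphen, e.g. "noun-common".
    """
    return pos.split("-", 1)[0]


def acquire_candidates(notation: str, pos: str) -> List[str]:
    """Return the reading candidates for the pair, or an empty list if none is listed."""
    return READING_CANDIDATES.get((notation, main_pos(pos)), [])
```

A morpheme whose pair is absent from the table yields an empty list, which a caller can treat as "not a disambiguation target".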
For example, for each morpheme of the morphemic analysis result, the ambiguous word candidate acquisition unit 26 cuts out only a main part-of-speech from parts-of-speech of the morpheme, searches the reading candidate list 24 for a pair of the “notation” and the “main part-of-speech” of the morpheme, and if there is a fitting pair in the reading candidate list 24, acquires a reading (pronunciation notation) corresponding to the pair as a reading candidate. In the examples of
For example, in the example of
In the example of
The disambiguation rule list 28 stores a disambiguation rule in which for each notation of a morpheme, a reading and a score of the morpheme are defined in advance, correspondingly to appearance positions of other morphemes and notations, parts-of-speech, or categories of the other morphemes.
Examples of the disambiguation rule are shown in
As illustrated in
As illustrated in
The “condition content” is a specific value in the type specified in the “condition type”, and when category information is specified in the “condition type”, a category number is specified. When a character type is specified in the “condition type”, a regular expression corresponding to a character type such as a Chinese character, a Japanese hiragana character, a Japanese katakana character, a numeric character, or an alphabetical character is specified in the “condition content”. For example, in a case where the “notation” of the disambiguation rule is “”, the “reading (pronunciation notation)” is “o”, and “rule part” is “+1: REXP_C: \p{InHiragana}”, if a rule that “a character type of a morpheme notation immediately after the target includes a Japanese hiragana character” is met, it is specified that the “reading (pronunciation notation)” of “” is determined to be “o”. For example, it is possible to determine the “reading (pronunciation notation)” of “” of “□” to be “o”.
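The character-type condition above can be sketched as follows, assuming that a single condition is written as "application range: condition type: condition content" as in the example “+1: REXP_C: \p{InHiragana}”. The names `Condition`, `parse_condition`, `char_type_matches`, and `CHAR_CLASS` are hypothetical. Python's standard `re` module does not support `\p{InHiragana}`, so the sketch substitutes the Unicode Hiragana block as an explicit character range; only condition contents listed in `CHAR_CLASS` are translated.

```python
import re
from typing import List, NamedTuple


class Condition(NamedTuple):
    offset: int     # application range, relative to the target morpheme
    cond_type: str  # condition type, e.g. "REXP_C"
    content: str    # condition content, e.g. a character-type expression


# Assumption: map the \p{...} class to an explicit range that Python's re accepts.
CHAR_CLASS = {r"\p{InHiragana}": "[\u3040-\u309F]"}


def parse_condition(text: str) -> Condition:
    """Split 'offset: type: content' into its three fields."""
    offset, cond_type, content = (part.strip() for part in text.split(":", 2))
    return Condition(int(offset), cond_type, content)


def char_type_matches(cond: Condition, notations: List[str], target: int) -> bool:
    """Check whether the notation at target + offset contains the character type."""
    idx = target + cond.offset
    if not 0 <= idx < len(notations):
        return False  # the referenced position lies outside the morpheme array
    pattern = CHAR_CLASS.get(cond.content, cond.content)
    return re.search(pattern, notations[idx]) is not None
```

With this sketch, a target followed by a morpheme written in hiragana satisfies the condition, while a target at the end of the array does not.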
For each morpheme of the input morphemic analysis result, when the morphemic analysis result for a reading candidate of the morpheme meets a disambiguation rule for the reading candidate, the disambiguation rule being obtained from the disambiguation rule list 28, the disambiguation unit 30 adds a score of the disambiguation rule as a score of the reading candidate. The disambiguation unit 30 determines a reading candidate having the highest score as the reading of the morpheme.
Specifically, the disambiguation unit 30 uses, as a disambiguation target, each morpheme for which there is a reading candidate, compares the morphemic analysis result with category information with the “rule part” of the disambiguation rule for the reading candidate, and if any disambiguation rule is met, a score of the disambiguation rule is added as the score of the reading candidate.
The comparison of the disambiguation rule is performed by checking whether the “condition type” of a morpheme in the “application range” of each condition corresponds to the “condition content.” When a plurality of conditions are present, checking is performed for the individual conditions, and if there is even one condition that a morpheme does not meet, it is determined that the morpheme does not meet the disambiguation rule.
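The all-conditions-must-hold check described above can be sketched as below. The sketch assumes that each condition names a single relative position, a field of the morpheme to inspect, and an expected value; the actual application range may be richer than one offset. The type aliases, function names, and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Morpheme = Dict[str, str]
# (application range as a relative offset, condition type, condition content)
Condition = Tuple[int, str, str]


def condition_met(cond: Condition, morphemes: List[Morpheme], target: int) -> bool:
    """Check one condition against the morpheme at target + offset."""
    offset, cond_type, content = cond
    idx = target + offset
    if not 0 <= idx < len(morphemes):
        return False  # no morpheme at the referenced position
    return morphemes[idx].get(cond_type) == content


def rule_met(conditions: List[Condition], morphemes: List[Morpheme], target: int) -> bool:
    """A rule is met only if every one of its conditions is met."""
    return all(condition_met(c, morphemes, target) for c in conditions)


# Illustrative morphemic analysis result with hypothetical category numbers.
sentence = [
    {"notation": "西", "pos": "noun", "category": "100"},
    {"notation": "の", "pos": "particle", "category": "0"},
    {"notation": "方", "pos": "noun", "category": "200"},
]
```

For the target at index 2, a rule with the conditions `(-2, "category", "100")` and `(-1, "pos", "particle")` is met, whereas changing either expected value causes the whole rule to fail.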
In the case of the example in
In the case of the example in
Furthermore, in the case where the “condition type” is the “character type,” for a notation of a morpheme that is a disambiguation target, the comparison of the disambiguation rule is performed by determining whether a regular expression representing the character type specified in the “condition content” is satisfied.
After all of the met disambiguation rules are applied, among readings (pronunciation notations) that are reading candidates, a reading candidate having the highest score is determined to be a reading (pronunciation notation) after disambiguation, and the “reading (pronunciation notation)” field in the input morphemic analysis result is rewritten to the reading (pronunciation notation) after disambiguation. When disambiguation is not achieved, rewriting is not performed. Note that a threshold of the score may be provided and when a score of a reading candidate exceeds the threshold, disambiguation may be determined to be achieved, so that rewriting to the reading candidate may be performed.
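The score accumulation and threshold check can be sketched as follows. The function name `choose_reading`, the representation of a met rule as a (reading, score) pair, the default threshold of 0.0, and the tie-breaking behavior (the first listed candidate wins) are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def choose_reading(
    candidates: List[str],
    met_rules: List[Tuple[str, float]],
    threshold: float = 0.0,
) -> Optional[str]:
    """Sum the scores of met rules per candidate and pick the highest.

    Returns None when no candidate exceeds the threshold, in which case
    the reading in the morphemic analysis result is left unchanged.
    """
    scores: Dict[str, float] = {c: 0.0 for c in candidates}
    for reading, score in met_rules:
        if reading in scores:
            scores[reading] += score
    if not scores:
        return None
    best = max(scores, key=scores.get)  # ties resolve to the first candidate
    return best if scores[best] > threshold else None
```

For example, with candidates "kata" and "hou", met rules scoring 1.0 and 0.5 for "hou" and 0.5 for "kata" select "hou"; if no rule is met, nothing is selected and no rewriting takes place.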
For example, in the example of the morphemic analysis result in
In addition, in the example of
In addition to the “reading (pronunciation notation)” field, the reading candidate list may be made to include parts-of-speech after disambiguation (see
Furthermore, when disambiguation is not achieved by the rule-based method described above, or when the reading candidates are rejected by the threshold, a “default flag” may be prepared in the reading candidate list, and the reading may be rewritten using the information of the reading candidate to which the flag is assigned.
Next, an action of the reading disambiguation apparatus 10 will be described.
In step S100, for each morpheme of the morphemic analysis result input by the input unit 15, the CPU 11 uses the category dictionary 20 as the category information imparting unit 22 to impart the category information of a word corresponding to the morpheme.
In step S102, for each morpheme of the input morphemic analysis result, the CPU 11 references the reading candidate list 24 based on a notation and a part-of-speech of the morpheme to acquire reading candidates of the morpheme, as the ambiguous word candidate acquisition unit 26.
In step S104, for each morpheme of the input morphemic analysis result, when the morphemic analysis result for a reading candidate of the morpheme meets the disambiguation rule for the reading candidate obtained from the disambiguation rule list 28, the CPU 11 adds a score of the disambiguation rule as a score of the reading candidate, as the disambiguation unit 30. Then, for each morpheme of the input morphemic analysis result, the CPU 11 determines a reading candidate having the highest score as the reading of the morpheme.
As described above, the reading disambiguation apparatus 10 of the embodiment of the technology of the present disclosure determines a reading of a morpheme from reading candidates of the morpheme using a disambiguation rule in which a reading of the morpheme is defined in advance correspondingly to appearance positions of other morphemes and notations, parts-of-speech, or categories of the other morphemes. This allows for accurate estimation of a reading of each morpheme of a morpheme array included in the morphemic analysis result. In particular, disambiguation of a reading of a word that is an input of speech synthesis can be achieved.
Note that, in each of the above-described embodiments, the language processing that the CPU executes by reading software (a program) may be executed by various processors other than the CPU. Examples of such processors include a programmable logic device (PLD) whose circuit configuration can be changed after manufacturing, such as a field-programmable gate array (FPGA), and a dedicated electric circuit that is a processor having a circuit configuration specially designed to execute specific processing, such as an application specific integrated circuit (ASIC). Furthermore, the reading disambiguation processing may be performed by one of these various processors, or by a combination of two or more processors of the same or different types (e.g., a plurality of FPGAs, a combination of a CPU and an FPGA, or the like). More specifically, the hardware structure of these various processors is an electric circuit that combines circuit elements such as semiconductor devices.
In the embodiments described above, an aspect has been described in which the reading disambiguation program is stored in advance (installed) in the storage 14, but the present disclosure is not limited thereto. The program may be provided in a form in which the program is included in a non-transitory storage medium such as a compact disk read only memory (CD-ROM), a digital versatile disk read only memory (DVD-ROM), or a universal serial bus (USB) memory. The program may also be in a form in which the program is downloaded from an external device via a network.
In addition, although the case in which the category dictionary 20, the reading candidate list 24, and the disambiguation rule list 28 are present in the reading disambiguation apparatus 10 has been exemplified, the present disclosure is not limited thereto. At least one of the category dictionary 20, the reading candidate list 24, and the disambiguation rule list 28 may be present outside the reading disambiguation apparatus 10.
Furthermore, although the case where the technology of the present disclosure is applied to the reading disambiguation apparatus 10 that rewrites a reading included in the morphemic analysis result has been described as an example, the present disclosure is not limited thereto. For example, the technology of the present disclosure may be applied to an apparatus that uses a morpheme array and a part-of-speech of each morpheme of the morpheme array as inputs to estimate a reading of the morpheme.
With respect to the above embodiments, the following supplements are further disclosed.
Supplement 1
A reading disambiguation apparatus, including:
a memory; and
at least one processor connected to the memory, in which
the processor is configured to:
receive a morpheme array and parts-of-speech of morphemes of the morpheme array; acquire, for each morpheme of the morpheme array, based on a notation and a part-of-speech of the morpheme, reading candidates of the morpheme from reading candidates of the morpheme defined in advance for each combination of a notation and a part-of-speech of the morpheme; and
determine a reading of the morpheme from the acquired reading candidates of the morpheme using a disambiguation rule in which a reading of the morpheme is defined in advance correspondingly to appearance positions of other morphemes and notations, parts-of-speech, or character types of the other morphemes.
Supplement 2
A non-transitory storage medium, storing a reading disambiguation program causing a computer to execute:
receiving a morpheme array and parts-of-speech of morphemes of the morpheme array; acquiring, for each morpheme of the morpheme array, based on a notation and a part-of-speech of the morpheme, reading candidates of the morpheme from reading candidates of the morpheme defined in advance for each combination of a notation and a part-of-speech of the morpheme; and
determining a reading of the morpheme from the acquired reading candidates of the morpheme by using a disambiguation rule in which a reading of the morpheme is defined in advance correspondingly to appearance positions of other morphemes and notations, parts-of-speech, or character types of the other morphemes.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/JP2019/018451 | 5/8/2019 | WO |