Tibetan Character Constituent Analysis Method, Tibetan Sorting Method And Corresponding Devices

Information

  • Patent Application
  • 20180011836
  • Publication Number
    20180011836
  • Date Filed
    October 31, 2016
  • Date Published
    January 11, 2018
Abstract
The present invention discloses a Tibetan character constituent analysis method, a Tibetan sorting method and corresponding devices, and relates to the field of natural language processing. The present invention is proposed to solve the problem that the existing Tibetan sorting methods have no universality or compatibility, which is inconvenient for the use of automatic computer Tibetan sorting. The technical solution provided by the present invention includes: S10, acquiring a Tibetan text to be analyzed; S20, using Tibetan characters in the Tibetan text as the input of a preset finite state automaton group; and S30, acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the Tibetan characters in the Tibetan text are correctly spelled.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit and priority of Chinese Patent Application No. 201610528753.9 filed Jul. 5, 2016. The entire disclosure of the above application is incorporated herein by reference.


FIELD

The present invention relates to the field of natural language processing, in particular to a Tibetan character constituent analysis method, a Tibetan sorting method and corresponding devices.


BACKGROUND

As with other languages, automatic computer sorting of Tibetan is widely used in various fields of Tibetan information technology, including the sorting of Tibetan dictionaries and thesauri, information retrieval, text sorting and the like. Research on automatic computer Tibetan sorting has continued without interruption since research on Tibetan information technology began in the early 1980s. With the development of Tibetan information technology, an automatic Tibetan sorting algorithm is generally adopted in the prior art to sort Tibetan text.


However, the existing sorting algorithms and models are imperfect, error-prone and overly complicated. As a result, the existing Tibetan sorting methods lack universality and compatibility, which makes automatic computer Tibetan sorting inconvenient to use.


SUMMARY

The present invention provides a Tibetan character constituent analysis method, a Tibetan sorting method and corresponding devices, which have universality and compatibility, and can facilitate the use of automatic computer Tibetan sorting.


On one aspect, a Tibetan character constituent analysis method is provided, including: S10, acquiring a Tibetan text to be analyzed; S20, using Tibetan characters in the Tibetan text as the input of a preset finite state automaton group; and S30, acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the Tibetan characters in the Tibetan text are correctly spelled. The finite state automaton group includes 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi), wherein Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; Qi represents the union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and Fi; δi represents a state transition function of the finite state automaton Mi, which maps the direct product Qi×Σi of Qi and Σi to Qi; qi represents an initial state of the finite state automaton Mi, and qi∈Qi; Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊆Qi; and i is a positive integer, and i≤24.


On another aspect, a Tibetan sorting method is provided, including: S10, acquiring at least two Tibetan characters to be sorted; S20, respectively using the at least two Tibetan characters to be sorted as the input of a preset finite state automaton group; S30, acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the input Tibetan characters are correctly spelled; and S40, sorting the at least two Tibetan characters according to the constituents of the at least two Tibetan characters to acquire a sorting result. The finite state automaton group includes 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi), wherein Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; Qi represents the union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and Fi; δi represents a state transition function of the finite state automaton Mi, which maps the direct product Qi×Σi of Qi and Σi to Qi; qi represents an initial state of the finite state automaton Mi, and qi∈Qi; Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊆Qi; and i is a positive integer, and i≤24.


On a third aspect, a Tibetan sorting method is provided, including: S10, acquiring at least two Tibetan words to be sorted; S20, respectively acquiring Tibetan characters in the at least two Tibetan words; S30, respectively using the Tibetan characters in the at least two Tibetan words as the input of a preset finite state automaton group; S40, acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the input Tibetan characters are correctly spelled; and S50, sorting the at least two Tibetan words according to the constituents of each Tibetan character in the at least two Tibetan words to acquire a sorting result. The finite state automaton group includes 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi), wherein Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; Qi represents the union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and Fi; δi represents a state transition function of the finite state automaton Mi, which maps the direct product Qi×Σi of Qi and Σi to Qi; qi represents an initial state of the finite state automaton Mi, and qi∈Qi; Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊆Qi; and i is a positive integer, and i≤24.


On a fourth aspect, a Tibetan character constituent analysis device is provided, including:


a text acquisition module, used for acquiring a Tibetan text to be analyzed;


a text input module, connected with the text acquisition module and used for using Tibetan characters in the Tibetan text as the input of a preset finite state automaton group; and


a constituent analysis module, connected with the text input module and used for acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the Tibetan characters in the Tibetan text are correctly spelled;


the finite state automaton group includes 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi), wherein Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; Qi represents the union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and Fi; δi represents a state transition function of the finite state automaton Mi, which maps the direct product Qi×Σi of Qi and Σi to Qi; qi represents an initial state of the finite state automaton Mi, and qi∈Qi; Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊆Qi; and i is a positive integer, and i≤24.


On a fifth aspect, a Tibetan sorting device is provided, including:


a Tibetan character acquisition module, used for acquiring at least two Tibetan characters to be sorted;


a Tibetan character input module, connected with the Tibetan character acquisition module and used for respectively using the at least two Tibetan characters to be sorted as the input of a preset finite state automaton group;


a constituent analysis module, connected with the Tibetan character input module and used for acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the input Tibetan characters are correctly spelled; and


a sorting module, connected with the constituent analysis module and used for sorting the at least two Tibetan characters according to the constituents of the at least two Tibetan characters to acquire a sorting result;


the finite state automaton group includes 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi), wherein Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; Qi represents the union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and Fi; δi represents a state transition function of the finite state automaton Mi, which maps the direct product Qi×Σi of Qi and Σi to Qi; qi represents an initial state of the finite state automaton Mi, and qi∈Qi; Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊆Qi; and i is a positive integer, and i≤24.


On a sixth aspect, a Tibetan sorting device is provided, including:


a Tibetan word acquisition module, used for acquiring at least two Tibetan words to be sorted;


a Tibetan character acquisition module, connected with the Tibetan word acquisition module and used for respectively acquiring Tibetan characters in the at least two Tibetan words;


a Tibetan character input module, connected with the Tibetan character acquisition module and used for respectively using the Tibetan characters in the at least two Tibetan words as the input of a preset finite state automaton group;


a constituent analysis module, connected with the Tibetan character input module and used for acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the input Tibetan characters are correctly spelled; and


a sorting module, connected with the constituent analysis module and used for sorting the at least two Tibetan words according to the constituents of each Tibetan character in the at least two Tibetan words to acquire a sorting result;


the finite state automaton group includes 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi), wherein Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; Qi represents the union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and Fi; δi represents a state transition function of the finite state automaton Mi, which maps the direct product Qi×Σi of Qi and Σi to Qi; qi represents an initial state of the finite state automaton Mi, and qi∈Qi; Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊆Qi; and i is a positive integer, and i≤24.


The present invention has the following beneficial effects: the Tibetan text to be analyzed is used as the input of the finite state automaton group, and the constituents of the Tibetan characters are acquired according to the target finite state automaton which determines that the Tibetan characters are correct, therefore Tibetan character constituent analysis is achieved, and Tibetan sorting can be further achieved according to the constituents of the Tibetan characters. As the finite state automaton group corresponds to the Tibetan spelling formal grammar, the technical solutions provided by the embodiments of the present invention can solve the problem that the existing Tibetan sorting methods have no universality or compatibility, which is inconvenient for the use of automatic computer Tibetan sorting.





DRAWINGS


FIG. 1 is a flowchart of a Tibetan character constituent analysis method provided by a first embodiment of the present invention;



FIG. 2 is a flowchart of a Tibetan sorting method provided by a second embodiment of the present invention;



FIG. 3 is a flowchart of a Tibetan sorting method provided by a third embodiment of the present invention;



FIG. 4 is a schematic diagram of a structure of a Tibetan character constituent analysis device provided by a fourth embodiment of the present invention;



FIG. 5 is a schematic diagram of a structure of a Tibetan sorting device provided by a fifth embodiment of the present invention;



FIG. 6 is a schematic diagram of a structure of a Tibetan sorting device provided by a sixth embodiment of the present invention.





DETAILED DESCRIPTION

The present invention will be further illustrated below in combination with the accompanying drawings and embodiments. These exemplary implementations are used merely to illustrate the present invention and do not constitute any form of limitation on its actual protection scope.


First Embodiment

As shown in FIG. 1, the embodiment of the present invention provides a Tibetan character constituent analysis method, including the following steps.


Step 101, a Tibetan text to be analyzed is acquired.


In the embodiment, the Tibetan text acquired in step 101 may contain only one Tibetan character or a plurality of Tibetan characters, and this is not limited herein. Specifically, when the Tibetan text contains a plurality of Tibetan characters, the acquired Tibetan text can first be segmented with a character as the unit to acquire at least one Tibetan character; for example, the text can be segmented at each Tibetan character separator, vertical character, double-vertical character and space character.


In particular, when the Tibetan text contains a plurality of Tibetan characters, it may also be a Tibetan word composed of a plurality of Tibetan characters. In this case, the acquired Tibetan text can be segmented according to a specific separator and other signs, and this is not limited herein.
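The segmentation described for step 101 can be sketched as follows. This is a minimal illustration rather than the claimed implementation: the function name `segment_characters` is hypothetical, and the mapping of the separator, the vertical character and the double-vertical character to the Unicode code points U+0F0B, U+0F0D and U+0F0E is an assumption, since the text names the delimiters without giving code points.

```python
import re

# Assumed code points; the description only names these delimiters.
TSHEG = "\u0f0b"      # Tibetan character (syllable) separator
SHAD = "\u0f0d"       # vertical character
NYIS_SHAD = "\u0f0e"  # double-vertical character

# One or more delimiters (or whitespace) in a row count as a single break.
_DELIMITERS = re.compile("[" + TSHEG + SHAD + NYIS_SHAD + r"\s]+")


def segment_characters(text):
    """Split a Tibetan text into Tibetan characters, dropping empty pieces."""
    return [piece for piece in _DELIMITERS.split(text) if piece]
```

Each returned piece would then be passed, one at a time, to the finite state automaton group in step 102.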


Step 102, the Tibetan characters in the Tibetan text are used as the input of a preset finite state automaton group.


In the embodiment, when the Tibetan text contains only one Tibetan character, step 102 specifically includes: using the Tibetan character as the input of the preset finite state automaton group; and when the Tibetan text contains a plurality of Tibetan characters, step 102 specifically includes: respectively using the Tibetan characters in the Tibetan text as the input of the preset finite state automaton group.


In the embodiment, the finite state automaton group includes 24 finite state automata, wherein any finite state automaton Mi=(Σi, Qi, δi, qi, Fi); Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; Qi represents the union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and Fi; δi represents a state transition function of the finite state automaton Mi, which maps the direct product Qi×Σi of Qi and Σi to Qi; qi represents an initial state of the finite state automaton Mi, and qi∈Qi; Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊆Qi; and i is a positive integer, and i≤24.


In the embodiment, 24 Tibetan spelling formal grammars are preset, and each Tibetan spelling formal grammar corresponds to one finite state automaton; and at least one Tibetan character is used as the input of each preset finite state automaton in sequence. The finite set of the terminal symbols of the Tibetan spelling formal grammar Gi is a subset of a set L consisting of 30 Tibetan consonants, 5 reverse scripts, 4 vowel symbols and 1 long vowel symbol, and includes the characters (symbols) that actually occur in a sentence of the language (here, a Tibetan character belonging to a certain structure). The set of the non-terminal symbols of the Tibetan spelling formal grammar Gi includes symbols that do not actually occur in a sentence of the language but serve as variables in derivation, and are equivalent to grammatical categories of the language. For example, a non-terminal symbol can be a variable of the SVO (Subject Verb Object) word order of Chinese, the SOV (Subject Object Verb) word order of Tibetan, or other grammars; it does not occur in a specific sentence, that is, it works implicitly but cannot be seen.


Elements in the finite set of the terminal symbols and the finite set of the non-terminal symbols correspond to specific Tibetan spelling formal grammars. The initial state of the finite state automaton Mi is the state in which the automaton has just started to work, that is, the state in which the automaton first receives input characters; and the termination state refers to a final state of the automaton. Specifically, the automata in the finite state automaton group can be deterministic or non-deterministic; to facilitate understanding and improve implementation efficiency, deterministic automata are taken as an example for illustration in the embodiment.
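As a rough illustration of the five-tuple Mi=(Σi, Qi, δi, qi, Fi) in the deterministic case, the following sketch stores the transition function as a dictionary and runs it over an input sequence. The class name `Automaton`, its field names and the toy automaton in the usage note are hypothetical and are not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass
class Automaton:
    sigma: set    # input symbols (the set written Σi in the text)
    states: set   # state set Qi
    delta: dict   # transition function δi: (state, symbol) -> state
    start: str    # initial state qi, a member of Qi
    finals: set   # termination states Fi, a subset of Qi

    def accepts(self, symbols):
        """Return True when the input ends in a termination state."""
        state = self.start
        for symbol in symbols:
            key = (state, symbol)
            # An unknown symbol or a missing transition rejects the input.
            if symbol not in self.sigma or key not in self.delta:
                return False
            state = self.delta[key]
        return state in self.finals
```

For instance, a toy automaton with `delta={("q0", "a"): "q1"}`, `start="q0"` and `finals={"q1"}` accepts the input `["a"]` and rejects `["b"]`, `[]` and `["a", "a"]`.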


In the embodiment, the process of acquiring the finite state automaton group can include: acquiring the Tibetan spelling formal grammar Gi, wherein Gi=(Ti, Vi, Si, Pi); acquiring a termination state identifier Ei of the finite state automaton Mi; judging whether the finite set Pi of production rules of the Tibetan spelling formal grammar Gi contains a production rule Si→ε; if so, acquiring Fi with the values Si and Ei, that is, Fi={Si, Ei}; if not, acquiring Fi with the value Ei, that is, Fi={Ei}; and acquiring the finite state automaton Mi according to Ti, Vi, Si and Fi. Here, Ti represents the finite set of the terminal symbols of the Tibetan spelling formal grammar Gi; Si represents a start symbol of the Tibetan spelling formal grammar Gi, and Si∈Vi; ε represents a null character; the finite set Σi of the input characters of the finite state automaton Mi is equivalent to the finite set Ti of the terminal symbols of the Tibetan spelling formal grammar Gi; and the initial state qi of the finite state automaton Mi is equivalent to the start symbol Si of the Tibetan spelling formal grammar Gi.
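The construction just described can be sketched as below, under stated assumptions. The text does not spell out how the transitions are derived from the production rules, so this sketch uses the standard construction for right-linear grammars: a rule A→tB yields a transition from A to B on t, and a rule A→t yields a transition from A to the termination state. Because one state and symbol may then lead to several states, the sketch tracks a set of current states. The names `grammar_to_automaton`, `accepts` and `END` are hypothetical.

```python
END = "E"  # termination state identifier Ei; the name is an assumption


def grammar_to_automaton(terminals, nonterminals, start, productions):
    """Build (sigma, states, delta, start, finals) from a right-linear grammar.

    Each production is (head, body), where body is () for the null
    character, (t,) for head -> t, or (t, B) for head -> t B.
    """
    finals = {END}
    delta = {}  # (state, symbol) -> set of next states
    for head, body in productions:
        if not body:
            if head != start:
                raise ValueError("only the start symbol may derive the null character")
            finals.add(start)  # the start state is also a termination state
            continue
        symbol = body[0]
        target = body[1] if len(body) == 2 else END
        delta.setdefault((head, symbol), set()).add(target)
    states = set(nonterminals) | finals
    return set(terminals), states, delta, start, finals


def accepts(automaton, symbols):
    """Simulate the automaton by tracking every state reachable so far."""
    sigma, _states, delta, start, finals = automaton
    current = {start}
    for symbol in symbols:
        if symbol not in sigma:
            return False
        current = {nxt for state in current for nxt in delta.get((state, symbol), ())}
        if not current:
            return False
    return bool(current & finals)
```

Tracking a set of states avoids an explicit determinization step; an implementation that follows the deterministic variant mentioned in the embodiment could instead convert the result to a deterministic automaton once, in advance.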


Here, the process of acquiring the Tibetan spelling formal grammar includes: acquiring the finite set Ti of the terminal symbols, wherein Ti is a subset of the set L, and the set L includes 30 Tibetan consonants, 5 reverse scripts, 4 vowel symbols and 1 long vowel symbol; acquiring the finite set Vi of the non-terminal symbols; acquiring the start symbol Si, wherein Si∈Vi; acquiring the finite set Pi of the production rules; and acquiring the corresponding Tibetan spelling formal grammar Gi according to Ti, Vi, Si and Pi. The process of acquiring the finite set Pi of the production rules can include: at first, acquiring a preset Tibetan spelling grammar formal description system; and then acquiring the finite set Pi of the production rules according to the Tibetan spelling grammar formal description system.


In the embodiment, the preset Tibetan spelling grammar formal description system can be established according to a set theory method, and the specific form is as follows:


Tibetan spelling grammar 1: elements in a set Root={b1, b2, b3, b4, b5, . . . , b30, b31, b32, b33, b34, b35} respectively correspond to 30 Tibetan consonants and 5 Tibetan reverse scripts, and any Tibetan character corresponding to bi∈Root can constitute a root of a Tibetan character.


Tibetan spelling grammar 2: for a set Prefix={b3, b11, b15, b16, b23}, Prefix⊂Root, any Tibetan character corresponding to bj∈Prefix, (j=3, 11, 15, 16, 23) can constitute a prefix of the Tibetan character.


Tibetan spelling grammar 3: for a set Suffix={b3, b4, b11, b12, b15, b16, b23, b25, b26, b28}, Suffix⊂Root, any Tibetan character corresponding to bj∈Suffix, (j=3, 4, 11, 12, 15, 16, 23, 25, 26, 28) can constitute a suffix of the Tibetan character.


Tibetan spelling grammar 4: for a set Postfix={b11, b28}, Postfix⊂Suffix⊂Root, any Tibetan character corresponding to bj∈Postfix, (j=11, 28) can constitute a postfix of the Tibetan character.


Tibetan spelling grammar 5: for a set Superfix={b25, b26, b28}, Superfix⊂Root, any Tibetan character corresponding to bj∈Superfix, (j=25, 26, 28) can constitute a superfix of the Tibetan character.


Tibetan spelling grammar 6: for a set Subfix={b20, b24, b25, b26}, Subfix⊂Root, any Tibetan character corresponding to bj∈Subfix, (j=20, 24, 25, 26) can constitute a subfix of the Tibetan character.


Tibetan spelling grammar 7: for a set Vowel=Vowel1∪{a}, Vowel1={i, u, e, o} corresponds to 4 Tibetan vowel characters, and a represents a Tibetan long vowel character. The Tibetan roots corresponding to bj∈Root, (j=1, 23, 5, 7, . . . , 33, 34, 35) can be spelled with vowel characters corresponding to v∈Vowel; u and a can only be spelled below consonants, and the remaining 3 vowel characters can only be spelled above consonants.


Tibetan spelling grammar 8: when the Tibetan roots corresponding to bj∈Root, (j=1, 3, 4, 5, 7, 8, 9, 11, 12, 13, 15, 16, 17, 19, 29) are spelled with the superfixes corresponding to bi∈Superfix, (i=25, 26, 28), the following grammar rules must be satisfied:


1. bj∈Root, (j=1, 3, 4, 7, 8, 9, 11, 12, 15, 16, 17, 19) can only be spelled with b25∈Superfix.


2. bj∈Root, (j=1, 3, 4, 5, 7, 9, 11, 13, 15, 29) can only be spelled with b26∈Superfix.


3. bj∈Root, (j=1, 3, 4, 8, 9, 11, 12, 13, 15, 16, 17) can only be spelled with b28∈Superfix.


Tibetan spelling grammar 9: when the Tibetan roots corresponding to bj∈Root, (j=1, 2, 3, 8, 9, 10, 11, 13, 14, 15, 16, 18, 21, 22, 25, 26, 27, 28, 29) are spelled with the subfixes corresponding to bi∈Subfix, (i=20, 24, 25, 26), the following grammar rules must be satisfied:


1. bj∈Root, (j=1, 2, 3, 8, 11, 18, 21, 22, 25, 26, 27, 29) can only be spelled with b20∈Subfix.


2. bj∈Root, (j=1, 2, 3, 13, 14, 15, 16) can only be spelled with b24∈Subfix.


3. bj∈Root, (j=1, 2, 3, 9, 10, 11, 13, 14, 15, 16, 28, 29) can only be spelled with b25∈Subfix.


4. bj∈Root, (j=1, 3, 15, 22, 25, 28) can only be spelled with b26∈Subfix.


5. bj∈Root, (j=29) can only be spelled with b14∈Subfix.


(Note: the spelling form combining b29 and b14 occurs in modern Tibetan in order to spell the [f] sound of other languages. According to the traditional Tibetan spelling grammar, b29 cannot be used as a superfix and b14 cannot be used as a subfix; therefore, as a special case, when b29 is spelled with b14, b14 is deemed to be the “subfix”.)


Tibetan spelling grammar 10: when the Tibetan roots corresponding to bi∈Root, (i=1, 3, 12, 13, 15, 16, 17) are simultaneously spelled with the superfixes corresponding to bj∈Superfix, (j=25, 28) and the subfixes corresponding to bk∈Subfix, (k=20, 24, 25), the following grammar rules must be satisfied:


1. When being spelled with b25∈Superfix, b1∈Root can be simultaneously spelled with b24∈Subfix; and when being spelled with b28∈Superfix, b1∈Root can be simultaneously spelled with bk∈Subfix, (k=24, 25).


2. When being spelled with b25∈Superfix, b3∈Root can be simultaneously spelled with b24∈Subfix; and when being spelled with b28∈Superfix, b3∈Root can be simultaneously spelled with bk∈Subfix, (k=24, 25).


3. When being spelled with b28∈Superfix, b12∈Root can be simultaneously spelled with b25∈Subfix.


4. When being spelled with b28∈Superfix, b13∈Root can be simultaneously spelled with bk∈Subfix, (k=24, 25).


5. When being spelled with b28∈Superfix, b15∈Root can be simultaneously spelled with bk∈Subfix, (k=24, 25).


6. When being spelled with b25∈Superfix, b16∈Root can be simultaneously spelled with b24∈Subfix; and when being spelled with b28∈Superfix, b16∈Root can be simultaneously spelled with bk∈Subfix, (k=24, 25).


7. When being spelled with b25∈Superfix, b17∈Root can be simultaneously spelled with b20∈Subfix.


Tibetan spelling grammar 11: when the Tibetan roots corresponding to bi∈Root, (i=1, 3, 4, 7, 8, 9, 11, 12, 17, 19) are simultaneously spelled with the prefixes corresponding to b15∈Prefix and the superfixes corresponding to bj∈Superfix, (j=25, 26, 28), the following grammar rules must be satisfied:


1. bi∈Root, (i=1, 3, 4, 7, 8, 9, 11, 12, 17, 19) can be spelled with b25∈Superfix.


2. bi∈Root, (i=9, 11) can be spelled with b26∈Superfix.


3. bi∈Root, (i=1, 3, 4, 8, 9, 11, 12, 17) can be spelled with b28∈Superfix.


Tibetan spelling grammar 12: when the Tibetan roots corresponding to bi∈Root, (i=1, 2, 3, 11, 13, 14, 15, 16, 22, 25, 28) are simultaneously spelled with the prefixes corresponding to bj∈Prefix, (j=11, 15, 16, 23) and the subfixes corresponding to bk∈Subfix, (k=20, 24, 25, 26), the following grammar rules must be satisfied:


1. bi∈Root, (i=1, 3, 13, 15, 16) can be spelled with b11∈Prefix and b24∈Subfix.


2. bi∈Root, (i=1, 3, 13, 15) can be spelled with b11∈Prefix and b25∈Subfix.


3. bi∈Root, (i=1, 3) can be spelled with b15∈Prefix and b24∈Subfix.


4. bi∈Root, (i=1, 3, 28) can be spelled with b15∈Prefix and b25∈Subfix.


5. bi∈Root, (i=1, 22, 25, 28) can be spelled with b15∈Prefix and b26∈Subfix.


6. bi∈Root, (i=2, 3) can be spelled with b16∈Prefix and bk∈Subfix, (k=24, 25).


7. bi∈Root, (i=2, 3, 14, 15) can be spelled with b23∈Prefix and b24∈Subfix.


8. bi∈Root, (i=2, 3, 11, 14, 15) can be spelled with b23∈Prefix and b25∈Subfix.


Tibetan spelling grammar 13: when the Tibetan roots corresponding to bi∈Root, (i=1, 3) are spelled with the prefixes corresponding to b15∈Prefix, the superfixes corresponding to bj∈Superfix, (j=25, 28) and the subfixes corresponding to bk∈Subfix, (k=24, 25), the following grammar rules must be satisfied:


1. bi∈Root, (i=1, 3) can be spelled with b15∈Prefix, b25∈Superfix and b24∈Subfix.


2. bi∈Root, (i=1, 3) can be spelled with b15∈Prefix, b28∈Superfix and b25∈Subfix.


3. bi∈Root, (i=1, 3) can be spelled with b15∈Prefix, b28∈Superfix and b24∈Subfix.


Tibetan spelling grammar 14: when being spelled with the prefixes corresponding to bj∈Prefix, (j=3, 11, 15, 16, 23), the Tibetan roots corresponding to bi∈Root, (i=1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 24, 27, 28) must be simultaneously spelled with the vowel symbols corresponding to v∈Vowel, Vowel={i, u, e, o}, or one suffix corresponding to bk∈Suffix, (k=3, 4, 11, 12, 15, 16, 23, 25, 26, 28), and the following grammar rules must be satisfied:


1. bi∈Root, (i=5, 8, 9, 11, 12, 17, 21, 22, 24, 27, 28) can only be spelled with b3∈Prefix.


2. bi∈Root, (i=1, 3, 4, 13, 15, 16) can only be spelled with b11∈Prefix.


3. bi∈Root, (i=1, 3, 5, 9, 11, 17, 21, 22, 27, 28) can only be spelled with b15∈Prefix.


4. bi∈Root, (i=2, 3, 4, 6, 7, 8, 10, 11, 12, 18, 19) can only be spelled with b16∈Prefix.


5. bi∈Root, (i=2, 3, 6, 7, 10, 11, 14, 15, 18, 19) can only be spelled with b23∈Prefix.


Tibetan spelling grammar 15: the Tibetan roots corresponding to bj∈Root, (j=1, 2, 3, 4, 5, 6, 7, 8, 9, 10, . . . , 21, 22, 23, 24, 25, 26, 27, 28, 29, 30) can be spelled with any suffix corresponding to bi∈Suffix, (i=3, 4, 11, 12, 15, 16, 23, 25, 26, 28).


Tibetan spelling grammar 16: the use of the Tibetan postfixes is only related to the suffixes. The Tibetan suffixes corresponding to bi∈Suffix, (i=3, 4, 12, 15, 16, 25, 26) can be spelled with the postfixes corresponding to bj∈Postfix, (j=11, 28), and the following grammar rules must be satisfied:


1. b11∈Postfix can only be spelled with bi∈Suffix, (i=12, 25, 26).


2. b28∈Postfix can only be spelled with bi∈Suffix, (i=3, 4, 15, 16).


Tibetan spelling grammar 17: when being spelled with the Tibetan subfixes corresponding to bj∈Subfix, (j=24, 25), the Tibetan roots corresponding to bi∈Root, (i=3, 11, 14) can be simultaneously spelled with the Tibetan subfixes corresponding to b20∈Subfix. The specific rules are as follows:


1. When being spelled with b25∈Subfix, bi∈Root, (i=3, 11) can be simultaneously spelled with b20∈Subfix.


2. When being spelled with b24∈Subfix, b14∈Root can be simultaneously spelled with b20∈Subfix.


Tibetan spelling grammar 18: the Tibetan consonants corresponding to b29∈Root can be spelled with the Tibetan consonants corresponding to b14∈Root, and b14∈Root is correspondingly located below b29∈Root.


Tibetan spelling grammar 19: when being spelled with the Tibetan consonants corresponding to b14∈Root, the Tibetan consonants corresponding to b29∈Root can be simultaneously spelled with the Tibetan suffixes corresponding to bi∈Suffix, (i=3, 4, 11, 12, 15, 16, 23, 25, 26, 28).


Tibetan spelling grammar 20: the Tibetan characters having no suffix can be spelled with the Tibetan consonants corresponding to b23∈Root, and at this time, the Tibetan consonants corresponding to b23∈Root must be spelled with the vowel symbols (i, e, u, o) corresponding to v∈Vowel, Vowel={i, u, e, o}.


Tibetan spelling grammar 21: besides the special spelling in the grammars 17, 18, 19 and 20, the Tibetan characters are spelled according to the sequence of the prefixes, the superfixes, the roots, the subfixes, the vowel symbols, the suffixes and the postfixes.
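To make the set-theoretic description concrete, the sets of grammars 2 to 6 and the pairing rules of grammars 8 and 16 can be written as lookup tables, as in the sketch below. Consonants are represented by their index j in bj; this representation and the helper names `superfix_allowed` and `postfix_allowed` are assumptions made for illustration only.

```python
# Index sets copied from Tibetan spelling grammars 2 to 6 (bj is stored as j).
PREFIX = {3, 11, 15, 16, 23}
SUFFIX = {3, 4, 11, 12, 15, 16, 23, 25, 26, 28}
POSTFIX = {11, 28}
SUPERFIX = {25, 26, 28}
SUBFIX = {20, 24, 25, 26}

# Grammar 8: superfix index -> indices of the roots it may be spelled with.
SUPERFIX_ROOTS = {
    25: {1, 3, 4, 7, 8, 9, 11, 12, 15, 16, 17, 19},
    26: {1, 3, 4, 5, 7, 9, 11, 13, 15, 29},
    28: {1, 3, 4, 8, 9, 11, 12, 13, 15, 16, 17},
}

# Grammar 16: postfix index -> indices of the suffixes it may follow.
POSTFIX_SUFFIXES = {
    11: {12, 25, 26},
    28: {3, 4, 15, 16},
}


def superfix_allowed(superfix, root):
    """Check one superfix and root pair against grammar 8."""
    return root in SUPERFIX_ROOTS.get(superfix, set())


def postfix_allowed(postfix, suffix):
    """Check one postfix and suffix pair against grammar 16."""
    return suffix in POSTFIX_SUFFIXES.get(postfix, set())
```

For example, `superfix_allowed(25, 19)` holds while `superfix_allowed(26, 19)` does not, which mirrors rules 1 and 2 of grammar 8.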


In the embodiment, Ti represents the finite set of the terminal symbols of the Tibetan spelling formal grammar Gi; Si represents the start symbol of the Tibetan spelling formal grammar Gi, and Si∈Vi; ε represents a null character; the finite set Σi of the input characters of the finite state automaton Mi is equivalent to the finite set Ti of the terminal symbols of the Tibetan spelling formal grammar Gi; and the initial state qi of the finite state automaton Mi is equivalent to the start symbol Si of the Tibetan spelling formal grammar Gi. Here, Si stands for any possible sentence (a Tibetan character in this application) in the language L(Gi) generated by the grammar Gi, so Si is a special non-terminal symbol.


Specifically, the specific forms of the 24 Tibetan spelling formal grammars G1 to G24 are as follows:


Tibetan spelling formal grammar G1: the spelling formal grammar G1 of the Tibetan roots and the vowel symbols is a quadruple (T1, V1, S1, P1), wherein:


(1) terminal symbol


T1=TB∪To, wherein:


TB={b1, b2, b3, b4, b5, . . . , b35}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o, a}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V1={S1, B1,1, B1,2};


(3) S1 is a non-terminal symbol in V1 and is a start symbol; and


(4) a production set of the grammar G1 is: P1={


S1→b1|b2|b3|b4|b5| . . . |b30|b31|b32|b33|b34|b35,


S1→b1B1,1|b2B1,1|b3B1,1|b4B1,1|b5B1,1| . . . |b30B1,1,


S1→b31B1,2|b32B1,2|b33B1,2|b34B1,2|b35B1,2,


B1,1→i|u|e|o|a,


B1,2→i|u|e|o}
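The production set P1 can be turned into transitions and tried on sample inputs, as in the sketch below. It assumes the usual reading of right-linear rules (S1→b becomes a transition to a termination state, S1→bB becomes a transition to the state B) and tracks a set of states, because S1 has two rules for the same consonant. The symbols are plain strings such as "b1" and "i"; `END`, `build_g1_transitions` and `g1_accepts` are hypothetical names.

```python
CONSONANTS = [f"b{n}" for n in range(1, 31)]   # b1 .. b30
REVERSED = [f"b{n}" for n in range(31, 36)]    # b31 .. b35
END = "E"  # termination state; the name is an assumption


def build_g1_transitions():
    """Translate the production set P1 into (state, symbol) -> set of states."""
    delta = {}

    def add(state, symbol, target):
        delta.setdefault((state, symbol), set()).add(target)

    for b in CONSONANTS + REVERSED:
        add("S1", b, END)        # S1 -> b1 | ... | b35
    for b in CONSONANTS:
        add("S1", b, "B1,1")     # S1 -> b1 B1,1 | ... | b30 B1,1
    for b in REVERSED:
        add("S1", b, "B1,2")     # S1 -> b31 B1,2 | ... | b35 B1,2
    for v in "iueoa":
        add("B1,1", v, END)      # B1,1 -> i | u | e | o | a
    for v in "iueo":
        add("B1,2", v, END)      # B1,2 -> i | u | e | o
    return delta


G1_DELTA = build_g1_transitions()


def g1_accepts(symbols):
    """Return True when the symbol sequence is a sentence of G1."""
    current = {"S1"}
    for symbol in symbols:
        current = {t for s in current for t in G1_DELTA.get((s, symbol), ())}
        if not current:
            return False
    return END in current
```

Under this reading, a root alone (`["b1"]`) and a root with a vowel (`["b1", "a"]`) are accepted, while a reverse script with the long vowel (`["b31", "a"]`) is rejected because B1,2 has no rule for a.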


With respect to a Tibetan spelling structure 2:


Tibetan spelling formal grammar G2: the spelling formal grammar G2 of the Tibetan superfixes, the roots and the vowels is a quadruple (T2, V2, S2, P2), wherein:


(1) terminal symbol


T2=TB∪To, wherein:


TB={b1, b3, b4, b5, b7, b8, b9, b11, b12, b13, b15, b16, b17, b19, b25, b26, b28, b29}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V2={S2, B2,1, B2,2, B2,3, B2,4};


(3) S2 is a non-terminal symbol in V2 and is the start symbol;


(4) the production set of the grammar G2 is: P2={


S2→b25B2,1|b26B2,2|b28B2,3,


B2,1→b1|b3|b4|b7|b8|b9|b11|b12|b15|b16|b17|b19,


B2,1→b1B2,4|b3B2,4|b4B2,4|b7B2,4|b8B2,4|b9B2,4|b11B2,4|b12B2,4|b15B2,4|b16B2,4|b17B2,4|b19B2,4,


B2,2→b1|b3|b4|b5|b7|b9|b11|b13|b15|b29,


B2,2→b1B2,4|b3B2,4|b4B2,4|b5B2,4|b7B2,4|b9B2,4|b11B2,4|b13B2,4|b15B2,4|b29B2,4,


B2,3→b1|b3|b4|b8|b9|b11|b12|b13|b15|b16|b17,


B2,3→b1B2,4|b3B2,4|b4B2,4|b8B2,4|b9B2,4|b11B2,4|b12B2,4|b13B2,4|b15B2,4|b16B2,4|b17B2,4,


B2,4→i|u|e|o}


With respect to a Tibetan spelling structure 3:


Tibetan spelling formal grammar G3: the spelling formal grammar G3 of the Tibetan roots, the subfixes and the vowel symbols is a quadruple (T3, V3, S3, P3), wherein:


(1) terminal symbol


T3=TB∪To, wherein:


TB={b1, b2, b3, b8, b9, b10, b11, b13, b14, b15, b16, b18, b20, b21, b22, b24, b25, b26, b27, b28, b29}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V3={S3, B3,1, B3,2, B3,3, B3,4, B3,5, B3,6, B3,7, B3,8, B3,9, B3,10};


(3) S3 is a non-terminal symbol in V3 and is the start symbol; and


(4) the production set of the grammar G3 is: P3={


S3→b1B3,1|b3B3,1,


S3→b2B3,2,


S3→b11B3,3|b29B3,3,


S3→b8B3,4|b18B3,4|b21B3,4|b26B3,4|b27B3,4,


S3→b9B3,5|b10B3,5,


S3→b13B3,6|b14B3,6|b16B3,6,


S3→b22B3,7|b25B3,7,


S3→b28B3,8,


S3→b15B3,9,


B3,1→b20|b24|b25|b26,


B3,1→b20B3,10|b24B3,10|b25B3,10|b26B3,10,


B3,2→b20|b24|b25,


B3,2→b20B3,10|b24B3,10|b25B3,10,


B3,3→b20|b25,


B3,3→b20B3,10|b25B3,10,


B3,4→b20,


B3,4→b20B3,10,


B3,5→b25,


B3,5→b25B3,10,


B3,6→b24|b25,


B3,6→b24B3,10|b25B3,10,


B3,7→b20|b26,


B3,7→b20B3,10|b26B3,10,


B3,8→b25|b26,


B3,8→b25B3,10|b26B3,10,


B3,9→b24|b25|b26,


B3,9→b24B3,10|b25B3,10|b26B3,10,


B3,10→i|u|e|o}


With respect to a Tibetan spelling structure 4:


Tibetan spelling formal grammar G4: the spelling formal grammar G4 of the superfixes, the Tibetan roots, the subfixes and the vowel symbols is a quadruple (T4, V4, S4, P4), wherein:


(1) terminal symbol


T4=TB∪To, wherein TB={b1, b3, b12, b13, b15, b16, b17, b20, b24, b25, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V4={S4, B4,1, B4,2, B4,3, B4,4, B4,5, B4,6, B4,7};


(3) S4 is a non-terminal symbol in V4 and is the start symbol; and


(4) the production set of the grammar G4 is: P4={


S4→b25B4,1,


S4→b28B4,2,


B4,1→b1B4,3|b3B4,3|b16B4,3,


B4,1→b17B4,4,


B4,2→b1B4,5|b3B4,5|b13B4,5|b15B4,5|b16B4,5,


B4,2→b12B4,6,


B4,3→b24,


B4,3→b24B4,7,


B4,4→b20,


B4,4→b20B4,7,


B4,5→b24|b25,


B4,5→b24B4,7|b25B4,7,


B4,6→b25,


B4,6→b25B4,7,


B4,7→i|u|e|o}


With respect to a Tibetan spelling structure 5:


Tibetan spelling formal grammar G5: the spelling formal grammar G5 of the Tibetan prefixes, the superfixes, the roots and the vowel symbols is a quadruple (T5, V5, S5, P5), wherein:


(1) terminal symbol


T5=TB∪To, wherein:


TB={b1, b3, b4, b7, b8, b9, b11, b12, b15, b17, b19, b25, b26, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V5={S5, B5,1, B5,2, B5,3, B5,4, B5,5};


(3) S5 is a non-terminal symbol in V5 and is the start symbol; and


(4) the production set of the grammar G5 is: P5={


S5→b15B5,1,


B5,1→b28B5,2,


B5,1→b26B5,3,


B5,1→b25B5,4,


B5,2→b1|b3|b4|b8|b9|b11|b12|b17,


B5,2→b1B5,5|b3B5,5|b4B5,5|b8B5,5|b9B5,5|b11B5,5|b12B5,5|b17B5,5,


B5,3→b9|b11,


B5,3→b9B5,5|b11B5,5,


B5,4→b1|b3|b4|b7|b8|b9|b11|b12|b17|b19,


B5,4→b1B5,5|b3B5,5|b4B5,5|b7B5,5|b8B5,5|b9B5,5|b11B5,5|b12B5,5|b17B5,5|b19B5,5,


B5,5→i|u|e|o}


With respect to a Tibetan spelling structure 6:


Tibetan spelling formal grammar G6: the spelling formal grammar G6 of the Tibetan prefixes, the roots, the subfixes and the vowel symbols is a quadruple (T6, V6, S6, P6), wherein:


(1) terminal symbol


T6=TB∪To, wherein:


TB={b1, b2, b3, b11, b13, b14, b15, b16, b22, b23, b24, b25, b26, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V6={S6, B6,1, B6,2, B6,3, B6,4, B6,5, B6,6, B6,7, B6,8, B6,9, B6,10, B6,11};


(3) S6 is a non-terminal symbol in V6 and is the start symbol; and


(4) the production set of the grammar G6 is: P6={


S6→b11B6,1|b15B6,2|b16B6,3|b23B6,4,


B6,1→b16B6,5,


B6,1→b1B6,9|b3B6,9|b13B6,9|b15B6,9,


B6,2→b1B6,6,


B6,2→b22B6,7|b25B6,7,


B6,2→b28B6,8,


B6,2→b3B6,9,


B6,3→b2B6,9|b3B6,9,


B6,4→b2B6,9|b3B6,9|b14B6,9|b15B6,9,


B6,4→b11B6,10,


B6,5→b24,


B6,5→b24B6,11,


B6,6→b24|b25|b26,


B6,6→b24B6,11|b25B6,11|b26B6,11,


B6,7→b26,


B6,7→b26B6,11,


B6,8→b25|b26,


B6,8→b25B6,11|b26B6,11,


B6,9→b24|b25,


B6,9→b24B6,11|b25B6,11,


B6,10→b25,


B6,10→b25B6,11,


B6,11→i|u|e|o}


With respect to a Tibetan spelling structure 7:


Tibetan spelling formal grammar G7: the spelling formal grammar G7 of the Tibetan prefixes, the superfixes, the roots, the subfixes and the vowel symbols is a quadruple (T7, V7, S7, P7), wherein:


(1) terminal symbol


T7=TB∪To, wherein:


TB={b1, b3, b15, b24, b25, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V7={S7, B7,1, B7,2, B7,3, B7,4, B7,5, B7,6};


(3) S7 is a non-terminal symbol in V7 and is the start symbol; and


(4) the production set of the grammar G7 is: P7={


S7→b15B7,1,


B7,1→b28B7,2,


B7,1→b25B7,3,


B7,2→b1B7,4|b3B7,4,


B7,3→b1B7,5|b3B7,5,


B7,4→b24|b25,


B7,4→b24B7,6|b25B7,6,


B7,5→b24,


B7,5→b24B7,6,


B7,6→i|u|e|o}


With respect to a Tibetan spelling structure 8:


Tibetan spelling formal grammar G8: the spelling formal grammar G8 of the Tibetan prefixes, the roots and the vowel symbols is a quadruple (T8, V8, S8, P8), wherein:


(1) terminal symbol


T8=TB∪To, wherein:


TB={b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15, b16, b17, b18, b19, b21, b22, b23, b24, b27, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V8={S8, B8,1, B8,2, B8,3, B8,4, B8,5, B8,6};


(3) S8 is a non-terminal symbol in V8 and is the start symbol; and


(4) the production set of the grammar G8 is: P8={


S8→b3B8,1|b11B8,2|b15B8,3|b16B8,4|b23B8,5,


B8,1→b5B8,6|b8B8,6|b9B8,6|b11B8,6|b12B8,6|b17B8,6|b21B8,6|b22B8,6|b24B8,6|b27B8,6|b28B8,6,


B8,2→b1B8,6|b3B8,6|b4B8,6|b13B8,6|b15B8,6|b16B8,6,


B8,3→b1B8,6|b3B8,6|b5B8,6|b9B8,6|b11B8,6|b17B8,6|b21B8,6|b22B8,6|b27B8,6|b28B8,6,


B8,4→b2B8,6|b3B8,6|b4B8,6|b6B8,6|b7B8,6|b8B8,6|b10B8,6|b11B8,6|b12B8,6|b18B8,6|b19B8,6,


B8,5→b2B8,6|b3B8,6|b6B8,6|b7B8,6|b10B8,6|b11B8,6|b14B8,6|b15B8,6|b18B8,6|b19B8,6,


B8,6→i|u|e|o}


With respect to a Tibetan spelling structure 9:


Tibetan spelling formal grammar G9: the spelling formal grammar G9 of the Tibetan prefixes, the roots, the vowel characters and the suffixes is a quadruple (T9, V9, S9, P9), wherein:


(1) terminal symbol


T9=TB∪To, wherein:


TB={b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15, b16, b17, b18, b19, b21, b22, b23, b24, b25, b26, b27, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V9={S9, B9,1, B9,2, B9,3, B9,4, B9,5, B9,6, B9,7};


(3) S9 is a non-terminal symbol in V9 and is the start symbol; and


(4) the production set of the grammar G9 is: P9={


S9→b3B9,1|b11B9,2|b15B9,3|b16B9,4|b23B9,5,


B9,1→b5B9,7|b8B9,7|b9B9,7|b11B9,7|b12B9,7|b17B9,7|b21B9,7|b22B9,7|b24B9,7|b27B9,7|b28B9,7,


B9,1→b5B9,6|b8B9,6|b9B9,6|b11B9,6|b12B9,6|b17B9,6|b21B9,6|b22B9,6|b24B9,6|b27B9,6|b28B9,6,


B9,2→b1B9,7|b3B9,7|b4B9,7|b13B9,7|b15B9,7|b16B9,7,


B9,2→b1B9,6|b3B9,6|b4B9,6|b13B9,6|b15B9,6|b16B9,6,


B9,3→b1B9,7|b3B9,7|b5B9,7|b9B9,7|b11B9,7|b17B9,7|b21B9,7|b22B9,7|b27B9,7|b28B9,7,


B9,3→b1B9,6|b3B9,6|b5B9,6|b9B9,6|b11B9,6|b17B9,6|b21B9,6|b22B9,6|b27B9,6|b28B9,6,


B9,4→b2B9,7|b3B9,7|b4B9,7|b6B9,7|b7B9,7|b8B9,7|b10B9,7|b11B9,7|b12B9,7|b18B9,7|b19B9,7,


B9,4→b2B9,6|b3B9,6|b4B9,6|b6B9,6|b7B9,6|b8B9,6|b10B9,6|b11B9,6|b12B9,6|b18B9,6|b19B9,6,


B9,5→b2B9,7|b3B9,7|b6B9,7|b7B9,7|b10B9,7|b11B9,7|b14B9,7|b15B9,7|b18B9,7|b19B9,7,


B9,5→b2B9,6|b3B9,6|b6B9,6|b7B9,6|b10B9,6|b11B9,6|b14B9,6|b15B9,6|b18B9,6|b19B9,6,


B9,6→iB9,7|uB9,7|eB9,7|oB9,7,


B9,7→b3|b4|b11|b12|b15|b16|b23|b25|b26|b28}


With respect to a Tibetan spelling structure 10:


Tibetan spelling formal grammar G10: the spelling formal grammar G10 of the Tibetan prefixes, the superfixes, the roots, the vowel symbols and the suffixes is a quadruple (T10, V10, S10, P10), wherein:


(1) terminal symbol


T10=TB∪To, wherein:


TB={b1, b3, b4, b7, b9, b11, b12, b15, b16, b17, b19, b23, b25, b26, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V10={S10, B10,1, B10,2, B10,3, B10,4, B10,5, B10,6};


(3) S10 is a non-terminal symbol in V10 and is the start symbol; and


(4) the production set of the grammar G10 is: P10={


S10→b15B10,1,


B10,1→b28B10,2|b26B10,3|b25B10,4,


B10,2→b1B10,6|b3B10,6|b4B10,6|b8B10,6|b9B10,6|b11B10,6|b12B10,6|b17B10,6,


B10,2→b1B10,5|b3B10,5|b4B10,5|b8B10,5|b9B10,5|b11B10,5|b12B10,5|b17B10,5,


B10,3→b9B10,6|b11B10,6,


B10,3→b9B10,5|b11B10,5,


B10,4→b1B10,6|b3B10,6|b4B10,6|b7B10,6|b8B10,6|b9B10,6|b11B10,6|b12B10,6|b17B10,6|b19B10,6,


B10,4→b1B10,5|b3B10,5|b4B10,5|b7B10,5|b8B10,5|b9B10,5|b11B10,5|b12B10,5|b17B10,5|b19B10,5,


B10,5→iB10,6|uB10,6|eB10,6|oB10,6,


B10,6→b3|b4|b11|b12|b15|b16|b23|b25|b26|b28}


With respect to a Tibetan spelling structure 11:


Tibetan spelling formal grammar G11: the spelling formal grammar G11 of the Tibetan prefixes, the roots, the subfixes, the vowel symbols and the suffixes is a quadruple (T11, V11, S11, P11), wherein:


(1) terminal symbol


T11=TB∪To, wherein:


TB={b1, b2, b3, b4, b11, b12, b13, b14, b15, b16, b22, b23, b24, b25, b26, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V11={S11, B11,1, B11,2, B11,3, B11,4, B11,5, B11,6, B11,7, B11,8, B11,9, B11,10, B11,11, B11,12};


(3) S11 is a non-terminal symbol in V11 and is the start symbol; and


(4) the production set of the grammar G11 is: P11={


S11→b11B11,1|b15B11,2|b16B11,3|b23B11,4,


B11,1→b16B11,5,


B11,1→b1B11,9|b3B11,9|b13B11,9|b15B11,9,


B11,2→b1B11,6,


B11,2→b22B11,7|b25B11,7,


B11,2→b28B11,8,


B11,2→b3B11,9,


B11,3→b2B11,9|b3B11,9,


B11,4→b2B11,9|b3B11,9|b14B11,9|b15B11,9,


B11,4→b11B11,10,


B11,5→b24B11,12,


B11,5→b24B11,11,


B11,6→b24B11,12|b25B11,12|b26B11,12,


B11,6→b24B11,11|b25B11,11|b26B11,11,


B11,7→b26B11,12,


B11,7→b26B11,11,


B11,8→b25B11,12|b26B11,12,


B11,8→b25B11,11|b26B11,11,


B11,9→b24B11,12|b25B11,12,


B11,9→b24B11,11|b25B11,11,


B11,10→b25B11,12,


B11,10→b25B11,11,


B11,11→iB11,12|uB11,12|eB11,12|oB11,12,


B11,12→b3|b4|b11|b12|b15|b16|b23|b25|b26|b28}


With respect to a Tibetan spelling structure 12:


Tibetan spelling formal grammar G12: the spelling formal grammar G12 of the Tibetan prefixes, the superfixes, the roots, the subfixes, the vowel symbols and the suffixes is a quadruple (T12, V12, S12, P12), wherein:


(1) terminal symbol


T12=TB∪To, wherein:


TB={b1, b3, b4, b11, b12, b15, b16, b23, b24, b25, b26, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V12={S12, B12,1, B12,2, B12,3, B12,4, B12,5, B12,6, B12,7};


(3) S12 is a non-terminal symbol in V12 and is the start symbol; and


(4) the production set of the grammar G12 is: P12={


S12→b15B12,1,


B12,1→b28B12,2,


B12,1→b25B12,3,


B12,2→b1B12,4|b3B12,4,


B12,3→b1B12,5|b3B12,5,


B12,4→b24B12,7|b25B12,7,


B12,4→b24B12,6|b25B12,6,


B12,5→b24B12,7,


B12,5→b24B12,6,


B12,6→iB12,7|uB12,7|eB12,7|oB12,7,


B12,7→b3|b4|b11|b12|b15|b16|b23|b25|b26|b28}


With respect to a Tibetan spelling structure 13:


Tibetan spelling formal grammar G13: the spelling formal grammar G13 of the Tibetan prefixes, the roots, the vowel symbols, the suffixes and the postfixes is a quadruple (T13, V13, S13, P13), wherein:


(1) terminal symbol


T13=TB∪To, wherein:


TB={b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14, b15, b16, b17, b18, b19, b21, b22, b23, b24, b25, b26, b27, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V13={S13, B13,1, B13,2, B13,3, B13,4, B13,5, B13,6, B13,7, B13,8, B13,9};


(3) S13 is a non-terminal symbol in V13 and is the start symbol; and


(4) the production set of the grammar G13 is: P13={


S13→b3B13,1|b11B13,2|b15B13,3|b16B13,4|b23B13,5,


B13,1→b5B13,6|b8B13,6|b9B13,6|b11B13,6|b12B13,6|b17B13,6|b21B13,6|b22B13,6|b24B13,6|b27B13,6|b28B13,6,


B13,2→b1B13,6|b3B13,6|b4B13,6|b13B13,6|b15B13,6|b16B13,6,


B13,3→b1B13,6|b3B13,6|b5B13,6|b9B13,6|b11B13,6|b17B13,6|b21B13,6|b22B13,6|b27B13,6|b28B13,6,


B13,4→b2B13,6|b3B13,6|b4B13,6|b6B13,6|b7B13,6|b8B13,6|b10B13,6|b11B13,6|b12B13,6|b18B13,6|b19B13,6,


B13,5→b2B13,6|b3B13,6|b6B13,6|b7B13,6|b10B13,6|b11B13,6|b14B13,6|b15B13,6|b18B13,6|b19B13,6,


B13,6→iB13,7|uB13,7|eB13,7|oB13,7,


B13,6→b3B13,8|b4B13,8|b15B13,8|b16B13,8,


B13,6→b12B13,9|b25B13,9|b26B13,9,


B13,7→b3B13,8|b4B13,8|b15B13,8|b16B13,8,


B13,7→b12B13,9|b25B13,9|b26B13,9,


B13,8→b28,


B13,9→b11}


With respect to a Tibetan spelling structure 14:


Tibetan spelling formal grammar G14: the spelling formal grammar G14 of the Tibetan prefixes, the superfixes, the roots, the vowel symbols, the suffixes and the postfixes is a quadruple (T14, V14, S14, P14), wherein:


(1) terminal symbol


T14=TB∪To, wherein:


TB={b1, b3, b4, b11, b12, b13, b15, b16, b17, b20, b24, b25, b26, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V14={S14, B14,1, B14,2, B14,3, B14,4, B14,5, B14,6, B14,7, B14,8};


(3) S14 is a non-terminal symbol in V14 and is the start symbol; and


(4) the production set of the grammar G14 is: P14={


S14→b15B14,1,


B14,1→b28B14,2|b26B14,3|b25B14,4,


B14,2→b1B14,5|b3B14,5|b4B14,5|b8B14,5|b9B14,5|b11B14,5|b12B14,5|b17B14,5,


B14,3→b9B14,5|b11B14,5,


B14,4→b1B14,5|b3B14,5|b4B14,5|b7B14,5|b8B14,5|b9B14,5|b11B14,5|b12B14,5|b17B14,5|b19B14,5,


B14,5→iB14,6|uB14,6|eB14,6|oB14,6,


B14,5→b3B14,7|b4B14,7|b15B14,7|b16B14,7,


B14,5→b12B14,8|b25B14,8|b26B14,8,


B14,6→b3B14,7|b4B14,7|b15B14,7|b16B14,7,


B14,6→b12B14,8|b25B14,8|b26B14,8,


B14,7→b28,


B14,8→b11}


With respect to a Tibetan spelling structure 15:


Tibetan spelling formal grammar G15: the spelling formal grammar G15 of the Tibetan prefixes, the roots, the subfixes, the vowel symbols, the suffixes and the postfixes is a quadruple (T15, V15, S15, P15), wherein:


(1) terminal symbol


T15=TB∪To, wherein:


TB={b1, b2, b3, b4, b11, b12, b13, b14, b15, b16, b22, b23, b24, b25, b26, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V15={S15, B15,1, B15,2, B15,3, B15,4, B15,5, B15,6, B15,7, B15,8, B15,9, B15,10, B15,11, B15,12, B15,13, B15,14};


(3) S15 is a non-terminal symbol in V15 and is the start symbol; and


(4) the production set of the grammar G15 is: P15={


S15→b11B15,1|b15B15,2|b16B15,3|b23B15,4,


B15,1→b16B15,5,


B15,1→b1B15,9|b3B15,9|b13B15,9|b15B15,9,


B15,2→b1B15,6,


B15,2→b22B15,7|b25B15,7,


B15,2→b28B15,8,


B15,2→b3B15,9,


B15,3→b2B15,9|b3B15,9,


B15,4→b2B15,9|b3B15,9|b14B15,9|b15B15,9,


B15,4→b11B15,10,


B15,5→b24B15,11,


B15,6→b24B15,11|b25B15,11|b26B15,11,


B15,7→b26B15,11,


B15,8→b25B15,11|b26B15,11,


B15,9→b24B15,11|b25B15,11,


B15,10→b25B15,11,


B15,11→iB15,12|uB15,12|eB15,12|oB15,12,


B15,11→b3B15,13|b4B15,13|b15B15,13|b16B15,13,


B15,11→b12B15,14|b25B15,14|b26B15,14,


B15,12→b3B15,13|b4B15,13|b15B15,13|b16B15,13,


B15,12→b12B15,14|b25B15,14|b26B15,14,


B15,13→b28,


B15,14→b11}


With respect to a Tibetan spelling structure 16:


Tibetan spelling formal grammar G16: the Tibetan character spelling grammar G16 of the Tibetan prefixes, the superfixes, the roots, the subfixes, the vowel symbols, the suffixes and the postfixes is a quadruple (T16, V16, S16, P16), wherein:


(1) terminal symbol


T16=TB∪To, wherein:


TB={b1, b3, b4, b11, b12, b15, b16, b24, b25, b26, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V16={S16, B16,1, B16,2, B16,3, B16,4, B16,5, B16,6, B16,7, B16,8, B16,9};


(3) S16 is a non-terminal symbol in V16 and is the start symbol; and


(4) the production set of the grammar G16 is: P16={


S16→b15B16,1,


B16,1→b28B16,2,


B16,1→b25B16,3,


B16,2→b1B16,4|b3B16,4,


B16,3→b1B16,5|b3B16,5,


B16,4→b24B16,6|b25B16,6,


B16,5→b24B16,6,


B16,6→iB16,7|uB16,7|eB16,7|oB16,7,


B16,6→b3B16,8|b4B16,8|b15B16,8|b16B16,8,


B16,6→b12B16,9|b25B16,9|b26B16,9,


B16,7→b3B16,8|b4B16,8|b15B16,8|b16B16,8,


B16,7→b12B16,9|b25B16,9|b26B16,9,


B16,8→b28,


B16,9→b11}


With respect to a Tibetan spelling structure 17:


Tibetan spelling formal grammar G17: the spelling formal grammar G17 of the Tibetan roots, the vowel symbols and the suffixes is a quadruple (T17, V17, S17, P17), wherein:


(1) terminal symbol


T17=TB∪To, wherein:


TB={b1, b2, b3, b4, b5, . . . , b30}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V17={S17, B17,1, B17,2};


(3) S17 is a non-terminal symbol in V17 and is the start symbol; and


(4) the production set of the grammar G17 is: P17={


S17→b1B17,1|b2B17,1|b3B17,1|b4B17,1|b5B17,1| . . . |b30B17,1,


S17→b1B17,2|b2B17,2|b3B17,2|b4B17,2|b5B17,2| . . . |b30B17,2,


B17,1→iB17,2|uB17,2|eB17,2|oB17,2,


B17,2→b3|b4|b11|b12|b15|b16|b23|b25|b26|b28}


With respect to a Tibetan spelling structure 18:


Tibetan spelling formal grammar G18: the spelling formal grammar G18 of the Tibetan superfixes, the roots, the vowel symbols and the suffixes is a quadruple (T18, V18, S18, P18), wherein:


(1) terminal symbol


T18=TB∪To, wherein:


TB={b1, b3, b4, b5, b7, b8, b9, b11, b12, b13, b15, b16, b17, b19, b23, b25, b26, b28, b29}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V18={S18, B18,1, B18,2, B18,3, B18,4, B18,5};


(3) S18 is a non-terminal symbol in V18 and is the start symbol; and


(4) the production set of the grammar G18 is: P18={


S18→b25B18,1|b26B18,2|b28B18,3,


B18,1→b1B18,5|b3B18,5|b4B18,5|b7B18,5|b8B18,5|b9B18,5|b11B18,5|b12B18,5|b15B18,5|b16B18,5|b17B18,5|b19B18,5,


B18,1→b1B18,4|b3B18,4|b4B18,4|b7B18,4|b8B18,4|b9B18,4|b11B18,4|b12B18,4|b15B18,4|b16B18,4|b17B18,4|b19B18,4,


B18,2→b1B18,5|b3B18,5|b4B18,5|b5B18,5|b7B18,5|b9B18,5|b11B18,5|b13B18,5|b15B18,5|b29B18,5,


B18,2→b1B18,4|b3B18,4|b4B18,4|b5B18,4|b7B18,4|b9B18,4|b11B18,4|b13B18,4|b15B18,4|b29B18,4,


B18,3→b1B18,5|b3B18,5|b4B18,5|b8B18,5|b9B18,5|b11B18,5|b12B18,5|b13B18,5|b15B18,5|b16B18,5|b17B18,5,


B18,3→b1B18,4|b3B18,4|b4B18,4|b8B18,4|b9B18,4|b11B18,4|b12B18,4|b13B18,4|b15B18,4|b16B18,4|b17B18,4,


B18,4→iB18,5|uB18,5|eB18,5|oB18,5,


B18,5→b3|b4|b11|b12|b15|b16|b23|b25|b26|b28}


With respect to a Tibetan spelling structure 19:


Tibetan spelling formal grammar G19: the spelling formal grammar G19 of the Tibetan roots, the subfixes, the vowel symbols and the suffixes is a quadruple (T19, V19, S19, P19), wherein:


(1) terminal symbol


T19=TB∪To, wherein:


TB={b1, b2, b3, b4, b8, b9, b10, b11, b12, b13, b14, b15, b16, b18, b20, b21, b22, b23, b24, b25, b26, b27, b28, b29}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V19={S19, B19,1, B19,2, B19,3, B19,4, B19,5, B19,6, B19,7, B19,8, B19,9, B19,10, B19,11};


(3) S19 is a non-terminal symbol in V19 and is the start symbol; and


(4) the production set of the grammar G19 is: P19={


S19→b1B19,1|b3B19,1,


S19→b2B19,2,


S19→b11B19,3|b29B19,3,


S19→b8B19,4|b18B19,4|b21B19,4|b26B19,4|b27B19,4,


S19→b9B19,5|b10B19,5,


S19→b13B19,6|b14B19,6|b16B19,6,


S19→b22B19,7|b25B19,7,


S19→b28B19,8,


S19→b15B19,9,


B19,1→b20B19,11|b24B19,11|b25B19,11|b26B19,11,


B19,1→b20B19,10|b24B19,10|b25B19,10|b26B19,10,


B19,2→b20B19,11|b24B19,11|b25B19,11,


B19,2→b20B19,10|b24B19,10|b25B19,10,


B19,3→b20B19,11|b25B19,11,


B19,3→b20B19,10|b25B19,10,


B19,4→b20B19,11,


B19,4→b20B19,10,


B19,5→b25B19,11,


B19,5→b25B19,10,


B19,6→b24B19,11|b25B19,11,


B19,6→b24B19,10|b25B19,10,


B19,7→b20B19,11|b26B19,11,


B19,7→b20B19,10|b26B19,10,


B19,8→b25B19,11|b26B19,11,


B19,8→b25B19,10|b26B19,10,


B19,9→b24B19,11|b25B19,11|b26B19,11,


B19,9→b24B19,10|b25B19,10|b26B19,10,


B19,10→iB19,11|uB19,11|eB19,11|oB19,11,


B19,11→b3|b4|b11|b12|b15|b16|b23|b25|b26|b28}


With respect to a Tibetan spelling structure 20:


Tibetan spelling formal grammar G20: the spelling formal grammar G20 of the superfixes, the Tibetan roots, the subfixes, the vowel symbols and the suffixes is a quadruple (T20, V20, S20, P20), wherein:


(1) terminal symbol


T20=TB∪To, wherein:


TB={b1, b3, b4, b11, b12, b13, b15, b16, b17, b20, b23, b24, b25, b26, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V20={S20, B20,1, B20,2, B20,3, B20,4, B20,5, B20,6, B20,7, B20,8};


(3) S20 is a non-terminal symbol in V20 and is the start symbol; and


(4) the production set of the grammar G20 is: P20={


S20→b25B20,1,


S20→b28B20,2,


B20,1→b1B20,3|b3B20,3|b16B20,3,


B20,1→b17B20,4,


B20,2→b1B20,5|b3B20,5|b13B20,5|b15B20,5|b16B20,5,


B20,2→b12B20,6,


B20,3→b24B20,8,


B20,3→b24B20,7,


B20,4→b20B20,8,


B20,4→b20B20,7,


B20,5→b24B20,8|b25B20,8,


B20,5→b24B20,7|b25B20,7,


B20,6→b25B20,8,


B20,6→b25B20,7,


B20,7→iB20,8|uB20,8|eB20,8|oB20,8,


B20,8→b3|b4|b11|b12|b15|b16|b23|b25|b26|b28}


With respect to a Tibetan spelling structure 21:


Tibetan spelling formal grammar G21: the spelling formal grammar G21 of the Tibetan roots, the vowel symbols, the suffixes and the postfixes is a quadruple (T21, V21, S21, P21), wherein:


(1) terminal symbol


T21=TB∪To, wherein:


TB={b1, b2, b3, b4, b5, . . . , b30}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V21={S21, B21,1, B21,2, B21,3, B21,4, B21,5, B21,6, B21,7};


(3) S21 is a non-terminal symbol in V21 and is the start symbol; and


(4) the production set of the grammar G21 is: P21={


S21→b1B21,1|b2B21,1| . . . |b10B21,1|b12B21,1|b13B21,1| . . . |b22B21,1|b24B21,1|b25B21,1| . . . |b30B21,1,


S21→b11B21,2,


S21→b23B21,3,


B21,1→iB21,4|uB21,4|eB21,4|oB21,4,


B21,1→b3B21,7|b4B21,7|b15B21,7|b16B21,7,


B21,2→iB21,5|uB21,5|eB21,5|oB21,5,


B21,3→b4B21,7|b16B21,7,


B21,3→iB21,6|uB21,6|eB21,6|oB21,6,


B21,4→b3B21,7|b4B21,7|b15B21,7|b16B21,7,


B21,5→b3B21,7|b4B21,7|b15B21,7|b16B21,7,


B21,6→b3B21,7|b4B21,7|b15B21,7|b16B21,7,


B21,7→b28}


With respect to a Tibetan spelling structure 22:


Tibetan spelling formal grammar G22: the spelling formal grammar G22 of the Tibetan superfixes, the roots, the vowel symbols, the suffixes and the postfixes is a quadruple (T22, V22, S22, P22), wherein:


(1) terminal symbol


T22=TB∪To, wherein:


TB={b1, b3, b4, b5, b7, b8, b9, b11, b12, b13, b15, b16, b17, b19, b25, b26, b28, b29}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V22={S22, B22,1, B22,2, B22,3, B22,4, B22,5, B22,6, B22,7};


(3) S22 is a non-terminal symbol in V22 and is the start symbol; and


(4) the production set of the grammar G22 is: P22={


S22→b25B22,1|b26B22,2|b28B22,3,


B22,1→b1B22,4|b3B22,4|b4B22,4|b7B22,4|b8B22,4|b9B22,4|b11B22,4|b12B22,4|b15B22,4|b16B22,4|b17B22,4|b19B22,4,


B22,2→b1B22,4|b3B22,4|b4B22,4|b5B22,4|b7B22,4|b9B22,4|b11B22,4|b13B22,4|b15B22,4|b29B22,4,


B22,3→b1B22,4|b3B22,4|b4B22,4|b8B22,4|b9B22,4|b11B22,4|b12B22,4|b13B22,4|b15B22,4|b16B22,4|b17B22,4,


B22,4→iB22,7|uB22,7|eB22,7|oB22,7,


B22,4→b12B22,5|b25B22,5|b26B22,5,


B22,4→b3B22,6|b4B22,6|b15B22,6|b16B22,6,


B22,7→b12B22,5|b25B22,5|b26B22,5,


B22,7→b3B22,6|b4B22,6|b15B22,6|b16B22,6,


B22,5→b11,


B22,6→b18}


With respect to a Tibetan spelling structure 23:


Tibetan spelling formal grammar G23: the Tibetan character spelling grammar G23 of the Tibetan roots, the subfixes, the vowel symbols, the suffixes and the postfixes is a quadruple (T23, V23, S23, P23), wherein:


(1) terminal symbol


T23=TB∪To, wherein:


TB={b1, b2, b3, b4, b8, b9, b10, b11, b12, b13, b14, b15, b16, b18, b20, b21, b22, b24, b25, b26, b27, b28, b29}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V23={S23, B23,1, B23,2, B23,3, B23,4, B23,5, B23,6, B23,7, B23,8, B23,9, B23,10, B23,11, B23,12, B23,13};


(3) S23 is a non-terminal symbol in V23 and is the start symbol; and


(4) the production set of the grammar G23 is: P23={


S23→b1B23,1|b3B23,1,


S23→b2B23,2,


S23→b11B23,3|b29B23,3,


S23→b8B23,4|b18B23,4|b21B23,4|b26B23,4|b27B23,4,


S23→b9B23,5|b10B23,5,


S23→b13B23,6|b14B23,6|b16B23,6,


S23→b22B23,7|b25B23,7,


S23→b28B23,8,


S23→b15B23,9,


B23,1→b20B23,10|b24B23,10|b25B23,10|b26B23,10,


B23,2→b20B23,10|b24B23,10|b25B23,10,


B23,3→b20B23,10|b25B23,10,


B23,4→b20B23,10,


B23,5→b25B23,10,


B23,6→b24B23,10|b25B23,10,


B23,7→b20B23,10|b26B23,10,


B23,8→b25B23,10|b26B23,10,


B23,9→b24B23,10|b25B23,10|b26B23,10,


B23,10→iB23,11|uB23,11|eB23,11|oB23,11,


B23,10→b12B23,12|b25B23,12|b26B23,12,


B23,10→b3B23,13|b4B23,13|b15B23,13|b16B23,13,


B23,11→b12B23,12|b25B23,12|b26B23,12,


B23,11→b3B23,13|b4B23,13|b15B23,13|b16B23,13,


B23,12→b11,


B23,13→b18}


With respect to a Tibetan spelling structure 24:


Tibetan spelling formal grammar G24: the spelling formal grammar G24 of the Tibetan superfixes, the roots, the subfixes, the vowel symbols, the suffixes and the postfixes is a quadruple (T24, V24, S24, P24), wherein:


(1) terminal symbol


T24=TB∪To, wherein:


TB={b1, b3, b4, b11, b12, b13, b15, b16, b17, b20, b24, b25, b26, b28}, the elements thereof correspond to the Tibetan consonant characters; and To={i, u, e, o}, the elements thereof correspond to the Tibetan vowel characters;


(2) non-terminal symbol set


V24={S24, B24,1, B24,2, B24,3, B24,4, B24,5, B24,6, B24,7, B24,8, B24,9, B24,10};


(3) S24 is a non-terminal symbol in V24 and is the start symbol; and


(4) the production set of the grammar G24 is: P24={


S24→b25B24,1,


S24→b28B24,2,


B24,1→b1B24,3|b3B24,3|b16B24,3,


B24,1→b17B24,4,


B24,2→b1B24,5|b3B24,5|b13B24,5|b15B24,5|b16B24,5,


B24,2→b12B24,6,


B24,3→b24B24,7,


B24,4→b20B24,7,


B24,5→b24B24,7|b25B24,7,


B24,6→b25B24,7,


B24,7→iB24,8|uB24,8|eB24,8|oB24,8,


B24,7→b12B24,9|b25B24,9|b26B24,9,


B24,7→b3B24,10|b4B24,10|b15B24,10|b16B24,10,


B24,8→b12B24,9|b25B24,9|b26B24,9,


B24,8→b3B24,10|b4B24,10|b15B24,10|b16B24,10,


B24,9→b11,


B24,10→b18}


In the embodiment, the process of acquiring a newly added non-terminal symbol Ei includes: judging whether the finite set Pi of the production rules of the Tibetan spelling formal grammar Gi contains a production rule B→x, wherein B∈Vi and x∈Ti; and if so, acquiring Ei∈δi(B, x), wherein δi(B, x)=φ. Ei is treated as one of the non-terminal symbols.
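The correspondence between a grammar Gi and the transitions of its automaton Mi can be sketched as follows, using the productions of G7 given above. This is a minimal sketch of the usual construction for right-linear grammars, not necessarily the exact procedure of the embodiment; the function name `grammar_to_transitions` and the state label "E7" for the newly added symbol are assumptions made for illustration.

```python
from collections import defaultdict

# Productions of G7 as listed above: (terminal,) or (terminal, next_symbol).
P7 = {
    "S7": [("b15", "B7,1")],
    "B7,1": [("b28", "B7,2"), ("b25", "B7,3")],
    "B7,2": [("b1", "B7,4"), ("b3", "B7,4")],
    "B7,3": [("b1", "B7,5"), ("b3", "B7,5")],
    "B7,4": [("b24",), ("b25",), ("b24", "B7,6"), ("b25", "B7,6")],
    "B7,5": [("b24",), ("b24", "B7,6")],
    "B7,6": [("i",), ("u",), ("e",), ("o",)],
}

def grammar_to_transitions(productions, new_symbol):
    """Map each production to a transition (state, terminal) -> set of states.

    A production B -> xC yields C in delta(B, x); a production B -> x yields
    the newly added symbol (the termination state) in delta(B, x).
    """
    delta = defaultdict(set)
    for head, alternatives in productions.items():
        for alt in alternatives:
            target = alt[1] if len(alt) == 2 else new_symbol
            delta[(head, alt[0])].add(target)
    return dict(delta)

DELTA7 = grammar_to_transitions(P7, "E7")
```

Because B7,4 has both b24 and b24B7,6 as alternatives, `DELTA7[("B7,4", "b24")]` contains both "E7" and "B7,6", so the resulting automaton is nondeterministic in general.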


Step 103, the constituents of the Tibetan characters are acquired according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the Tibetan characters in the Tibetan text are correctly spelled.


In the embodiment, the process of determining the target finite state automaton in the step 103 can include: each finite state automaton in the finite state automaton group, starting from its initial state, sequentially receives at least one Tibetan character and transfers the state; if any finite state automaton in the finite state automaton group can enter the termination state after transferring the state, the Tibetan text to be checked is correctly spelled; if none of the finite state automata in the finite state automaton group can enter the termination state after transferring the state, the Tibetan text to be checked is wrongly spelled. The finite state automaton which determines that the Tibetan text to be checked is correctly spelled is the target finite state automaton.
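The selection of the target automaton described above might be sketched as below. The group here contains only two illustrative members: an automaton for G1 and the b25 branch of G2. The names `accepts`, `find_target`, `GROUP` and the termination state label "E" are hypothetical. As a simplifying assumption, this sketch treats a character with no defined transition as a spelling failure for that automaton.

```python
FINAL = "E"  # assumed label of the termination state

def accepts(start, delta, tokens):
    """Simulate one automaton; True if it ends in the termination state."""
    current = {start}
    for tok in tokens:
        current = {n for s in current for n in delta.get((s, tok), ())}
        if not current:
            return False  # no transition available: this automaton rejects
    return FINAL in current

# M1 is built from the productions of G1.
delta1 = {}
for k in range(1, 36):
    delta1[("S1", f"b{k}")] = {FINAL, "B1,1" if k <= 30 else "B1,2"}
for v in "iueoa":
    delta1[("B1,1", v)] = {FINAL}
for v in "iueo":
    delta1[("B1,2", v)] = {FINAL}

# M2 covers only the b25 branch of G2 (an illustrative fragment).
delta2 = {("S2", "b25"): {"B2,1"}}
for k in (1, 3, 4, 7, 8, 9, 11, 12, 15, 16, 17, 19):
    delta2[("B2,1", f"b{k}")] = {FINAL, "B2,4"}
for v in "iueo":
    delta2[("B2,4", v)] = {FINAL}

GROUP = {"M1": ("S1", delta1), "M2": ("S2", delta2)}

def find_target(group, tokens):
    """Return the name of the first accepting automaton, or None."""
    for name, (start, delta) in group.items():
        if accepts(start, delta, tokens):
            return name
    return None
```

With this group, a token sequence such as `["b25", "b1", "u"]` is rejected by M1 and accepted by M2, so M2 would be the target automaton and G2 the target grammar.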


Wherein, the operation of transferring the state can be as follows: the finite state automaton Mi, while in a certain state qm (qm∈Qi), receives a certain input character x (x∈Σi); if the state transition function δi defines a transition δi(qm, x), the automaton enters the state qm+1 (qm+1∈δi(qm, x)); otherwise, the state of the automaton is not changed.
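A single state transfer as just described could look like the following sketch. The function name `transfer` and the returned flag are assumptions added for illustration; the flag simply lets a caller distinguish a real transition from the case in which the state is left unchanged.

```python
def transfer(delta, state, ch):
    """Perform one state transfer.

    Returns (next_states, moved). If delta defines a transition for
    (state, ch), next_states is the set of reachable states; otherwise
    the state is left unchanged and moved is False.
    """
    targets = delta.get((state, ch))
    if targets:
        return set(targets), True
    return {state}, False

# Illustrative fragment taken from the first two productions of G5.
delta5 = {
    ("S5", "b15"): {"B5,1"},
    ("B5,1", "b28"): {"B5,2"},
}
```

For example, `transfer(delta5, "S5", "b15")` moves to B5,1, while `transfer(delta5, "S5", "i")` leaves the automaton in S5.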


In the embodiment, the process of acquiring the constituents of the Tibetan characters through the step 103 can include: at first, acquiring a target Tibetan spelling formal grammar corresponding to the target finite state automaton; and then, acquiring the constituents of the Tibetan characters according to the target Tibetan spelling formal grammar.
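Once the target grammar is known, the constituents can be read off from the order of the spelling structure. The sketch below assumes that the character has already been accepted by the target automaton and that the vowel symbol is the only optional position; `ROLE_ORDER` lists just three of the 24 structures, and all names are illustrative.

```python
VOWELS = {"i", "u", "e", "o", "a"}

# Constituent order for a few spelling structures (illustrative subset).
ROLE_ORDER = {
    1: ["root", "vowel"],
    7: ["prefix", "superfix", "root", "subfix", "vowel"],
    9: ["prefix", "root", "vowel", "suffix"],
}

def constituents(structure_id, tokens):
    """Label each token of an already validated character with its role."""
    result = {}
    pos = 0
    for role in ROLE_ORDER[structure_id]:
        if pos >= len(tokens):
            break
        if role == "vowel" and tokens[pos] not in VOWELS:
            continue  # the optional vowel symbol is absent
        result[role] = tokens[pos]
        pos += 1
    return result
```

For a character accepted under G9 such as `["b3", "b5", "b11"]`, this yields prefix b3, root b5 and suffix b11, with no vowel symbol recorded.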


In the embodiment, the constituents of the Tibetan characters are in one-to-one correspondence with the Tibetan spelling formal grammars. Specifically, the constituents of the Tibetan characters have 24 basic spelling structures as follows:


Basic spelling structure 1 of the Tibetan characters: the Tibetan roots are spelled with the vowel symbols.


Basic spelling structure 2 of the Tibetan characters: the Tibetan superfixes, the roots and the vowels are spelled.


Basic spelling structure 3 of the Tibetan characters: the Tibetan roots, the subfixes and the vowel symbols are spelled.


Basic spelling structure 4 of the Tibetan characters: the superfixes, the Tibetan roots, the subfixes and the vowel symbols are spelled.


Basic spelling structure 5 of the Tibetan characters: the Tibetan prefixes, the superfixes, the roots and the vowel symbols are spelled.


Basic spelling structure 6 of the Tibetan characters: the Tibetan prefixes, the roots, the subfixes and the vowel symbols are spelled.


Basic spelling structure 7 of the Tibetan characters: the Tibetan prefixes, the superfixes, the roots, the subfixes and the vowel symbols are spelled.


Basic spelling structure 8 of the Tibetan characters: the Tibetan prefixes, the roots and the vowel symbols are spelled.


Basic spelling structure 9 of the Tibetan characters: the Tibetan prefixes, the roots, the vowel symbols and the suffixes are spelled.


Basic spelling structure 10 of the Tibetan characters: the Tibetan prefixes, the superfixes, the roots, the vowel symbols and the suffixes are spelled.


Basic spelling structure 11 of the Tibetan characters: the Tibetan prefixes, the roots, the subfixes, the vowel symbols and the suffixes are spelled.


Basic spelling structure 12 of the Tibetan characters: the Tibetan prefixes, the superfixes, the roots, the subfixes, the vowel symbols and the suffixes are spelled.


Basic spelling structure 13 of the Tibetan characters: the Tibetan prefixes, the roots, the vowel symbols, the suffixes and the postfixes are spelled.


Basic spelling structure 14 of the Tibetan characters: the Tibetan prefixes, the superfixes, the roots, the vowel symbols, the suffixes and the postfixes are spelled.


Basic spelling structure 15 of the Tibetan characters: the Tibetan prefixes, the roots, the subfixes, the vowel symbols, the suffixes and the postfixes are spelled.


Basic spelling structure 16 of the Tibetan characters: the Tibetan prefixes, the superfixes, the roots, the subfixes, the vowel symbols, the suffixes and the postfixes are spelled.


Basic spelling structure 17 of the Tibetan characters: the Tibetan roots, the vowel symbols and the suffixes are spelled.


Basic spelling structure 18 of the Tibetan characters: the Tibetan superfixes, the roots, the vowel symbols and the suffixes are spelled.


Basic spelling structure 19 of the Tibetan characters: the Tibetan roots, the subfixes, the vowel symbols and the suffixes are spelled.


Basic spelling structure 20 of the Tibetan characters: the superfixes, the Tibetan roots, the subfixes, the vowel symbols and the suffixes are spelled.


Basic spelling structure 21 of the Tibetan characters: the Tibetan roots, the vowel symbols, the suffixes and the postfixes are spelled.


Basic spelling structure 22 of the Tibetan characters: the Tibetan superfixes, the roots, the vowel symbols, the suffixes and the postfixes are spelled.


Basic spelling structure 23 of the Tibetan characters: the Tibetan roots, the subfixes, the vowel symbols, the suffixes and the postfixes are spelled.


Basic spelling structure 24 of the Tibetan characters: the Tibetan superfixes, the roots, the subfixes, the vowel symbols, the suffixes and the postfixes are spelled.


It should be noted that the vowel symbols in the basic spelling structure 8 of the Tibetan characters are essential, and apart from this, the vowel symbols in the other structures are optional.
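
The 24 structures listed above can be read as every combination of three optional constituents (prefix, superfix, subfix) with three possible endings (no suffix, suffix only, suffix plus postfix). The following sketch encodes that reading as a lookup table. The names `STRUCTURES` and `structure_number` are hypothetical, and the table is only a transcription of the list above, not a data structure taken from the invention.

```python
# Key: (has_prefix, has_superfix, has_subfix, tail), where tail is
# 0 = no suffix, 1 = suffix only, 2 = suffix and postfix.
# Value: serial number of the basic spelling structure as listed in the text.
STRUCTURES = {
    (0, 0, 0, 0): 1,  (0, 1, 0, 0): 2,  (0, 0, 1, 0): 3,  (0, 1, 1, 0): 4,
    (1, 1, 0, 0): 5,  (1, 0, 1, 0): 6,  (1, 1, 1, 0): 7,  (1, 0, 0, 0): 8,
    (1, 0, 0, 1): 9,  (1, 1, 0, 1): 10, (1, 0, 1, 1): 11, (1, 1, 1, 1): 12,
    (1, 0, 0, 2): 13, (1, 1, 0, 2): 14, (1, 0, 1, 2): 15, (1, 1, 1, 2): 16,
    (0, 0, 0, 1): 17, (0, 1, 0, 1): 18, (0, 0, 1, 1): 19, (0, 1, 1, 1): 20,
    (0, 0, 0, 2): 21, (0, 1, 0, 2): 22, (0, 0, 1, 2): 23, (0, 1, 1, 2): 24,
}


def structure_number(prefix=False, superfix=False, subfix=False,
                     vowel=False, suffix=False, postfix=False):
    """Map the presence of each constituent to a structure serial number."""
    if postfix and not suffix:
        raise ValueError("no listed structure has a postfix without a suffix")
    tail = 2 if postfix else (1 if suffix else 0)
    number = STRUCTURES[(int(prefix), int(superfix), int(subfix), tail)]
    # The vowel symbol is optional everywhere except in structure 8.
    if number == 8 and not vowel:
        raise ValueError("structure 8 requires a vowel symbol")
    return number


print(structure_number(prefix=True, superfix=True, suffix=True))
```

The root is present in every structure, so it does not appear in the key.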


The present invention has the following beneficial effects: the Tibetan text to be analyzed is used as the input of the finite state automaton group, and the constituents of the Tibetan characters are acquired according to the target finite state automaton which determines that the Tibetan characters are correct, therefore Tibetan character constituent analysis is achieved, and Tibetan sorting can be further achieved according to the constituents of the Tibetan characters. As the finite state automaton group corresponds to the Tibetan spelling formal grammar, the technical solutions provided by the embodiments of the present invention solve the problem that the existing Tibetan sorting methods have no universality or compatibility, which is inconvenient for the use of automatic computer Tibetan sorting.


Second Embodiment

As shown in FIG. 2, the embodiment of the present invention provides a Tibetan sorting method, including:


Step 201, at least two Tibetan characters to be sorted are acquired.


In the embodiment, the at least two Tibetan characters acquired in the step 201 can be independent Tibetan characters and can also be a Tibetan text composed of a plurality of Tibetan characters, and this is not limited herein. Particularly, when the Tibetan text of at least two Tibetan characters is acquired, the Tibetan text can be segmented at first, the segmentation process is similar to the segmentation mode in the step 101 as shown in FIG. 1, and thus will not be repeated redundantly herein.


Step 202, the at least two Tibetan characters to be sorted are respectively used as the input of a preset finite state automaton group.


Step 203, the constituents of the Tibetan characters are acquired according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the input Tibetan characters are correctly spelled.


In the embodiment, the process of acquiring the constituents of the Tibetan characters in the step 202 and the step 203 is similar to that in the step 102 and the step 103 as shown in FIG. 1, and thus will not be repeated redundantly herein.


Step 204, the at least two Tibetan characters are sorted according to the constituents of the at least two Tibetan characters to acquire a sorting result.


In the embodiment, for any two Tibetan characters in the at least two Tibetan characters, the sorting process in the step 204 includes:


2041, judging whether the two Tibetan characters conform to a preset constituent rule according to the constituents of the two Tibetan characters; if so, executing 2042; otherwise, executing 2044;


2042, judging whether the roots of the two Tibetan characters are the same; if so, executing 2043; otherwise, executing 2044;


2043, sequentially comparing the constituents of the two Tibetan characters according to the sequence of prefixes, superfixes, subfixes, vowels, suffixes and postfixes; executing 2045;


2044, sequentially comparing the constituents of the two Tibetan characters according to the sequence of superfixes, prefixes, subfixes, vowels, suffixes and postfixes; executing 2045; and


2045, if the comparison result is that the former Tibetan character in the two Tibetan characters is larger than the latter Tibetan character, exchanging the sequence of the two Tibetan characters; and otherwise, keeping the sequence of the two Tibetan characters unchanged.


Wherein, 2041 includes: acquiring spelling structure serial numbers of the two Tibetan characters according to the constituents of the two Tibetan characters; and judging whether the two Tibetan characters conform to the preset constituent rule according to the spelling structure serial numbers of the two Tibetan characters, wherein the constituent rule includes: the spelling structure serial number of the first Tibetan character in the two Tibetan characters belongs to a set {2, 4, 18, 20, 22, 24}, and the spelling structure serial number of the second Tibetan character in the two Tibetan characters belongs to a set {5, 7, 10, 12, 14, 16}; or, the spelling structure serial number of the first Tibetan character in the two Tibetan characters belongs to the set {5, 7, 10, 12, 14, 16}, and the spelling structure serial number of the second Tibetan character in the two Tibetan characters belongs to the set {2, 4, 18, 20, 22, 24}.


In the embodiment, the constituents of the Tibetan character can be summarized as including the following 7 symbols: the root, the prefix, the superfix, the subfix, the vowel, the suffix and the postfix. When the constituents of the Tibetan character do not contain one or several certain symbols, the corresponding symbol mark of the Tibetan character is 0.
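
A minimal sketch of the pairwise comparison in 2041 to 2044 is given below, under several assumptions that are not taken from the text: each constituent is represented by a hypothetical integer rank (0 when absent, as described above), the names `Constituents`, `conforms` and `compare` are invented for illustration, and the root is compared before the listed sequence, since the text does not say how differing roots are ordered.

```python
from typing import NamedTuple


class Constituents(NamedTuple):
    structure: int     # spelling structure serial number, 1..24
    root: int          # hypothetical integer rank of the root
    prefix: int = 0    # 0 marks an absent constituent
    superfix: int = 0
    subfix: int = 0
    vowel: int = 0
    suffix: int = 0
    postfix: int = 0


RULE_SET_1 = {2, 4, 18, 20, 22, 24}
RULE_SET_2 = {5, 7, 10, 12, 14, 16}

ORDER_2043 = ("prefix", "superfix", "subfix", "vowel", "suffix", "postfix")
ORDER_2044 = ("superfix", "prefix", "subfix", "vowel", "suffix", "postfix")


def conforms(a, b):
    """2041: one structure number lies in each of the two sets."""
    return ((a.structure in RULE_SET_1 and b.structure in RULE_SET_2) or
            (a.structure in RULE_SET_2 and b.structure in RULE_SET_1))


def compare(a, b):
    """Return -1, 0 or 1 for a < b, a == b, a > b."""
    order = ORDER_2043 if conforms(a, b) and a.root == b.root else ORDER_2044
    # Assumption: the root is compared first; the text lists only the affixes.
    key_a = (a.root,) + tuple(getattr(a, name) for name in order)
    key_b = (b.root,) + tuple(getattr(b, name) for name in order)
    return (key_a > key_b) - (key_a < key_b)


a = Constituents(structure=2, root=1, superfix=3)
b = Constituents(structure=5, root=1, prefix=1, superfix=2)
print(compare(a, b))
```

In this example the two characters conform to the constituent rule and share a root, so the prefix is compared before the superfix; with the other sequence the result would be reversed. Step 2045 then exchanges the two characters only when the result is positive.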


In the embodiment, once any two Tibetan characters in the at least two Tibetan characters can be ordered via the above process, all of the at least two Tibetan characters can be sorted by adopting a bubble sort algorithm or another sorting method.
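
As an illustration of how a pairwise comparison can drive the overall sort, the following sketch implements a plain bubble sort that takes the comparison as a parameter. The function name `bubble_sort` and the integer comparator used in the demonstration are placeholders; in practice the comparator would be the constituent-based comparison of the step 204.

```python
from functools import cmp_to_key


def bubble_sort(items, compare):
    """Sort a copy of items; adjacent elements are swapped when compare(...) > 0."""
    result = list(items)
    for end in range(len(result) - 1, 0, -1):
        swapped = False
        for i in range(end):
            # The former element is larger than the latter: exchange them.
            if compare(result[i], result[i + 1]) > 0:
                result[i], result[i + 1] = result[i + 1], result[i]
                swapped = True
        if not swapped:
            break  # already in order
    return result


def int_compare(a, b):
    """Placeholder comparator on integers, standing in for the character comparison."""
    return (a > b) - (a < b)


print(bubble_sort([3, 1, 2], int_compare))
# The same comparator also works with the built-in sort:
print(sorted([3, 1, 2], key=cmp_to_key(int_compare)))
```

The built-in `sorted` with `functools.cmp_to_key` is shown as one example of "another sorting method" that accepts the same comparator.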


The present invention has the following beneficial effects: the Tibetan text to be analyzed is used as the input of the finite state automaton group, and the constituents of the Tibetan characters are acquired according to the target finite state automaton which determines that the Tibetan characters are correct, therefore Tibetan character constituent analysis is achieved, and Tibetan sorting can be further achieved according to the constituents of the Tibetan characters. As the finite state automaton group corresponds to the Tibetan spelling formal grammar, the technical solutions provided by the embodiments of the present invention solve the problem that the existing Tibetan sorting methods have no universality or compatibility, which is inconvenient for the use of automatic computer Tibetan sorting.


Third Embodiment

As shown in FIG. 3, the embodiment of the present invention provides a Tibetan sorting method, including:


Step 301, at least two Tibetan words to be sorted are acquired.


Step 302, Tibetan characters in the at least two Tibetan words are respectively acquired.


In the embodiment, the at least two Tibetan words can be segmented to acquire the Tibetan characters; alternatively, the at least two Tibetan words can be divided according to a specific separator or other signs to acquire the Tibetan characters, which will not be repeated herein.
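
Division by a separator can be sketched as below. The sketch assumes that the separator is the Tibetan tsheg mark (U+0F0B), which this passage does not specify, and the function name `split_characters` is hypothetical.

```python
TSHEG = "\u0f0b"  # Tibetan mark intersyllabic tsheg; assumed separator


def split_characters(word, separator=TSHEG):
    """Split a Tibetan word into its characters (syllables), dropping empty parts."""
    return [part for part in word.split(separator) if part]


# A two-syllable word written with code point escapes, followed by a trailing tsheg.
word = "\u0f56\u0f7c\u0f51\u0f0b\u0f61\u0f72\u0f42\u0f0b"
print(len(split_characters(word)))
```

Empty parts are dropped so that a trailing separator does not produce an empty character.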


Step 303, the Tibetan characters in the at least two Tibetan words are respectively used as the input of a preset finite state automaton group.


Step 304, the constituents of the Tibetan characters are acquired according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the input Tibetan characters are correctly spelled.


In the embodiment, the process of acquiring the constituents of the Tibetan characters in the step 303 and the step 304 is similar to that in the step 102 and the step 103 as shown in FIG. 1, and thus will not be repeated redundantly herein.


Step 305, the at least two Tibetan words are sorted according to the constituents of the each Tibetan character in the at least two Tibetan words to acquire a sorting result.


In the embodiment, for any two Tibetan words in the at least two Tibetan words, the sorting process in the step 305 includes:


3051, respectively acquiring first Tibetan characters in the two Tibetan words;


3052, judging whether the two Tibetan characters conform to a preset constituent rule according to the constituents of the Tibetan characters; if so, executing 3053; otherwise, executing 3055;


3053, judging whether the roots of the Tibetan characters are the same; if so, executing 3054; otherwise, executing 3055;


3054, sequentially comparing the constituents of the Tibetan characters according to the sequence of prefixes, superfixes, subfixes, vowels, suffixes and postfixes; executing 3056;


3055, sequentially comparing the constituents of the Tibetan characters according to the sequence of superfixes, prefixes, subfixes, vowels, suffixes and postfixes; executing 3056; and


3056, if the comparison result is that the Tibetan characters in the former Tibetan word are larger than the corresponding Tibetan characters in the latter Tibetan word, exchanging the sequence of the two Tibetan words; if the comparison result is that the Tibetan characters in the former Tibetan word are smaller than the corresponding Tibetan characters in the latter Tibetan word, keeping the sequence of the two Tibetan words unchanged; and if the comparison result is that the Tibetan characters in the former Tibetan word are equal to the corresponding Tibetan characters in the latter Tibetan word, acquiring the next Tibetan characters in the two Tibetan words, and executing 3052 to 3056 until all the Tibetan characters in the two Tibetan words are completely compared.


Wherein, the process of judging whether the two Tibetan characters conform to the constituent rule in 3052 is similar to that provided in the second embodiment, and thus will not be repeated herein.
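
The character-by-character comparison of two words can be sketched as follows. The function name `compare_words` is hypothetical, the character comparison is passed in as a parameter, and the handling of words of unequal length (the shorter word is ordered first when it is a prefix of the longer one) is an assumption, since the text only says that comparison continues until all characters are compared.

```python
def compare_words(word_a, word_b, compare_chars):
    """Compare two words given as sequences of characters; return -1, 0 or 1."""
    for char_a, char_b in zip(word_a, word_b):
        result = compare_chars(char_a, char_b)
        if result != 0:
            return result  # first differing character decides (3056)
        # Equal characters: move on to the next pair (back to 3052).
    # Assumption: if one word is a prefix of the other, the shorter one is smaller.
    return (len(word_a) > len(word_b)) - (len(word_a) < len(word_b))


def int_compare(a, b):
    """Placeholder for the constituent-based character comparison."""
    return (a > b) - (a < b)


# Integers stand in for already analyzed Tibetan characters.
print(compare_words([1, 2], [1, 3], int_compare))
```

A positive result corresponds to exchanging the two words, and a negative result to keeping their order.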


The present invention has the following beneficial effects: the Tibetan text to be analyzed is used as the input of the finite state automaton group, and the constituents of the Tibetan characters are acquired according to the target finite state automaton which determines that the Tibetan characters are correct, therefore Tibetan character constituent analysis is achieved, and Tibetan sorting can be further achieved according to the constituents of the Tibetan characters. As the finite state automaton group corresponds to the Tibetan spelling formal grammar, the technical solutions provided by the embodiments of the present invention solve the problem that the existing Tibetan sorting methods have no universality or compatibility, which is inconvenient for the use of automatic computer Tibetan sorting.


Fourth Embodiment

As shown in FIG. 4, the embodiment of the present invention provides a Tibetan character constituent analysis device, including:


a text acquisition module 401, used for acquiring a Tibetan text to be analyzed;


a text input module 402, connected with the text acquisition module and used for using Tibetan characters in the Tibetan text as the input of a preset finite state automaton group; and


a constituent analysis module 403, connected with the text input module and used for acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the Tibetan characters in the Tibetan text are correctly spelled;


the finite state automaton group includes 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi); the Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; the Qi represents a union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and the Fi; the δi represents a state transition function of the finite state automaton Mi acquired by mapping from a direct product Qi*Σi of Qi and Σi to Qi; the qi represents an initial state of the finite state automaton Mi; qi∈Qi; the Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊂Qi; and the i is a positive integer, and i≦24.


In the embodiment, the process of implementing Tibetan character constituent analysis through the text acquisition module 401, the text input module 402 and the constituent analysis module 403 is similar to the process provided by the first embodiment of the present invention, and thus will not be repeated redundantly herein.


The present invention has the following beneficial effects: the Tibetan text to be analyzed is used as the input of the finite state automaton group, and the constituents of the Tibetan characters are acquired according to the target finite state automaton which determines that the Tibetan characters are correct, therefore Tibetan character constituent analysis is achieved, and Tibetan sorting can be further achieved according to the constituents of the Tibetan characters. As the finite state automaton group corresponds to the Tibetan spelling formal grammar, the technical solutions provided by the embodiments of the present invention solve the problem that the existing Tibetan sorting methods have no universality or compatibility, which is inconvenient for the use of automatic computer Tibetan sorting.


Fifth Embodiment

As shown in FIG. 5, the embodiment of the present invention provides a Tibetan sorting device, including:


a Tibetan character acquisition module 501, used for acquiring at least two Tibetan characters to be sorted;


a Tibetan character input module 502, connected with the Tibetan character acquisition module and used for respectively using the at least two Tibetan characters to be sorted as the input of a preset finite state automaton group;


a constituent analysis module 503, connected with the Tibetan character input module and used for acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the input Tibetan characters are correctly spelled; and


a sorting module 504, connected with the constituent analysis module and used for sorting the at least two Tibetan characters according to the constituents of the at least two Tibetan characters to acquire a sorting result;


the finite state automaton group includes 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi); the Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; the Qi represents a union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and the Fi; the δi represents a state transition function of the finite state automaton Mi acquired by mapping from a direct product Qi*Σi of Qi and Σi to Qi; the qi represents an initial state of the finite state automaton Mi; qi∈Qi; the Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊂Qi; and the i is a positive integer, and i≦24.


In the embodiment, the process of implementing Tibetan sorting through the Tibetan character acquisition module 501, the Tibetan character input module 502, the constituent analysis module 503 and the sorting module 504 is similar to the process provided by the second embodiment of the present invention, and thus will not be repeated redundantly herein.


The present invention has the following beneficial effects: the Tibetan text to be analyzed is used as the input of the finite state automaton group, and the constituents of the Tibetan characters are acquired according to the target finite state automaton which determines that the Tibetan characters are correct, therefore Tibetan character constituent analysis is achieved, and Tibetan sorting can be further achieved according to the constituents of the Tibetan characters. As the finite state automaton group corresponds to the Tibetan spelling formal grammar, the technical solutions provided by the embodiments of the present invention solve the problem that the existing Tibetan sorting methods have no universality or compatibility, which is inconvenient for the use of automatic computer Tibetan sorting.


Sixth Embodiment

As shown in FIG. 6, the embodiment of the present invention provides a Tibetan sorting device, including:


a Tibetan word acquisition module 601, used for acquiring at least two Tibetan words to be sorted;


a Tibetan character acquisition module 602, connected with the Tibetan word acquisition module and used for respectively acquiring Tibetan characters in the at least two Tibetan words;


a Tibetan character input module 603, connected with the Tibetan character acquisition module and used for respectively using the Tibetan characters in the at least two Tibetan words as the input of a preset finite state automaton group;


a constituent analysis module 604, connected with the Tibetan character input module and used for acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the input Tibetan characters are correctly spelled; and


a sorting module 605, connected with the constituent analysis module and used for sorting the at least two Tibetan words according to the constituents of the each Tibetan character in the at least two Tibetan words to acquire a sorting result;


the finite state automaton group includes 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi); the Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; the Qi represents a union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and the Fi; the δi represents a state transition function of the finite state automaton Mi acquired by mapping from a direct product Qi*Σi of Qi and Σi to Qi; the qi represents an initial state of the finite state automaton Mi; qi∈Qi; the Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊂Qi; and the i is a positive integer, and i≦24.


In the embodiment, the process of implementing Tibetan sorting through the Tibetan word acquisition module 601 to the sorting module 605 is similar to the process provided by the third embodiment of the present invention, and thus will not be repeated redundantly herein.


The present invention has the following beneficial effects: the Tibetan text to be analyzed is used as the input of the finite state automaton group, and the constituents of the Tibetan characters are acquired according to the target finite state automaton which determines that the Tibetan characters are correct, therefore Tibetan character constituent analysis is achieved, and Tibetan sorting can be further achieved according to the constituents of the Tibetan characters. As the finite state automaton group corresponds to the Tibetan spelling formal grammar, the technical solutions provided by the embodiments of the present invention solve the problem that the existing Tibetan sorting methods have no universality or compatibility, which is inconvenient for the use of automatic computer Tibetan sorting.


The order of the above embodiments is only for the purpose of convenient description, and does not represent the advantages and disadvantages of the embodiments.


Finally, it should be noted that the above embodiments are merely used for illustrating the technical solutions of the present invention, rather than limiting them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they could still make modifications to the technical solutions recorded in the foregoing embodiments or make equivalent substitutions to a part of technical features therein; and these modifications or substitutions do not make the essence of the corresponding technical solutions depart from the spirit and the scope of the technical solutions of the embodiments of the present invention.

Claims
  • 1. A Tibetan character constituent analysis method, comprising: S10, acquiring a Tibetan text to be analyzed; S20, using Tibetan characters in the Tibetan text as the input of a preset finite state automaton group; and S30, acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the Tibetan characters in the Tibetan text are correctly spelled; the finite state automaton group comprises 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi); the Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; the Qi represents a union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and the Fi; the δi represents a state transition function of the finite state automaton Mi acquired by mapping from a direct product Qi*Σi of Qi and Σi to Qi; the qi represents an initial state of the finite state automaton Mi; qi∈Qi; the Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊂Qi; and the i is a positive integer, and i≦24.
  • 2. The Tibetan character constituent analysis method of claim 1, wherein the step S30 comprises: S301, acquiring a target Tibetan spelling formal grammar corresponding to the target finite state automaton; and S302, acquiring the constituents of the Tibetan characters according to the target Tibetan spelling formal grammar.
  • 3. A Tibetan sorting method, comprising: S10, acquiring at least two Tibetan characters to be sorted; S20, respectively using the at least two Tibetan characters to be sorted as the input of a preset finite state automaton group; S30, acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the input Tibetan characters are correctly spelled; and S40, sorting the at least two Tibetan characters according to the constituents of the at least two Tibetan characters to acquire a sorting result; the finite state automaton group comprises 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi); the Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; the Qi represents a union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and the Fi; the δi represents a state transition function of the finite state automaton Mi acquired by mapping from a direct product Qi*Σi of Qi and Σi to Qi; the qi represents an initial state of the finite state automaton Mi; qi∈Qi; the Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊂Qi; and the i is a positive integer, and i≦24.
  • 4. The Tibetan sorting method of claim 3, wherein for any two Tibetan characters in the at least two Tibetan characters, the step S40 comprises: S401, judging whether the two Tibetan characters conform to a preset constituent rule according to the constituents of the two Tibetan characters; if so, executing S402; otherwise, executing S404; S402, judging whether the roots of the two Tibetan characters are the same; if so, executing S403; otherwise, executing S404; S403, sequentially comparing the constituents of the two Tibetan characters according to the sequence of prefixes, superfixes, subfixes, vowels, suffixes and postfixes; executing S405; S404, sequentially comparing the constituents of the two Tibetan characters according to the sequence of superfixes, prefixes, subfixes, vowels, suffixes and postfixes; executing S405; and S405, if the comparison result is that the former Tibetan character in the two Tibetan characters is larger than the latter Tibetan character, exchanging the sequence of the two Tibetan characters; and otherwise, keeping the sequence of the two Tibetan characters unchanged.
  • 5. The Tibetan sorting method of claim 4, wherein the step S401 comprises: S4011, acquiring spelling structure serial numbers of the two Tibetan characters according to the constituents of the two Tibetan characters; and S4012, judging whether the two Tibetan characters conform to the preset constituent rule according to the spelling structure serial numbers of the two Tibetan characters; the constituent rule comprises: the spelling structure serial number of the first Tibetan character in the two Tibetan characters belongs to a set {2, 4, 18, 20, 22, 24}, and the spelling structure serial number of the second Tibetan character in the two Tibetan characters belongs to a set {5, 7, 10, 12, 14, 16}; or, the spelling structure serial number of the first Tibetan character in the two Tibetan characters belongs to the set {5, 7, 10, 12, 14, 16}, and the spelling structure serial number of the second Tibetan character in the two Tibetan characters belongs to the set {2, 4, 18, 20, 22, 24}.
  • 6. A Tibetan sorting method, comprising: S10, acquiring at least two Tibetan words to be sorted; S20, respectively acquiring Tibetan characters in the at least two Tibetan words; S30, respectively using the Tibetan characters in the at least two Tibetan words as the input of a preset finite state automaton group; S40, acquiring the constituents of the Tibetan characters according to a target finite state automaton, when the target finite state automaton in the finite state automaton group determines that the input Tibetan characters are correctly spelled; and S50, sorting the at least two Tibetan words according to the constituents of the each Tibetan character in the at least two Tibetan words to acquire a sorting result; the finite state automaton group comprises 24 finite state automata, and any finite state automaton Mi=(Σi, Qi, δi, qi, Fi); the Σi represents a finite set of terminal symbols of a preset Tibetan spelling formal grammar Gi; the Qi represents a union of a finite set Vi of non-terminal symbols of the Tibetan spelling formal grammar Gi and the Fi; the δi represents a state transition function of the finite state automaton Mi acquired by mapping from a direct product Qi*Σi of Qi and Σi to Qi; the qi represents an initial state of the finite state automaton Mi; qi∈Qi; the Fi represents a finite set of termination states of the finite state automaton Mi, and Fi⊂Qi; and the i is a positive integer, and i≦24.
  • 7. The Tibetan sorting method of claim 6, wherein for any two Tibetan words in the at least two Tibetan words, the step S50 comprises: S501, respectively acquiring first Tibetan characters in the two Tibetan words; S502, judging whether the two Tibetan characters conform to a preset constituent rule according to the constituents of the Tibetan characters; if so, executing S503; otherwise, executing S505; S503, judging whether the roots of the Tibetan characters are the same; if so, executing S504; otherwise, executing S505; S504, sequentially comparing the constituents of the Tibetan characters according to the sequence of prefixes, superfixes, subfixes, vowels, suffixes and postfixes; executing S506; S505, sequentially comparing the constituents of the Tibetan characters according to the sequence of superfixes, prefixes, subfixes, vowels, suffixes and postfixes; executing S506; and S506, if the comparison result is that the Tibetan characters in the former Tibetan word are larger than the corresponding Tibetan characters in the latter Tibetan word, exchanging the sequence of the two Tibetan words; if the comparison result is that the Tibetan characters in the former Tibetan word are smaller than the corresponding Tibetan characters in the latter Tibetan word, keeping the sequence of the two Tibetan words unchanged; and if the comparison result is that the Tibetan characters in the former Tibetan word are equal to the corresponding Tibetan characters in the latter Tibetan word, acquiring the next Tibetan characters in the at least two Tibetan words, and executing S502 to S506 until all the Tibetan characters in the two Tibetan words are completely compared.
Priority Claims (1)
Number Date Country Kind
201610528753.9 Jul 2016 CN national