This application claims priority to Taiwan Application Serial Number 102123486, filed Jul. 1, 2013, which is herein incorporated by reference.
1. Technical Field
The present invention relates to a computer educating tool. More particularly, the present invention relates to a text scoring system.
2. Description of Related Art
With the rapid development of technology and the growing popularity of the Internet, online education has spread beyond offices and social settings into schools, providing students with a new way to study. Students can thus not only receive more information more promptly through the Internet, but also study more efficiently through distance learning and hand in homework online.
In general, students are asked to write down the substance and an abstract of an article as homework after reading it, and the teacher can check from the homework whether the students understand the content of the article. However, when one teacher has to teach many students, the quantity of homework becomes excessive, and the teacher cannot teach each student in detail or give each student precise suggestions. Besides, it is difficult for the teacher to provide an abstract or the key points of every article for the students.
According to one aspect of the present disclosure, a text abstract scoring system includes a text providing module, a text dividing module, a searching module, a choosing module, a receiving module and a comparing module. The text providing module is for providing an original text. The text dividing module is for dividing the original text into a plurality of terms. The searching module is connected to a first external database, and the searching module includes a first searching module for searching a plurality of first related terms from the first external database according to each of the terms. The choosing module is connected to the searching module, and the choosing module includes a first calculating module for calculating a first related degree between each of the first related terms and each of the terms of the original text, wherein the first related terms corresponding to a maximum first related degree is a substance of the original text. The receiving module is for receiving at least one user's text. The comparing module is for checking whether the user's text has the substance, and providing a comparing result.
According to another aspect of the present disclosure, a text abstract scoring method includes, providing an original text, wherein the original text has a plurality of paragraphs; dividing the original text into a plurality of terms; searching a plurality of first related terms from the first external database according to each of the terms, and calculating a first related degree between each of the first related terms and each of the terms of the original text, wherein the first related terms corresponding to a maximum first related degree is a substance of the original text; searching a plurality of second related terms from the first external database according to the substance, and calculating a second related degree between each of the second related terms and the terms of each of the paragraphs, wherein in each paragraph the second related term corresponding to a maximum second related degree is a paragraph substance of the paragraph; choosing the terms from the first related terms and the second related terms except the substance and the paragraph substances as a plurality of paragraph related terms; receiving at least one user's text; and comparing whether the user's text has the substance, the paragraph substances and the paragraph related terms, and providing a comparing result.
According to another aspect of the present disclosure, a text abstract editing system includes a text providing module, a text dividing module, a first searching module, a first calculating module, a second searching module, a second calculating module, a sentence choosing module, and an abstract editing module. The text providing module is for providing an original text, wherein the original text has a plurality of paragraphs. The text dividing module is for dividing the original text into a plurality of terms. The first searching module is for searching a plurality of first related terms from the first external database according to each of the terms. The first calculating module is for calculating a first related degree between each of the first related terms and each of the terms of the original text, wherein the first related terms corresponding to a maximum first related degree is a substance of the original text. The second searching module is for searching a plurality of second related terms from the first external database according to the substance. The second calculating module is for calculating a second related degree between each of the second related terms and the terms of each of the paragraphs, wherein in each paragraph the second related term corresponding to a maximum second related degree is a paragraph substance of the paragraph. The sentence choosing module is for calculating a sentence related degree between each of a plurality of sentences of each paragraph and the paragraph substance thereof, wherein the sentence corresponding to a maximum sentence related degree is a main sentence of each paragraph. The abstract editing module is for composing the main sentence of each paragraph into an abstract, wherein a sequence of each main sentence in the abstract corresponds to a sequence of the paragraphs in the original text, and a title of the abstract is the substance.
The invention can be more fully understood by reading the following detailed description of the embodiment, with reference made to the accompanying drawings as follows:
In
In detail, the text providing module 110 is for providing an original text. The original text can be a text in English, which can have a plurality of paragraphs. The user can read the original text.
The text dividing module 120 is for dividing the original text into a plurality of terms. To divide the original text into terms accurately, the text dividing module 120 can further include a term identification module 121, a tokenization module 122 and a stemming module 123. The term identification module 121, such as part-of-speech noun identification, is for identifying a language of the original text. The tokenization module 122 is for dividing character streams of the original text into a plurality of pre-terms and classifying the pre-terms. The stemming module 123 is for accurately dividing the pre-terms into the terms of the original text. For example, the term identification module 121 can be connected to an external code source 300, such as LingPipe, FreeLing, OpenNLP, etc., so that the term identification module 121 can accurately identify the language of the original text. The tokenization module 122 and the stemming module 123 can be connected to a second external database 400, wherein the second external database 400 can be a lexical database, such as WordNet, which includes a large number of definitions of words and phrases, superordinate or subordinate relations (also called hyperonymy or hyponymy relations), related words, etc. Therefore, the tokenization module 122 can divide character streams of the original text into a plurality of pre-terms and classify the pre-terms according to the second external database 400, and the stemming module 123 can accurately divide the pre-terms into the terms of the original text.
The following statement is part of a paragraph of an original text, which is accurately classified and divided into a plurality of terms via the term identification module 121, the tokenization module 122 and the stemming module 123 of the text dividing module 120.
Furthermore, the text dividing module 120 can also recognize that different terms in the original text refer to the same entity. For example, a paragraph of one original text states:
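The tokenization and stemming stages described above can be illustrated with a minimal sketch. It does not use LingPipe, FreeLing, OpenNLP or WordNet; the suffix list and the function names `tokenize`, `stem` and `divide_text` are assumptions made only for illustration, and a practical system would rely on a lexical database instead of naive suffix stripping.

```python
import re

# ASSUMPTION: a tiny, hypothetical suffix list standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Split a character stream into lowercase word tokens (the pre-terms)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def divide_text(text):
    """Divide a text into terms: tokenize first, then stem each pre-term."""
    return [stem(token) for token in tokenize(text)]


print(divide_text("Students reading books"))
```

Naive suffix stripping over-stems some words (for example, it turns "articles" into "articl"), which is one reason a dictionary-backed stage is preferable in practice.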
The searching module 130 connected to the first external database 200 includes a first searching module 131, wherein the first searching module 131 is for searching a plurality of first related terms from the first external database 200 according to each of the terms. In detail, the first external database 200 can be chosen from a variety of databases on demand, such as Wikipedia, and the first related terms related to each of the terms can be searched and obtained from the first external database 200. Furthermore, the first searching module 131 can be connected to the first external database 200 via a connecting platform, such as Yahoo! Query Language (YQL), etc., which can filter the terms searched by the first searching module 131, so that the quantity of the first related terms is not excessive and the first related terms are more precise.
The choosing module 140 includes a first calculating module 141 for calculating a first related degree between each of the first related terms and each of the terms of the original text, wherein the first related terms corresponding to a maximum first related degree is a substance of the original text. In detail, the calculating conditions (1) and (2) for obtaining the first related degrees are stated as follows:
tfi is the number of times a first related term appears in the i-th paragraph of the original text; and
is the number of times the aforementioned first related term appears in the paragraphs other than the i-th paragraph.
Each of the first related terms can be a word or a phrase; hence, the foregoing calculating condition (1) can provide the importance of a word to the original text, and the foregoing calculating condition (2) can provide the importance of a phrase to the original text. In general, a phrase is more meaningful than a word, so the weight of the phrase in the foregoing calculating condition (2) is larger than the weight of the word in the foregoing calculating condition (1).
Moreover, the first calculating module 141 can calculate a first related degree between each of the first related terms and each of the terms of the original text via the calculating condition (3) as follows.
Ti represents whether the first related term is in the title of the original text. When the first related term is in the title of the original text, Ti is 1; when the first related term is not in the title of the original text, Ti is 0;
TFi represents the number of times the first related term appears in the original text; and
ParaNum represents a number of the paragraphs in the original text.
The foregoing calculating condition (3) provides the first related degree between each of the first related terms and each of the terms of the original text, wherein the first related terms corresponding to a maximum first related degree is a substance of the original text.
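The selection of the substance can be sketched as follows. The sketch only computes the three quantities defined above (Ti, TFi and ParaNum) and then picks the candidate with the maximum degree. The function `placeholder_degree` is a hypothetical stand-in and is not calculating condition (3); all function names are assumptions, and matching is by plain substring, so a short term such as "art" would also be counted inside "artists".

```python
def term_features(term, title, paragraphs):
    """Return (Ti, TFi, ParaNum) for one candidate first related term."""
    needle = term.lower()
    ti = 1 if needle in title.lower() else 0            # Ti: is the term in the title?
    tfi = sum(p.lower().count(needle) for p in paragraphs)  # TFi: occurrences in the text
    return ti, tfi, len(paragraphs)                      # ParaNum: number of paragraphs


def placeholder_degree(ti, tfi, para_num):
    """ASSUMPTION: illustrative scoring only, NOT calculating condition (3)."""
    return ti + tfi / para_num


def choose_substance(candidates, title, paragraphs, degree=placeholder_degree):
    """Pick the candidate whose related degree is the maximum."""
    return max(candidates, key=lambda c: degree(*term_features(c, title, paragraphs)))


title = "Calligraphy in Taiwan"
paragraphs = [
    "Calligraphy is an art.",
    "He studied calligraphy for years.",
    "Taiwan has many artists.",
]
print(choose_substance(["calligraphy", "art", "years"], title, paragraphs))
```

Because the degree function is passed in as a parameter, the actual calculating condition can replace the placeholder without changing the selection logic.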
The receiving module 150 is for receiving at least one user's text. In this embodiment of the present disclosure, the user can upload a reading review to the text abstract scoring system after reading the original text.
The comparing module 160 is for checking whether the user's text has the substance provided from the first calculating module 141 of the choosing module 140, and providing a comparing result. That is, the comparing module 160 can answer whether the user's text has the substance. From the teaching perspective, the user has caught the key point of the original text when the user's text has the substance. Otherwise, if the user's text does not have the substance, the user has misunderstood the meaning of the original text. Therefore, the teacher can adjust the way of teaching in order to improve the study, comprehension and writing abilities of the user (the student).
In detail, the second searching module 132 is for searching a plurality of second related terms from the first external database 200 according to the substance, wherein the searching processes of the second searching module 132 and the first searching module 131 in the embodiment of
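The check performed by the comparing module can be sketched as a whole-word, case-insensitive search. The function name `has_term` and the exact-match rule are assumptions; the disclosure does not specify how matching is done, and a fuller system would presumably compare stemmed terms instead of surface strings.

```python
import re


def has_term(user_text, term):
    """Return True if `term` occurs in `user_text` as a whole word or phrase."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, user_text, flags=re.IGNORECASE) is not None


review = "The article is about Chinese calligraphy and patience."
print(has_term(review, "calligraphy"))   # substance found in the user's text
print(has_term(review, "painting"))      # substance missing from the user's text
```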
The second calculating module 142 of the choosing module 140 is for calculating a second related degree between each of the second related terms and the terms of each of the paragraphs, wherein in each paragraph the second related term corresponding to a maximum second related degree is a paragraph substance of the paragraph.
The second calculating module 142 can obtain the paragraph substance of each paragraph via the calculating condition (4) as follows.
PFij represents the number of times the second related term j appears in the i-th paragraph;
TFj represents the number of times the second related term j appears in the original text;
OPFij represents the number of times the second related term j appears in the original text outside the aforementioned i-th paragraph;
Pj represents the number of different paragraphs in which the second related term j appears; for example, when the second related term j appears 2 times in the 1st paragraph, 1 time in the 2nd paragraph and 0 times in the 3rd paragraph, then Pj=2 (the second related term j appears in the 1st and 2nd paragraphs);
DCj represents whether the second related term j is one of the related terms of the original text; when the answer is yes, DCj=1; when the answer is no, DCj=0; and
PCij represents whether the second related term is one of the related terms of the paragraphs; when the answer is yes, PCij=1; when the answer is no, PCij=0.
The aforementioned related terms of the original text and related terms of the paragraphs are searched from the words and sentences of the original text, and are accurately divided into related terms by the text dividing module 120.
From the aforementioned calculating condition (4), the second related term corresponding to the maximum of termij is the paragraph substance of the paragraph.
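The counting quantities defined for calculating condition (4) can be sketched as below, using the Pj example given above (2, 1 and 0 appearances in three paragraphs). The sketch computes only PFij, TFj, OPFij and Pj; it does not reproduce condition (4) itself. The function name and the representation of each paragraph as a list of terms are assumptions.

```python
def paragraph_counts(term, paragraphs, i):
    """Return (PFij, TFj, OPFij, Pj) for a term and paragraph index i (0-based).

    `paragraphs` is a list of paragraphs, each given as a list of terms.
    """
    per_paragraph = [p.count(term) for p in paragraphs]
    pf = per_paragraph[i]                          # appearances in the i-th paragraph
    tf = sum(per_paragraph)                        # appearances in the whole text
    opf = tf - pf                                  # appearances outside the i-th paragraph
    pj = sum(1 for c in per_paragraph if c > 0)    # number of paragraphs containing the term
    return pf, tf, opf, pj


paragraphs = [["ink", "brush", "ink"], ["ink", "paper"], ["paper"]]
print(paragraph_counts("ink", paragraphs, 0))  # → (2, 3, 1, 2)
```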
After the paragraph substance of each paragraph is obtained, the comparing module 160 can check whether each of the user's paragraphs of the user's text has the paragraph substance, and provide a paragraph comparing result.
Furthermore, the third searching module 133 of the searching module 130 is for receiving the first related terms from the first searching module 131 and the second related terms from the second searching module 132.
The third calculating module 143 of the choosing module 140 is for choosing the terms from the first related terms and the second related terms except the substance and the paragraph substances as a plurality of paragraph related terms, which can be regarded as supporting ideas.
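The choice made by the third calculating module amounts to a set difference, which can be sketched as follows. The function name is an assumption, and preserving the first-seen order of the terms is a design choice of this sketch, not something the disclosure requires.

```python
def paragraph_related_terms(first_related, second_related, substance, paragraph_substances):
    """Keep every related term that is neither the substance nor a paragraph substance."""
    excluded = {substance, *paragraph_substances}
    seen = set()
    result = []
    for term in [*first_related, *second_related]:
        if term not in excluded and term not in seen:
            seen.add(term)          # drop duplicates shared by both term lists
            result.append(term)
    return result


print(paragraph_related_terms(
    ["calligraphy", "ink", "brush"],
    ["ink", "seal", "practice"],
    substance="calligraphy",
    paragraph_substances=["practice"],
))
```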
The comparing module 160 can further check whether each of the user's paragraphs has the paragraph related terms, and provide a paragraph related terms comparing result.
In other words, the first calculating module 141 chooses the term corresponding to a maximum first related degree as the substance based on the original text. The second calculating module 142 chooses the second related term corresponding to a maximum second related degree as the paragraph substance based on each paragraph of the original text. The third calculating module 143 provides the paragraph related terms according to the results of the first searching module 131, the first calculating module 141, the second searching module 132 and the second calculating module 142.
The score calculating module 170 of the text abstract scoring system 100 can receive the comparing result, the paragraph comparing result and the paragraph related terms comparing result, and calculate a user's text score from them. In general, the substance and the paragraph substances account for a greater percentage of the score than the paragraph related terms. Therefore, the teacher can confirm from the score whether the user understands the content of the original text.
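One way the three comparing results could be combined into a score is sketched below. The weights 0.4, 0.4 and 0.2 are hypothetical and chosen only so that the substance and the paragraph substances count for more than the paragraph related terms, as stated above; the function name and the 0 to 100 scale are likewise assumptions.

```python
def text_score(has_substance, paragraph_hits, related_hits, weights=(0.4, 0.4, 0.2)):
    """Combine three comparing results into a score between 0 and 100.

    paragraph_hits / related_hits are lists of booleans, one per paragraph
    substance or paragraph related term. ASSUMPTION: the weights are illustrative.
    """
    def fraction(hits):
        return sum(hits) / len(hits) if hits else 0.0

    w_sub, w_para, w_rel = weights
    score = (
        w_sub * (1.0 if has_substance else 0.0)
        + w_para * fraction(paragraph_hits)
        + w_rel * fraction(related_hits)
    )
    return round(100 * score, 1)


# Substance found, 1 of 2 paragraph substances, 2 of 4 paragraph related terms.
print(text_score(True, [True, False], [True, True, False, False]))  # → 70.0
```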
Moreover, the mind map module 180 of the text abstract scoring system 100 is for receiving the substance, the paragraph substance and the paragraph related terms, and providing a term mind map.
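A term mind map can be represented as a small tree. The layout below (the substance at the root, paragraph substances as branches, and paragraph related terms as leaves) is an assumption, as are the function names; the disclosure states only which terms the mind map module receives.

```python
def build_mind_map(substance, paragraph_substances, related_by_paragraph):
    """Build a nested dict: substance -> paragraph substances -> related terms."""
    return {
        "root": substance,
        "children": [
            {"node": ps, "children": list(related)}
            for ps, related in zip(paragraph_substances, related_by_paragraph)
        ],
    }


def render(mind_map):
    """Render the mind map as an indented text outline."""
    lines = [mind_map["root"]]
    for child in mind_map["children"]:
        lines.append("  " + child["node"])
        lines.extend("    " + leaf for leaf in child["children"])
    return "\n".join(lines)


mind_map = build_mind_map(
    "calligraphy",
    ["practice", "exhibition"],
    [["ink", "brush"], ["gallery"]],
)
print(render(mind_map))
```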
Furthermore, the text abstract scoring system 100 can include a lexical chain module 190 connected to the choosing module 140. The lexical chain module 190 is for receiving the paragraph related terms from the third calculating module 143 of the choosing module 140. The lexical chain module 190 is connected to a third external database 410, such as WordNet, which can be used to compare the related degree between each paragraph related term and each term of the paragraph of the original text for classifying each of the paragraph related terms. In detail, the lexical chain module 190 can classify the paragraph related terms according to four types (called lexical chain types), and the importance criterion of each type is presented as the number of “★” signs, as shown in the following table.
The lexical chain module 190 can search whether the paragraph related terms of the four types exist in the paragraph or not. If a paragraph related term of the four types appears many times in the paragraph, the paragraph related term is considered more important. In the lexical chain module 190, the importance can be quantified as a chain score by the following calculating condition (5):
chain_score(ti)=ns(ti)*1+[nh(ti)+nr(ti)]*0.7+nm(ti)*0.4 (5)
wherein,
ns represents the number of times that the synonyms or reiterations of one paragraph related term appear in the paragraph;
nh represents the number of times that the hypernyms/hyponyms of one paragraph related term appear in the paragraph;
nr represents the number of times that the related synsets (related synonyms) of one paragraph related term appear in the paragraph;
nm represents the number of times that the meronyms of one paragraph related term appear in the paragraph.
If the chain score of one paragraph related term is higher, that is, many synonyms, hypernyms/hyponyms, related synonyms and meronyms of the paragraph related term appear in a paragraph, the paragraph related term makes a major contribution to the meaning of that paragraph.
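Calculating condition (5) can be evaluated directly, as in the following sketch. The weights 1, 0.7 and 0.4 come from condition (5); the function name, the sample counts and the ranking step are assumptions added for illustration.

```python
def chain_score(ns, nh, nr, nm):
    """Calculating condition (5): ns*1 + (nh + nr)*0.7 + nm*0.4."""
    return ns * 1 + (nh + nr) * 0.7 + nm * 0.4


# Hypothetical counts (ns, nh, nr, nm) for two paragraph related terms.
counts = {"ink": (2, 1, 1, 1), "paper": (1, 0, 0, 2)}

# Rank the terms by chain score, highest first.
ranked = sorted(counts, key=lambda term: chain_score(*counts[term]), reverse=True)
for term in ranked:
    print(term, round(chain_score(*counts[term]), 2))
```

With these counts, "ink" scores 2 + 1.4 + 0.4 = 3.8 and "paper" scores 1 + 0 + 0.8 = 1.8, so "ink" contributes more to the meaning of the paragraph.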
Step 500, an original text is provided, wherein the original text has a plurality of paragraphs.
Step 510, the original text is divided into a plurality of terms.
Step 520, a plurality of first related terms is searched from the first external database according to each of the terms, and a first related degree between each of the first related terms and each of the terms of the original text is calculated, wherein the first related terms corresponding to a maximum first related degree is a substance of the original text. That is, the substance is provided.
Step 530, a plurality of second related terms is searched from the first external database according to the substance, and a second related degree between each of the second related terms and the terms of each of the paragraphs is calculated, wherein in each paragraph the second related term corresponding to a maximum second related degree is a paragraph substance of the paragraph. That is, the paragraph substances are provided.
Step 540, the terms from the first related terms and the second related terms except the substance and the paragraph substances are chosen as a plurality of paragraph related terms. That is, the paragraph related terms are provided.
Step 550, at least one user's text is received.
Step 560, the user's text is compared to the substance, the paragraph substances and the paragraph related terms for checking whether the user's text has the substance, the paragraph substances and the paragraph related terms, and a comparing result is provided.
Furthermore, the text abstract scoring method can further include Step 570, in which the substance, the paragraph substances and the paragraph related terms can be received, and a term mind map can be provided (as shown in
Therefore, the comparing result can show the level of comprehension and writing of the user. Further, the term mind map can present the key points of the original text in a simplified and accurate form, so that the user can quickly and exactly understand the point of the original text from the term mind map.
The text providing module 610 is for providing an original text, wherein the original text has a plurality of paragraphs, and each paragraph has a plurality of sentences. The original text can be a text document built into the text abstract editing system 600 or a text captured from an external system or the Internet connected with the text abstract editing system 600.
The text dividing module 620 is connected to the text providing module 610, and is for dividing the original text into a plurality of terms. The text dividing module 620 can include a term identification module 621, a tokenization module 622 and a stemming module 623, which are the same as the term identification module 121, the tokenization module 122 and the stemming module 123 in
In
The second searching module 650 is for searching a plurality of second related terms from the first external database 690 according to the substance, wherein the second searching module 650 is connected to the first calculating module 640. Then, the second calculating module 660 connected to the second searching module 650 is for calculating a second related degree between each of the second related terms and the terms of each of the paragraphs, wherein in each paragraph the second related term corresponding to a maximum second related degree is a paragraph substance of the paragraph. The details of the first searching module 630, the first calculating module 640, the second searching module 650 and the second calculating module 660 are the same as in the embodiments of
In the embodiment of
Furthermore, the abstract editing module 680 connected to the sentence choosing module 670 is for composing the main sentence of each paragraph into an abstract, wherein a sequence of each main sentence in the abstract corresponds to a sequence of the paragraphs in the original text, and a title of the abstract is the substance. The text abstract editing system 600 can thus provide the abstract of the original text according to the main sentences of the paragraphs and the sequence of the paragraphs in the original text, and the user can read the abstract to understand the main idea of the original text.
Moreover, in one paragraph or sentence, one term may be associated with many verbs, called verb arguments, and errors can easily occur when dividing the original text. Hence, the PropBank notations (Kingsbury & Palmer, 2003) can be applied to the text abstract editing system of the present disclosure for enhancing the accuracy thereof.
In the application of the PropBank notations, the terms in the composed abstract in which verb arguments may occur can be labelled. For example, a sentence of the fifth paragraph of the article “Finding His Calling in His 70s: Calligrapher Wang Zhongtian” is stated as follows:
The verb arguments in the paragraph can be listed as follows, where “V” marks the term with a verb argument:
Then, the paragraph can be checked for whether the term exists in the verb argument structures. The term with a verb argument in the paragraph can also be quantified by a calculating condition presenting a conceptual term frequency in paragraph (ctfp) as follows:
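The composition of the abstract can be sketched as follows. The sentence related degree used here, a simple count of how often the paragraph substance occurs in the sentence, is a hypothetical stand-in for the calculation performed by the sentence choosing module 670. The function names and the punctuation-based sentence splitting are also assumptions.

```python
import re


def split_sentences(paragraph):
    """Split a paragraph into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def sentence_degree(sentence, paragraph_substance):
    """ASSUMPTION: occurrence count stands in for the sentence related degree."""
    return sentence.lower().count(paragraph_substance.lower())


def compose_abstract(substance, paragraphs, paragraph_substances):
    """Title is the substance; body is one main sentence per paragraph, in order."""
    main_sentences = []
    for paragraph, ps in zip(paragraphs, paragraph_substances):
        sentences = split_sentences(paragraph)
        if sentences:
            main_sentences.append(max(sentences, key=lambda s: sentence_degree(s, ps)))
    return substance + "\n" + " ".join(main_sentences)


paragraphs = [
    "Wang began late. Calligraphy became his calling.",
    "He practices daily. Daily practice with ink shapes his practice.",
]
print(compose_abstract("Calligraphy", paragraphs, ["calligraphy", "practice"]))
```

Iterating over the paragraphs in their original order keeps the sequence of the main sentences aligned with the sequence of the paragraphs, as the abstract editing module requires.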
ctf represents the number of times that the term appears in the verb argument structures;
pn represents the number of sentences that contain the concept c (can be considered as the substance, paragraph substance or paragraph related terms) in a paragraph p.
If a concept appears frequently in the verb argument structures of the sentences in a paragraph, the concept makes a major contribution to the meaning of that paragraph. The ctfp value of the example is listed as follows:
In the text abstract scoring system, the text abstract scoring method and the text abstract editing system of the present disclosure, many terms and phrases are searched and provided. Noun phrases in particular carry more information than simple nouns in a text. For identifying noun phrases, named entity recognition (NER) can be applied to the text abstract scoring system, the text abstract scoring method and the text abstract editing system of the present disclosure. NER is an information extraction task in Natural Language Processing (NLP). It aims to identify and classify mentions of people, organizations, locations, time, money and other named entities within text (Nothman, Ringland, Radford, Murphy, & Curran, 2013). Therefore, the phrases in the original text can be identified based on NER.
First, n-grams are used to extract phrase combinations. Second, an NER tagger can be used to search for the entities in the paragraph. Entities such as people, organizations and locations are considered very important in the text, so the importance of entities can be presented as a score and calculated by the following condition:
Score(pj)=Σt
wherein,
pj represents the j-th phrase in the set of the phrases extracted by the regular expressions method and the phrases extracted by the NER method;
fj represents the j-th phrase which comes from the subset by the regular expressions method;
ej represents the j-th phrase which comes from the subset by the NER method; and
w(ti) represents the term weight which comes from the Automatic Semantic labeling and Lexical Chains.
Therefore, a phrase with a higher score can be considered an exemplar phrase in each paragraph.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims.
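The n-gram step used to extract candidate phrase combinations can be sketched as below. Only the n-gram extraction is shown; the NER tagging and the phrase scoring condition are not reproduced. The function name and the sample tokens are assumptions.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token sequences of `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["calligrapher", "wang", "zhongtian", "found", "his", "calling"]
for phrase in ngrams(tokens, 2):
    print(" ".join(phrase))
```

When the token list is shorter than n, the function returns an empty list, so short paragraphs need no special handling.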
Number | Date | Country | Kind |
---|---|---|---|
102123486 | Jul 2013 | TW | national |