CHARACTER STRING PROCESSING METHOD, APPARATUS, AND PROGRAM

Information

  • Patent Application
  • Publication Number
    20070157123
  • Date Filed
    December 08, 2006
  • Date Published
    July 05, 2007
Abstract
Disclosed as a first aspect is a method including the steps of: analyzing a character string in a document into partial character strings; calculating, for each partial character string, a score that incorporates the appearance frequency of that partial character string; presenting the partial character strings and their scores to a user; determining which of the partial character strings the user has selected; storing the selected partial character strings as a safe partial character string list; and replacing, with predetermined replacement character strings, every partial character string that is not in the safe partial character string list.
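
The steps of the abstract can be sketched as follows. This is only an illustration under stated assumptions, not the applicant's implementation: whitespace splitting stands in for the analysis step, the score is the raw appearance frequency, the user's selection is simulated by a frequency cutoff, and the names `analyze`, `score`, `mask`, and the replacement string `"***"` are hypothetical.

```python
from collections import Counter

# Assumed placeholder; the source only says "predetermined replacement character strings".
REPLACEMENT = "***"

def analyze(text):
    """Split a character string into partial character strings (assumption: whitespace tokens)."""
    return text.split()

def score(parts):
    """Score each partial character string by its appearance frequency."""
    return Counter(parts)

def mask(text, safe_list):
    """Replace every partial character string that is not in the safe list."""
    return " ".join(p if p in safe_list else REPLACEMENT for p in analyze(text))

doc = "error in module alpha error in module beta"
scores = score(analyze(doc))

# Present candidates in descending score order. The user's choice is simulated here:
# strings that appear at least twice are treated as selected, i.e. safe.
safe = {p for p, s in scores.most_common() if s >= 2}

masked = mask(doc, safe)
print(masked)  # error in module *** error in module ***
```

Frequent strings such as "error" stay readable while the rare strings "alpha" and "beta" are replaced. In the method itself the selection comes from the user, not from a fixed cutoff.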
Description

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings.



FIG. 1 is a diagram showing a configuration of a system of an embodiment.



FIG. 2 is a diagram schematically showing a hardware configuration of a computer realizing the embodiment.



FIG. 3 is a diagram showing a more detailed configuration of a score calculation section 130.



FIG. 4 is a diagram showing a more detailed configuration of a partial character string presentation section 140.



FIG. 5 is a flowchart of a safe character string list generating section.



FIG. 6 is a view showing a user interface of a partial character string check main screen.



FIG. 7 is a view showing a user interface of a detailed-information display screen.


Claims
  • 1. A method of processing a character string in a document, the method comprising the steps of: analyzing a character string in a document into partial character strings; calculating, with respect to each of the partial character strings, a score incorporating appearance frequency of the partial character string, whereby a set of scores is formed; presenting the partial character strings and the set of scores to a user; determining which ones of the partial character strings have been selected by the user to form selected partial character strings; storing the selected partial character strings as a safe partial character string list; and replacing the partial character strings with predetermined replacement character strings, wherein the partial character strings existing in the safe partial character string list are excluded from being replaced.
  • 2. The method according to claim 1, wherein each of the partial character strings is a morpheme.
  • 3. The method according to claim 1, wherein the presenting step comprises presenting the partial character strings and the set of scores to the user in accordance with a descending order of the set of scores.
  • 4. The method according to claim 1, wherein the calculating step comprises calculating the score, with respect to each of the partial character strings, by incorporating, into a calculation, the appearance frequency and a character string length of each of the partial character strings.
  • 5. The method according to claim 1, wherein the calculating step comprises calculating, with respect to each of the partial character strings, the score by incorporating, into calculation, the appearance frequency, a character string length, and any one of a word class in numerical form and a category name in numerical form, all of which are of the character strings, the category name being a group to which the character strings belong.
  • 6. The method according to claim 1, further comprising: calculating, with respect to each of the partial character strings, a risk to form a set of risks, wherein the presenting step comprises presenting the partial character strings, the set of scores, and the set of risks to the user.
  • 7. The method according to claim 6, wherein the set of risks are calculated into higher values with respect to partial character strings included in a risky character string list in which risky character strings are previously stored.
  • 8. The method according to claim 6, wherein the presenting step further comprises presenting a group of partial character strings, wherein each partial character string in the group has a risk with a value lower than a predetermined value, as the selected partial character strings.
  • 9. The method according to claim 1, wherein the presenting step further comprises presenting the replacement character strings of the respective partial character strings.
  • 10. The method according to claim 9, wherein the presenting step further comprises presenting broader terms of the partial character strings as the replacement character strings by using a category dictionary in which the broader terms of the partial character strings are stored.
  • 11. The method according to claim 10, wherein the determining step further comprises accepting editing of the replacement character strings.
  • 12. A character string processing apparatus comprising: means which analyzes a character string in a document into partial character strings; means which calculates, with respect to each of the partial character strings, a score incorporating appearance frequency of the partial character string, whereby a set of scores can be formed; means which presents the partial character strings and the set of scores to a user; means which determines which ones of the partial character strings have been selected by the user to form selected partial character strings; means which stores the selected partial character strings as a safe partial character string list; and means which replaces the partial character strings with predetermined replacement character strings wherein the partial character strings existing in the safe partial character string list are excluded from being replaced.
  • 13. A computer program in a storage medium for processing a character string in a document, wherein the computer program causes a computer to perform the steps of: analyzing a character string in a document into partial character strings; calculating, with respect to each of the partial character strings, a score incorporating appearance frequency of the partial character string whereby a set of scores is formed; presenting the partial character strings and the set of scores to a user; determining which ones of the partial character strings have been selected by the user to form selected partial character strings; storing the selected partial character strings as a safe partial character string list; and replacing the partial character strings with predetermined replacement character strings, wherein the partial character strings existing in the safe partial character string list are excluded from being replaced.
  • 14. A method of processing a character string in a document, the method comprising the steps of: receiving a document; analyzing a character string in a document into partial character strings; calculating, with respect to each of the partial character strings, a score incorporating appearance frequency of the partial character string, whereby a set of scores is formed; presenting the partial character strings and the set of scores to a user; determining which ones of the partial character strings have been selected by the user to form selected partial character strings; storing the selected partial character strings as a safe partial character string list; replacing the partial character strings with predetermined replacement character strings, wherein the partial character strings existing in the safe partial character string list are excluded from being replaced; and transmitting the document.
  • 15. The method according to claim 14, wherein each of the partial character strings is a morpheme.
  • 16. The method according to claim 14, wherein the calculating step comprises calculating the score, with respect to each of the partial character strings, by incorporating, into a calculation, the appearance frequency and a character string length of each of the partial character strings.
  • 17. The method according to claim 14, wherein the calculating step comprises calculating, with respect to each of the partial character strings, the score by incorporating, into calculation, the appearance frequency, a character string length, and any one of a word class in numerical form and a category name in numerical form, all of which are of the character strings, the category name being a group to which the character strings belong.
  • 18. The method according to claim 14, further comprising: calculating, with respect to each of the partial character strings, a risk to form a set of risks, wherein the presenting step comprises presenting the partial character strings, the set of scores, and the set of risks to the user.
  • 19. The method according to claim 18, wherein the set of risks are calculated into higher values with respect to partial character strings included in a risky character string list in which risky character strings are previously stored.
  • 20. The method according to claim 18, wherein the presenting step further comprises presenting a group of partial character strings, wherein each partial character string in the group has a risk with a value lower than a predetermined value, as the selected partial character strings.
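
The dependent claims on scoring, risk, and replacement (claims 4, 7, 8, and 10) can be illustrated with a short sketch. The claims name the inputs but give no formulas, so everything numeric here is an assumption: the score is taken as frequency multiplied by string length, the risk is 1.0 for strings in the risky list and 0.0 otherwise, and the preselection threshold is 0.5. The function names and the sample category dictionary are hypothetical.

```python
from collections import Counter

def scores_with_length(parts):
    """Assumed formula: appearance frequency times string length (claim 4 names only the inputs)."""
    freq = Counter(parts)
    return {p: n * len(p) for p, n in freq.items()}

def risks(parts, risky_list):
    """Assumed risk values: higher (1.0) for strings in the risky list, 0.0 otherwise (claim 7)."""
    return {p: (1.0 if p in risky_list else 0.0) for p in set(parts)}

def preselect(risk_by_part, threshold=0.5):
    """Strings whose risk is below the threshold are presented as already selected (claim 8)."""
    return {p for p, r in risk_by_part.items() if r < threshold}

def replace(parts, safe_list, category_dict, default="***"):
    """Replace unsafe strings with a broader term when the category dictionary has one (claim 10)."""
    return [p if p in safe_list else category_dict.get(p, default) for p in parts]

parts = ["server", "tokyo01", "failed", "server", "acme"]

# Descending score order, as in claim 3.
ranked = sorted(scores_with_length(parts).items(), key=lambda kv: kv[1], reverse=True)

safe = preselect(risks(parts, risky_list={"tokyo01", "acme"}))
out = replace(parts, safe, {"tokyo01": "<hostname>", "acme": "<company>"})
print(out)  # ['server', '<hostname>', 'failed', 'server', '<company>']
```

In this sketch the preselected set becomes the safe list directly. Under claim 11 the user could still edit the proposed replacement strings before the replacement step runs.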
Priority Claims (1)
Number       Date      Country  Kind
2005-370970  Dec 2005  JP       national