This U.S. patent application claims priority under 35 U.S.C. 119(a) through (d) from Eurasian patent application EAPO 201001550, filed on 25 Oct. 2010.
The invention relates to information technology, specifically to methods of text conversion, search, automated translation, and automated vocalization of text. The present invention can find useful application in the development and maintenance of computer systems of various kinds used in different industries where information derived from a variety of sources must be searched and analyzed, e.g. in medicine, science, and education.
Nowadays, a multitude of search engines are available that can execute searches according to comparatively complicated requests entered in a natural language. However, a major problem remains unsolved: how to effectively process and analyze the search results and subsequently utilize them. In particular, many references found on the Internet may essentially coincide, so the search results need additional processing to identify their meaning, to translate them into other languages, and to perform other analytical operations, including vocalization of the results.
The primary object of the present invention is to create methods for text conversion, search, automated translation, and vocalization of text that provide universal and uniform compact storage of the text, searching for complex word constructions, translation of the text into other languages, and high-quality vocalization of the text.
The related art includes U.S. Pat. No. 7,260,573 ‘Personalizing anchor text scores in a search engine’ and U.S. Pat. No. 6,636,848 ‘Information search using knowledge agents’, which address this problem.
In addition, U.S. Pat. No. 7,010,526 teaches a ‘Knowledge-based data mining system’, wherein ‘data is gathered into a data store using, e.g., a Web crawler. The data is classified into entities. Data miners use rules to process the entities and append respective keys to the entities representing characteristics of the entities as derived from expert rules embodied in the miners. With these keys, characteristics of entities as defined by disparate expert authors of the data miners are identified for use in responding to complex data requests from customers.’ In this context, ‘Web crawling’ is the process of building a list of the words found on a Web page.
The results of processing all Web pages available for Web crawling are transformed according to predetermined algorithmic expert rules and placed into the knowledge base. Subsequent user requests, however, are processed only within this particular knowledge base rather than within the entire information space of the Internet, which limits the system's usefulness. The most frequent application of the solution described in U.S. Pat. No. 7,010,526 is blocking access to pornographic content, which the expert rules automatically exclude from the knowledge base.
U.S. Pat. No. 6,128,624 ‘Collection and integration of internet and electronic commerce data in a database during web browsing’ discloses a system that collects information from two sources: an Internet provider and an e-commerce provider. In particular, the first source includes Web log data containing information on the websites previously visited by the user. This information is used to tailor a Web business to individual user needs (direct marketing) and in the development of Web-oriented applications.
The aforementioned related-art methods do not fully solve the problem formulated above and do not provide universal and uniform compact storage of the text, searching for complex word constructions, translation of the text into other languages, or high-quality vocalization of the text.
The inventive methods eliminate the drawbacks of the aforementioned related-art methods and attain the above-stated object. Accordingly, in a preferred embodiment, a first inventive method for converting an initial text comprises the steps of: (a) dividing the initial text into a plurality of words; (b) converting each word of at least a portion of the plurality of words into a corresponding digital representation of fixed length; (c) composing a vocabulary that contains each word occurring at least once in the initial text and/or the digital representation of that word; and (d) storing the digital representations and the vocabulary together with the initial text or instead of it.
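Steps (a) through (d) of the first method can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the choice of BLAKE2b as the hash function, the 3-byte code length, and the regular-expression word splitter are all assumptions made here for concreteness.

```python
import hashlib
import re

HASH_LEN = 3  # assumed length of the digital representation, in bytes

def digital_representation(word: str) -> bytes:
    """Map a word to a fixed-length code (hypothetical choice: BLAKE2b, 3 bytes)."""
    return hashlib.blake2b(word.encode("utf-8"), digest_size=HASH_LEN).digest()

def convert_text(text: str):
    """Return (sequence of codes, vocabulary) for the initial text."""
    words = re.findall(r"\w+", text.lower())              # (a) divide into words
    codes = [digital_representation(w) for w in words]    # (b) fixed-length codes
    vocabulary = {}                                        # (c) code -> word, each word once
    for word, code in zip(words, codes):
        vocabulary.setdefault(code, word)
    return codes, vocabulary                               # (d) store with or instead of the text

codes, vocabulary = convert_text("The cat saw the other cat.")
```

Because a hash cannot be inverted, the vocabulary is what allows the words to be recovered from the code sequence. With only 3 bytes per code, distinct words can occasionally collide; a production system would need to detect and resolve such collisions, which this sketch does not do.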
It should be noted that converting just a portion of the text's words into digital representations is justified only for standardized texts, such as letters, receipts, and contracts.
A second object of the present invention is to propose a second inventive method for searching text converted according to the first text conversion method described above. In a preferred embodiment, the second inventive method comprises the steps of: (a) composing a predetermined search request consisting of a number of words; (b) converting at least a portion of the words of the search request into their digital representations; (c) determining whether the words of the search request are present in the vocabulary; and (d) if they are present, (i) searching for the digital representations of the request's words among the digital representations of the words of the initial text, and/or (ii) searching for the request's words among the words of the initial text.
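A minimal sketch of the search method, assuming the same hypothetical 3-byte BLAKE2b codes as in the conversion sketch. The vocabulary check of step (c) rejects a request early when any of its words never occurs in the text; otherwise step (d)(i) scans the code sequence for the request's codes as a contiguous phrase. Treating the request as an exact phrase is an assumption; the method itself does not fix the matching rule.

```python
import hashlib
import re

def code(word: str) -> bytes:
    # Hypothetical 3-byte digital representation (BLAKE2b)
    return hashlib.blake2b(word.lower().encode("utf-8"), digest_size=3).digest()

def convert(text: str):
    words = re.findall(r"\w+", text.lower())
    codes = [code(w) for w in words]
    vocabulary = {c: w for c, w in zip(codes, words)}
    return codes, vocabulary

def search(request: str, codes, vocabulary):
    """Return start positions where the request's words occur as a phrase."""
    req = [code(w) for w in re.findall(r"\w+", request.lower())]   # (b) convert request
    if not req or any(c not in vocabulary for c in req):           # (c) vocabulary check
        return []                                                   # a word is absent: no match
    n = len(req)
    # (d)(i) compare fixed-length codes instead of variable-length strings
    return [i for i in range(len(codes) - n + 1) if codes[i:i + n] == req]

codes, vocabulary = convert("To be, or not to be")
positions = search("to be", codes, vocabulary)
```

Here `positions` holds the word offsets 0 and 4, where the phrase "to be" begins.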
A third object of the present invention is to propose a third inventive method for automated translation of text into a predetermined language, comprising the steps of: (a) converting the words of the text into their digital representations and forming the vocabulary, as described above; and (b) substituting the words in the vocabulary and/or the digital representations of the words of the text with digital representations of words having a similar meaning in the predetermined language, or directly with the equivalent words in the predetermined language.
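Step (b) of the translation method can be illustrated with a toy bilingual dictionary keyed by the digital representation of the source word. The dictionary `RU_EN` and its three entries are hypothetical, and the substitution is strictly word for word; word order and grammatical agreement are ignored here, which is where the stored word characteristics discussed later would come in.

```python
import hashlib
import re

def code(word: str) -> bytes:
    # Hypothetical 3-byte digital representation (BLAKE2b)
    return hashlib.blake2b(word.lower().encode("utf-8"), digest_size=3).digest()

# Hypothetical dictionary: code of a Russian word -> equivalent English word
RU_EN = {code(ru): en for ru, en in [("кошка", "cat"), ("видит", "sees"), ("собаку", "dog")]}

def translate(text: str, dictionary) -> str:
    words = re.findall(r"\w+", text.lower())
    codes = [code(w) for w in words]
    # Substitute each code with the target-language word; keep unknown words as-is
    return " ".join(dictionary.get(c, w) for c, w in zip(codes, words))

print(translate("Кошка видит собаку", RU_EN))
```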
A fourth object of the present invention is to propose a fourth inventive method for vocalization of text converted into the digital representation as described above, comprising the step of generating an audio signal corresponding to the digital representation of each word of the text. Because the digital representation identifies the whole word, the word is reproduced as a whole rather than syllable by syllable, which enhances the quality of vocalization.
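The vocalization step can be sketched as a lookup from each word's digital representation to a whole-word recording, followed by concatenation. Everything concrete here is an assumption for illustration: the 8 kHz sample rate, the `AUDIO` store, and the sine tones standing in for real prerecorded clips.

```python
import hashlib
import math
import re

SAMPLE_RATE = 8000  # assumed sample rate, Hz

def code(word: str) -> bytes:
    # Hypothetical 3-byte digital representation (BLAKE2b)
    return hashlib.blake2b(word.lower().encode("utf-8"), digest_size=3).digest()

def tone(freq: float, seconds: float = 0.2):
    """Placeholder for a prerecorded whole-word clip: 16-bit sine samples."""
    n = int(SAMPLE_RATE * seconds)
    return [int(12000 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)) for i in range(n)]

# Hypothetical audio store: digital representation -> whole-word recording
AUDIO = {code("hello"): tone(440), code("world"): tone(660)}
PAUSE = [0] * int(SAMPLE_RATE * 0.05)  # short silence between words

def vocalize(text: str):
    samples = []
    for word in re.findall(r"\w+", text.lower()):
        clip = AUDIO.get(code(word))
        if clip is None:
            # A real system might fall back to syllable-based synthesis here
            raise KeyError(f"no recording for {word!r}")
        samples.extend(clip + PAUSE)
    return samples

samples = vocalize("Hello world")
```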
The proposed methods solve the above-stated problem and offer a novel, universal architectural approach, since all the inventive methods employ the same type of text conversion.
When operating on two or more texts, it is preferable to convert them to a single character encoding before converting them into the digital representation. This standardizes and unifies the technical means used to implement the claimed methods.
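Why a common encoding matters can be shown with the same Russian phrase stored in two legacy encodings. The raw bytes differ, so hashing them directly would give different codes for the same words; decoding both to Unicode (normalized to NFC, an assumed choice of common form) first makes the digital representations agree. The 3-byte BLAKE2b code is again a hypothetical choice.

```python
import hashlib
import unicodedata

def code(word: str) -> bytes:
    # Hypothetical 3-byte digital representation of an already-decoded word
    return hashlib.blake2b(word.encode("utf-8"), digest_size=3).digest()

def to_common_encoding(raw: bytes, source_encoding: str) -> str:
    """Decode from the source encoding and normalize to Unicode NFC."""
    return unicodedata.normalize("NFC", raw.decode(source_encoding))

text_a = "привет мир".encode("cp1251")   # the text stored in Windows-1251
text_b = "привет мир".encode("koi8_r")   # the same text stored in KOI8-R
assert text_a != text_b                   # the raw bytes differ...

a = [code(w) for w in to_common_encoding(text_a, "cp1251").split()]
b = [code(w) for w in to_common_encoding(text_b, "koi8_r").split()]
assert a == b                             # ...but the digital representations agree
```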
For converting texts into the digital representation, it is reasonable to use a hash function whose hash value is shorter than the average length of the text's words, which makes storage of the digital representation compact.
Addenda 1, 2, and 3 below provide examples that use a hash function with a hash value length of 3. Since the average length of a word in Russian text is about 6 letters, this yields (taking the spaces between words into account as well) an almost twofold saving in storage.
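The storage estimate can be checked with simple arithmetic under stated assumptions: if the hash value length of 3 is measured in bytes and the text uses a single-byte encoding such as Windows-1251, an average word costs 6 letters plus 1 space, i.e. 7 bytes, against 3 bytes for its code, before counting the vocabulary. The sketch below, with a hypothetical BLAKE2b code, reports the three quantities separately so the vocabulary overhead for a given text is visible.

```python
import hashlib
import re

HASH_LEN = 3  # assumed: hash value length measured in bytes

def code(word: str) -> bytes:
    return hashlib.blake2b(word.encode("utf-8"), digest_size=HASH_LEN).digest()

def storage_estimate(text: str, encoding: str = "cp1251"):
    """Return (bytes of original text, bytes of code sequence, bytes of vocabulary).

    Assumes a single-byte encoding, so one Russian letter occupies one byte,
    and counts each vocabulary entry as its code plus the encoded word.
    """
    words = re.findall(r"\w+", text.lower())
    original = len(text.encode(encoding))
    sequence = HASH_LEN * len(words)
    vocab = {code(w): w for w in words}
    vocabulary = sum(HASH_LEN + len(w.encode(encoding)) for w in vocab.values())
    return original, sequence, vocabulary

original, sequence, vocabulary = storage_estimate("мама мыла раму " * 100)
```

For short words or texts with few repetitions the vocabulary term grows relative to the code sequence, so the actual saving depends on the text.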
During the conversion of the text into its digital representation, it is also advisable to extract and store, without limitation, the following characteristics of each word of the text: its initial form and/or basis (stem), grammatical forms, emphasis (stress), synonyms, relation to a knowledge field, emotional background, presence in idioms, and usage. These characteristics are important for search, translation, vocalization, and other operations on the text.
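One way to hold these per-word characteristics is a record stored next to the word's digital representation. The field names, the encoding of emphasis as the index of the stressed letter, and the sample entry are hypothetical; the list of fields mirrors the characteristics named above.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

def code(word: str) -> bytes:
    # Hypothetical 3-byte digital representation (BLAKE2b)
    return hashlib.blake2b(word.lower().encode("utf-8"), digest_size=3).digest()

@dataclass
class WordCharacteristics:
    initial_form: str                                        # dictionary form
    basis: str = ""                                          # stem of the word
    grammar_forms: list[str] = field(default_factory=list)
    emphasis: int | None = None                              # index of the stressed letter
    synonyms: list[str] = field(default_factory=list)
    knowledge_fields: list[str] = field(default_factory=list)
    emotional_background: str = "neutral"
    idioms: list[str] = field(default_factory=list)
    usage: list[str] = field(default_factory=list)           # example contexts

# Characteristics keyed by the word's digital representation
store: dict[bytes, WordCharacteristics] = {
    code("running"): WordCharacteristics(
        initial_form="run", basis="run",
        grammar_forms=["run", "runs", "ran", "running"],
        emphasis=1, synonyms=["jogging", "sprinting"],
        knowledge_fields=["sport"], idioms=["running late"],
    ),
}
```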
When carrying out the search method, it is reasonable, while composing and/or executing a search request, to check the spelling of the request's words and to verify that they are present in a predetermined set of words.
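A minimal sketch of the request check uses the standard-library `difflib.get_close_matches` to suggest corrections for words missing from a predetermined set. The word list `KNOWN_WORDS` and the similarity cutoff of 0.8 are assumptions; the method does not prescribe a particular spelling checker.

```python
import difflib

# Hypothetical predetermined set of words
KNOWN_WORDS = ["translation", "vocalization", "search", "vocabulary", "representation"]

def check_request(request: str, known=KNOWN_WORDS):
    """Return {word: suggestions} for request words not found in the predetermined set."""
    problems = {}
    for word in request.lower().split():
        if word not in known:
            problems[word] = difflib.get_close_matches(word, known, n=3, cutoff=0.8)
    return problems

print(check_request("vocabulery search"))
```

An empty result means every word of the request passed the check and the request can be converted and executed.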
When carrying out the translation method, it is preferable to use the digital representation of each word of the text as an address in an associative memory and to store the characteristics of that word in the associative memory. These characteristics may include, without limitation: the initial form and/or basis of the word, its grammatical forms, emphasis, synonyms, relation to a knowledge field, emotional background, presence in idioms, and usage.
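The idea of using the digital representation as a memory address can be sketched by reading the 3-byte code as an integer in the range 0 to 2**24 - 1. A dictionary stands in for the associative memory so that only occupied cells consume space. The class `AssociativeMemory`, the knowledge-field labels, and the sample entry for the ambiguous Russian word "замок" (castle or lock) are all hypothetical; they show how a stored characteristic such as the knowledge field can steer the choice of translation.

```python
import hashlib

ADDRESS_BITS = 24  # a 3-byte digital representation addresses 2**24 cells

def address(word: str) -> int:
    """Use the word's 3-byte digital representation directly as a memory address."""
    digest = hashlib.blake2b(word.lower().encode("utf-8"), digest_size=3).digest()
    return int.from_bytes(digest, "big")

class AssociativeMemory:
    """Sparse stand-in for a 2**24-cell associative memory."""

    def __init__(self):
        self._cells = {}

    def store(self, word, **characteristics):
        # Keep the word inside the cell so that colliding words stay separate
        self._cells.setdefault(address(word), {})[word] = characteristics

    def load(self, word):
        return self._cells.get(address(word), {}).get(word)

memory = AssociativeMemory()
memory.store("замок", initial_form="замок",
             translations={"architecture": "castle", "engineering": "lock"})

def translate_word(word: str, knowledge_field: str) -> str:
    info = memory.load(word)
    if info is None:
        return word  # unknown word: leave untranslated
    return info["translations"].get(knowledge_field, word)
```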
The inventive methods are also useful in programming and testing, when the initial texts are the source texts of computer programs. For instance, converting such texts into the digital representation makes it possible to uncover a majority of deficiencies and errors in a program, such as missing paired commands, e.g. ‘open the file / close the file’ or ‘allocate the memory unit / free the memory unit’, since an incomplete pair of commands is easy to notice in the vocabulary.
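The paired-command check can be sketched by building a vocabulary of the program's identifiers with occurrence counts and flagging pairs whose counts differ. The pair table `PAIRS` and the C fragment are hypothetical, and the hashing step is omitted for brevity. Equal counts are only a heuristic: they do not prove that every call is matched along every control-flow path.

```python
import re
from collections import Counter

# Hypothetical table of paired commands to check
PAIRS = [("fopen", "fclose"), ("malloc", "free")]

def unpaired_commands(source: str):
    """Count identifiers in the source and report pairs whose counts differ."""
    counts = Counter(re.findall(r"[A-Za-z_]\w*", source))
    return [(a, b, counts[a], counts[b]) for a, b in PAIRS if counts[a] != counts[b]]

c_source = '''
FILE *f = fopen("data.txt", "r");
char *buf = malloc(256);
fread(buf, 1, 256, f);
fclose(f);
'''
print(unpaired_commands(c_source))  # [('malloc', 'free', 1, 0)]
```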
To accelerate conversion, search, translation, and vocalization of text, it is preferable to use a dedicated computing apparatus for computing the digital representation of the text.
It is advisable to employ the inventive vocalization method in, without limitation, electronic books, messages on mobile devices, PCs, and other mobile computing devices, and navigation systems, which significantly improves services and convenience for users.
a illustrates a continuation of Addendum 1 demonstrating an example of text conversion according to the present invention.
While the invention may be susceptible to embodiment in different forms, there are shown in the drawings, and will be described in detail herein, specific embodiments of the present invention, with the understanding that the present disclosure is to be considered an exemplification of the principles of the invention, and is not intended to limit the invention to that as illustrated and described herein.
The present invention is disclosed in detail in an exemplary preferred embodiment described herein below. It is referred to
The system depicted on
The system shown on
Unit 4 compares and/or searches the digital representations accumulated in storage device 3. Translation unit 5 automatically translates the text using the digital representations of its words, as described above. The translation results are saved to storage device 3 and provided to user 6.
Addendum 1 is illustrated on
It can be noticed from
Comparing texts is very important for identifying their semantic content. In the related art this is a challenge, since different pairs of texts must be compared sequentially word by word, which is computationally complex. The proposed inventive method substantially simplifies the comparison and thereby facilitates and improves identification of the semantic meaning of the texts.
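One way to exploit the digital representation for comparison is to treat each text's vocabulary as a set of fixed-length codes and measure their overlap. The Jaccard similarity used below is a substitution chosen for illustration; the method does not specify a similarity measure. The point is that set operations on short fixed-length codes replace word-by-word string comparison of the text pairs.

```python
import hashlib
import re

def code_set(text: str):
    """Vocabulary of a text as a set of hypothetical 3-byte BLAKE2b codes."""
    words = re.findall(r"\w+", text.lower())
    return {hashlib.blake2b(w.encode("utf-8"), digest_size=3).digest() for w in words}

def similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity of the two texts' code vocabularies."""
    a, b = code_set(text_a), code_set(text_b)
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)

print(similarity("the cat sat on the mat", "the cat sat on a mat"))
```

In this example the two vocabularies share five of six distinct codes.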
Addendum 2 (
Addendum 3 (
In addition, according to a preferred embodiment of the present invention, translation can take into account, without limitation, the following word features: the initial form and/or basis of the word, its grammatical forms, emphasis, synonyms, relation to a knowledge field, emotional background, presence in idioms, and usage, which can significantly improve translation quality.
Unlike known related-art solutions, the present invention provides universal and unified compact storage of texts, search for complex word combinations, translation of texts into other languages, and high-quality vocalization of texts.
Number | Date | Country | Kind |
---|---|---|---|
EAPO 201001550 | Oct 2010 | EA | regional |