The invention relates to a method for encoding and decoding a character, and more particularly to a math function and method for encoding and decoding the character such that an output device does not generate or display garbage characters.
Owing to cultural and regional differences, the same character is commonly encoded in different ways in different encoding systems. More specifically, two encoding systems can use the same number for two different characters, or use different numbers for the same character. If the codes used to encode the characters of a document differ from the preset codes recognized by the operating system of a computer, the computer produces illegible content on a display device or a printer. The codes of the characters must therefore be transformed into ones the computer recognizes in order to produce legible content.
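The two kinds of conflict described above can be illustrated with a short sketch. It assumes the codecs shipped with the Python standard library (Big5, GB2312, ISO 8859-1) as stand-ins for the encoding systems discussed; the variable names are illustrative only.

```python
# The same two byte values, read under two different encoding systems.
raw = b"\xa4\xa4"
as_big5 = raw.decode("big5")        # one Traditional Chinese character
as_latin1 = raw.decode("latin-1")   # two currency signs

# The same character, written under two different encoding systems.
char = "中"
in_big5 = char.encode("big5")
in_gb2312 = char.encode("gb2312")

print(as_big5, as_latin1)
print(in_big5.hex(), in_gb2312.hex())
```

A reader that assumes the wrong encoding system therefore shows different characters for the same bytes, which is the source of the illegible output.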
The computer was first developed in the United States. At that time, a text document was generally composed of the 26 English letters (upper and lower case) in combination with some symbols, signs, and control codes of the computer system. Only 7 bits were therefore required to represent an individual character in the text document, and the Standard ASCII Character Set was adopted for use in computers. According to the Standard ASCII Character Set, each binary value between 0 and 127 is assigned a specific character, for a total of 128 characters.
For European languages, a 7-bit code is insufficient, because these languages include, in addition to the 26 English letters of the Standard ASCII Character Set, a number of further Latin letters and diacritical marks placed above or below specific letters. The aforesaid encoding system therefore needed to be extended. This led to ISO 8859, a standard character encoding system for Latin alphabets, which uses eight-bit coded character sets.
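As a minimal sketch of why the extension to eight bits was needed, the following uses Python's built-in `ascii` and `iso-8859-1` codecs. The helper `fits_in_bits` and the sample words are assumptions made for illustration.

```python
def fits_in_bits(text: str, encoding: str, bits: int) -> bool:
    """Return True if every encoded byte value is below 2**bits."""
    return all(b < (1 << bits) for b in text.encode(encoding))

english = "resume"
french = "résumé"

# Plain English letters fit in the 7-bit range 0..127.
english_is_7bit = fits_in_bits(english, "ascii", 7)

# The accented letters have no code in the 7-bit set at all.
try:
    french.encode("ascii")
    french_in_ascii = True
except UnicodeEncodeError:
    french_in_ascii = False

# The 8-bit Latin set still uses one byte per character,
# but needs values above 127 for the accented letters.
latin1_bytes = french.encode("iso-8859-1")
french_is_7bit = fits_in_bits(french, "iso-8859-1", 7)
french_is_8bit = fits_in_bits(french, "iso-8859-1", 8)
```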
For the languages used in Asian countries, the ISO 8859 encoding system does not fit the structure of the language, since the characters of these languages are not formed by spelling with an alphabet. The encoding system requires several thousand additional codes, together with punctuation and technical symbols, to encode the Chinese characters alone, so a combination of several bytes is used to encode a single character. At present, the Big5 standard (an encoding method) is used to encode the Traditional Chinese characters used in Taiwan and Hong Kong; it is a double-byte character set in which each character is encoded with two bytes. However, when two languages (Chinese and English) are combined within a text document, a mechanism is required to partition the text document into separate blocks, one to be treated in accordance with the Standard ASCII Character Set and the other in accordance with Big5. A rule is required to determine whether a given byte is to be regarded as one character according to the ASCII Character Set or is to be combined with the following byte according to the Big5 character set.
In certain circumstances, the content of a single text document includes characters of more than one language. For example, a specification in Chinese may include both Chinese characters encoded according to Big5 and English words encoded according to the Standard ASCII Character Set; these two encoding systems do not conflict with each other. However, if the same text document includes more than two languages, such as Chinese, Japanese and Korean characters, garbage characters are generated when a printer connected to a PC prints out the text document. The reason is that the writing system of each language has its own encoding system, and the database of a PC installed with the Chinese version of the Windows operating system includes only Big5, which allows it to recognize the Chinese characters but not the Korean characters. Some software installed on the PC may transform these languages into several encoding systems, leaving the printer unable to recognize the encoded characters, so that the printout consists of garbage characters.
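The failure described above can be reproduced in a small sketch, assuming Python's `big5` and `euc_kr` codecs stand in for the encoding systems of the Chinese-version PC and of a Korean document. The variable names are illustrative.

```python
korean = "한"

# Big5 has no code for a Korean character, so strict encoding fails.
try:
    korean.encode("big5")
    representable = True
except UnicodeEncodeError:
    representable = False

# A lossy conversion substitutes a placeholder for the missing character.
lossy = "中한".encode("big5", errors="replace")
printed = lossy.decode("big5")

# Korean bytes interpreted under the wrong encoding system yield garbage.
misread = korean.encode("euc_kr").decode("big5", errors="replace")

print(representable, printed, misread != korean)
```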
Therefore, it is an object of the present invention to provide software to be installed on a computer running any operating system, thereby permitting the computer to produce an accurate text document regardless of the language in which the text is written.
It is another object of the present invention to effectively resolve the conflicts among encoding systems encountered by conventional computers.
According to the present invention, a code transforming method is provided for use in an operating system of a computer prior to displaying or printing out a text document. The method includes the steps of: (i) transforming the content of the text document into a temporary text file utilizing a data transformation service; (ii) coding each character of the temporary text file into a sequence of 16-bit words; (iii) encoding the sequence of 16-bit words in accordance with a transformation table for displaying or printing; and (iv) outputting the encoded sequence of 16-bit words to a Dynamic Link Library in order to display or print out the text document.
When compared to the prior art technology, the code transformation method of the present invention can eliminate the garbage characters outputted by a printer or a display device due to the conflicts among the encoding systems for multilingual characters of several languages.
Other features and advantages of this invention will become more apparent in the following detailed description of the preferred embodiment of this invention, with reference to the accompanying drawings, in which:
The object of the present invention is to provide a code transformation method (code transformation software) for use in a computer. The computer (with its presently installed operating system and encoding system) is thereby enabled to encode the content of a desired text of any writing system into codes that can be recognized by an output device, such that the output device can smoothly output the desired text.
The encoding system of the present Operating System and that of the output device include the same codes or code sets for encoding characters. The code transformation process is carried out utilizing the Data Transformation Service and the codes 10200 stored within the Data Base, in combination with the software of the present invention. Fundamentally, the codes 10200 stored within the Data Base are formed according to the Unicode Standard and serve as the basic codes or code set for the software of the present invention. The basic codes or code set are not limited thereto; alternatively, the basic codes or code set can be formed in accordance with UCS-4. The basic code set preferably includes the numbers, symbols and representative marks of the encoding systems for the writing systems of any region or country. When the basic code set is arranged in this way, the characters of any writing system can be encoded and transformed.
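The difference between a basic code set of 16-bit words and one formed in accordance with UCS-4 can be sketched as follows. The sketch assumes Python's `utf-16-be` and `utf-32-be` codecs as concrete realizations of the two forms; the helper names are hypothetical.

```python
import struct

def utf16_words(text: str) -> list[int]:
    """Represent text as a sequence of 16-bit words."""
    raw = text.encode("utf-16-be")
    return list(struct.unpack(f">{len(raw) // 2}H", raw))

def ucs4_words(text: str) -> list[int]:
    """Represent text as a sequence of 32-bit words."""
    raw = text.encode("utf-32-be")
    return list(struct.unpack(f">{len(raw) // 4}I", raw))

sample = "A中한"
print([hex(w) for w in utf16_words(sample)])
print([hex(w) for w in ucs4_words(sample)])

# A character beyond the first 65,536 code points needs two
# 16-bit words, but still only one 32-bit word.
emoji = "\U0001F600"
print(len(utf16_words(emoji)), len(ucs4_words(emoji)))
```

For the English, Chinese and Korean characters in the sample, both forms carry the same numeric values; they differ only in word width.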
According to step 10: a text document is to be printed out by the PC via the printing apparatus, and it is discovered that the encoding system for the characters of the text document conflicts with that of the printing apparatus. According to step 12: the software of the present invention transforms the content of the text document into a temporary file by means of the Data Transformation Service. Since the Data Transformation Service is also utilized for the printout, the content of the output document is not rendered as garbage characters, i.e. it is legible. According to step 14: the math function in the software of the present invention transforms the codes used by the Operating System for coding the characters into codes compatible with the basic code set, i.e. each character is represented by a sequence of 16-bit words of the Unicode Standard. According to step 16: the sequence of 16-bit words is encoded so as to conform to the encoding system of the printing apparatus, an appropriate code transformation table being provided between the encoding system of the printing apparatus and the Unicode Standard. Utilizing this code transformation table, the codes of the Unicode Standard are readily transformed into ones recognizable by the encoding system of the printing apparatus.
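Steps 14 and 16 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation of the invention: the source text is assumed to be encoded in GB2312, the printing apparatus is assumed to accept Big5, Python's codecs are used to build a stand-in transformation table, and only characters that fit in a single 16-bit word are handled. All function names are hypothetical.

```python
def to_unicode_words(raw: bytes, source_encoding: str) -> list[int]:
    """Step 14 (sketch): represent each character as 16-bit Unicode words."""
    data = raw.decode(source_encoding).encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def build_table(chars: str, device_encoding: str) -> dict[int, bytes]:
    """Stand-in transformation table: Unicode word -> device code."""
    return {ord(ch): ch.encode(device_encoding) for ch in chars}

def to_device_codes(words: list[int], table: dict[int, bytes]) -> bytes:
    """Step 16 (sketch): look up each 16-bit word in the table."""
    return b"".join(table[w] for w in words)

source_bytes = "中文".encode("gb2312")          # codes used by the source system
words = to_unicode_words(source_bytes, "gb2312")
table = build_table("中文", "big5")             # assumed device encoding
device_bytes = to_device_codes(words, table)

print([hex(w) for w in words])
print(device_bytes.decode("big5"))
```

Routing every conversion through the Unicode words means each encoding system needs only one table to and from the basic code set, rather than one table for every pair of encoding systems.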
Finally, the content (decoded) of the text document is transmitted to the Dynamic Link Library, where a print control command is sorted for commanding the encoding system of the printing apparatus in order to permit the printout of the text document.
According to the present invention, the codes 10200 stored within the Data Base 1020 enable the same software to process the characters of any language or region, and the software is therefore compatible with the encoding systems of all writing systems. No amendment of the software is required. Under this condition, the transformation interface (the software) is responsible for the inter-transformation of codes between the Operating System and the basic code set of the Unicode Standard. The foremost task of the transformation interface is to process multilingual characters, permitting inter-transformation of the codes among them. Software written according to the present invention can support the Operating System in any environment, regardless of the encoding system of any writing system.
Each character is encoded as a sequence of 16-bit words of the Unicode Standard (see the left side in
Note that after the content of the text document is transformed by the Data Transformation Service into the temporary text file, the latter is coded by the software of the present invention. To accomplish this task, each character is coded into a code included within the basic code set, which is subsequently transformed by means of the Code Transformation Table, thereby ensuring a smooth transformation and yielding a text in the required characters.
While the present invention has been described in connection with what is considered the most practical and preferred embodiments, it is understood that this invention is not limited to the disclosed embodiments but is intended to cover various arrangements included within the spirit and scope of the broadest interpretation so as to encompass all such modifications and equivalent arrangements.
Number | Date | Country | Kind |
---|---|---|---|
95134373 | Sep 2006 | TW | national |