A portion of the Disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
1. Field of the Invention
The present invention relates in general to computer programs, and more particularly to converting a textual representation of a number into a numeric representation of the number.
2. Description of the Related Art
Tagged message formats such as Extensible Markup Language (XML) are rapidly replacing the fixed-format messages that typify Electronic Data Interchange (EDI) because the tagged message formats are so much more flexible. However, the fixed-format messages are much more efficient from storage and processing perspectives, as they may typically be processed hundreds, or even thousands, of times more quickly than corresponding tagged format messages. For example, information describing a customer account may be represented by the following EDI fixed-format message (where the “b” characters represent spaces):
The same information may be represented by the following XML tagged format message:
Although the XML tagged format is self-evidently clearer and more flexible, processing it is far less efficient than processing fixed-format data. There are various reasons why processing tagged messages is slower. Two of the main reasons are:
1. The receiving program has to analyze the message, character by character, to distinguish markup (the tags) from message content. This process is called “parsing,” and is computationally very expensive.
2. The message content itself (between the markup tags) typically does not have a fixed format, and this format must be determined before the content can be processed. Discovering the format by using conventional techniques is also computationally expensive.
In a fixed-format message, a numeric quantity is always the same size, and has the same number of integer and decimal places and so on. In a tagged message format, the representation of the same numeric quantity can vary very widely. For example, with a fixed format of six integer places, a decimal point and two decimal places, the number 20 would always be represented exactly as: 000020.00. In the unconstrained formats that are typical for tagged format messages, the number 20 may be represented in a variety of ways: 20, 20., 20.00, 000020.00, et cetera. Such variability is one of the benefits of using tagged format messages as the sender and receiver of a message do not need to have identical definitions, and can evolve separately. However, this flexibility may come at a penalty in performance if the receiver of the message uses a conventional method of acquiring the incoming data values. For instance, a generalized numeric conversion function may accept any of the illustrated formats, converting them to a standardized representation that could then be assigned to the appropriate program variable. Unfortunately, this may require over a thousand machine instructions.
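The variability described above may be illustrated by the following minimal sketch. Python is used here for illustration only and is not part of the disclosure; the names SPELLINGS and general_convert are hypothetical, and the library constructor decimal.Decimal merely stands in for a generalized numeric conversion function.

```python
from decimal import Decimal

# The textual spellings of the number 20 listed in the paragraph above.
SPELLINGS = ["20", "20.", "20.00", "000020.00"]

def general_convert(text: str) -> Decimal:
    """Stand-in for a generalized conversion routine that accepts any spelling."""
    return Decimal(text)

# Every spelling converts to the same numeric value.
values = [general_convert(s) for s in SPELLINGS]
all_equal = all(v == Decimal(20) for v in values)
print(all_equal)
```

The generality is what makes such a routine costly: it must discover the format of every input each time it is called.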
Thus, there is a clearly felt need for an improved conversion of a textual representation of a number into a numeric representation of the number.
Preferred embodiments of the present invention comprise a method, system, article of manufacture, and computer program product for converting a textual representation of a number into a numeric representation of the number.
In accordance with a preferred embodiment of the present invention, a text representation of a number is converted into a numeric representation of the number by converting the text representation of the number into a description of the number's format; mapping the description of the number's format to a sequence of conversion code; and converting the text representation of the number into the numeric representation of the number by use of the sequence of conversion code.
In accordance with an aspect of a preferred embodiment of the present invention, the description of the number's format is a picture string.
In accordance with another aspect of a preferred embodiment of the present invention, the text representation of the number is converted into a description of the number's format by a translation instruction using a translate table.
In accordance with another aspect of a preferred embodiment of the present invention, the sequence of conversion code for converting the text representation of the number into the numeric representation of the number comprises an assignment statement.
In accordance with another aspect of a preferred embodiment of the present invention, the mapping of the description of the number's format to a sequence of conversion code comprises mapping the description of the number's format to an index which is used to transfer control to the sequence of conversion code corresponding to the description of the number's format.
In accordance with another aspect of a preferred embodiment of the present invention, if the text representation of the number does not convert into the description of the number's format, then the subsequent mapping and converting steps are not executed.
A preferred embodiment of the present invention has the advantage of providing improved conversion of a textual representation of a number into a numeric representation of the number.
A preferred embodiment of the present invention has the advantage of reducing execution time for conversion of a textual representation of a number into a numeric representation of the number.
A preferred embodiment of the present invention has the advantage of reducing memory for conversion of a textual representation of a number into a numeric representation of the number.
A preferred embodiment of the present invention has the advantage of reducing an amount of program code for conversion of a textual representation of a number into a numeric representation of the number.
For a more complete understanding of the present invention and the advantages thereof, reference is now made to the Description of the Preferred Embodiment in conjunction with the attached Drawings, in which:
An embodiment of the invention is now described with reference to the figures where like reference numbers indicate identical or functionally similar elements. Also in the figures, the leftmost digit of each reference number corresponds to the figure in which the reference number is first used. While specific configurations and arrangements are discussed, it should be understood that this is done for illustrative purposes only. A person skilled in the relevant art will recognize that other configurations and arrangements can be used without departing from the spirit and scope of the invention. It will be apparent to a person skilled in the relevant art that this invention can also be employed in a variety of other devices and applications.
An efficient way of acquiring a numeric value from a message and assigning it to a program variable is to use a compiled language assignment statement, such as a COBOL MOVE statement. However, an individual MOVE statement is specific to and only works correctly for a particular number format. Thus, if a set of suitable MOVE statements is provided, each one for a particular format, and if the program can determine the particular format, then the program can transfer control to the appropriate MOVE statement. The problem is discovering the format of the number and transferring control in such a manner that the advantages of the high performance compiled MOVE statement are not lost due to the cost of determining the format of the number. Ideally, the format of the number would be instantaneously determined at no cost to provide a description of the format of the number in a form that could be used to direct control to the appropriate compiled MOVE statement. The invention significantly reduces the cost associated with such a determination.
In the preferred embodiment of the present invention, a COBOL MOVE statement is used to convert the textual representation of a number into the numeric representation of the number. The COBOL language can describe the format of a data item in terms of a “picture string.” In the picture string for a numeric quantity, each decimal digit position is represented by the character ‘9’, the decimal point by the character ‘.’ and so on. For example, the picture string description for the number “12345.67” is “99999.99”. The data descriptions and the MOVE statement that can correctly assign this number to an operational data item are as follows:
However, this MOVE statement only correctly converts the textual representation of the number into the numeric representation of the number if the textual representation of the number has exactly five integer digits, followed by one decimal point, followed by two decimal digits. The preferred embodiment of the present invention uses a novel method to transform the textual representation of the number into a description of its format in terms of its “picture” with a single machine instruction. This description is converted into a number which becomes an argument to a computed GO TO statement that passes control to a proper MOVE statement for converting the textual representation of the number into the numeric representation of the number.
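The format dependence of such an assignment may be illustrated by the following sketch, written in Python purely for illustration. The function move_99999v99 is a hypothetical analog of a statement compiled for the picture 99999.99; it is not the COBOL of the preferred embodiment. It reads digits at fixed offsets and performs no format checking, so it is fast but only correct when the input matches that picture exactly.

```python
def move_99999v99(text: str) -> int:
    """Hypothetical analog of an assignment compiled for picture 99999.99.

    Returns the value as an integer count of hundredths. No format
    checking is done; the offsets are fixed by the picture.
    """
    whole = int(text[0:5])  # five integer digits at offsets 0..4
    frac = int(text[6:8])   # two decimal digits after the point at offset 5
    return whole * 100 + frac

# Input that matches the picture converts correctly.
right = move_99999v99("12345.67")    # 1234567 hundredths, i.e. 12345.67

# Input with a different picture (999999999) is silently misread.
wrong = move_99999v99("123456789")   # 1234578, not the intended value
print(right, wrong)
```

This is why control is to be transferred to such a statement only after the format of the input has been established.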
The preferred embodiment of the present invention directs control for a given input value to a set of previously compiled program statements that process the exact format of each particular instance of the input value. This is accomplished in three major steps. First, each digit in the textual representation input value is converted to the digit 9, which can be accomplished with a single machine instruction on many models of IBM mainframe computers. This converted value is a valid descriptor, the picture string. For example, if the textual representation input value is 0000020.00, then its descriptor or picture string is 9999999.99 and its length is 10.
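The first step may be sketched as follows, in Python for illustration only. The table-driven str.translate call is used here as an analog of a translate instruction with a translate table; it is a library call rather than a single machine instruction, and the names DIGITS_TO_NINE and picture_of are hypothetical.

```python
# Translate table mapping every decimal digit to '9'. Characters that are
# not in the table, such as the decimal point, pass through unchanged.
DIGITS_TO_NINE = str.maketrans("0123456789", "9" * 10)

def picture_of(text: str) -> str:
    """Return the picture string describing the format of text."""
    return text.translate(DIGITS_TO_NINE)

picture = picture_of("0000020.00")
print(picture, len(picture))  # 9999999.99 10
```

Because non-digit characters are left as they are, unexpected characters remain visible in the resulting picture string.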
Although the COBOL language statement that the preferred embodiment uses to implement the conversion appears complicated:
In the second major step, the picture string produced by the first step is mapped to a number using techniques well known in the art such as a binary search or hashing. This number is then used in a computed GOTO statement to transfer control to the assignment statement that corresponds exactly with the picture string for the input value. Some examples of these assignment statements are:
The naming of the labels, such as “M7-W4D2-W5D2D”, indicates the semantics of the assignments: “M7” implies that the answer will occupy 7 digit positions; “W4D2” means that the source has 4 digits before the decimal point and 2 digits after it; “W5D2D” means that the target has 5 digits before the decimal point and 2 digits after it.
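The label convention described above may be decoded mechanically, as in the following Python sketch, offered for illustration only. The names LabelInfo, decode_label, and source_picture are hypothetical. The meaning of the trailing letter on the target part is not stated in the text, so the sketch accepts it and otherwise ignores it; the handling of a source with no decimal digits is likewise an assumption.

```python
import re
from typing import NamedTuple

class LabelInfo(NamedTuple):
    result_digits: int    # "M7": digit positions occupied by the answer
    source_whole: int     # "W4": source digits before the decimal point
    source_decimals: int  # "D2": source digits after the decimal point
    target_whole: int     # "W5": target digits before the decimal point
    target_decimals: int  # "D2": target digits after the decimal point

# An optional trailing capital letter is tolerated but not interpreted.
_LABEL = re.compile(r"M(\d+)-W(\d+)D(\d+)-W(\d+)D(\d+)[A-Z]?")

def decode_label(label: str) -> LabelInfo:
    m = _LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"unrecognized label: {label!r}")
    return LabelInfo(*(int(g) for g in m.groups()))

def source_picture(info: LabelInfo) -> str:
    """Picture string the source must have (assumes no point when D is 0)."""
    pic = "9" * info.source_whole
    if info.source_decimals:
        pic += "." + "9" * info.source_decimals
    return pic

info = decode_label("M7-W4D2-W5D2D")
print(info, source_picture(info))
```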
If the picture string cannot be mapped to a number, then the textual representation of the input value does not have one of the expected formats. In this case, in the third major step, control falls through to the standard library conversion routine, which can convert valid but uncommon formats. Using this standard conversion has the additional benefit that, if invalid input is received, it will not be erroneously placed automatically into the output. For example, an input value of “34AB449.12” will translate to an invalid picture string “99AB999.99” which does not match any valid picture string such as “9999999.99”.
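The second and third steps may be sketched together as follows, in Python for illustration only. The set of expected picture strings, the function names, and the use of decimal.Decimal as the standard library conversion routine are all assumptions made for this sketch. Binary search is used for the mapping, as one of the well-known techniques mentioned above.

```python
from bisect import bisect_left
from decimal import Decimal, InvalidOperation

# Hypothetical set of expected picture strings, sorted for binary search.
PICTURES = sorted(["99.99", "9999.99", "99999.99", "9999999.99"])
DIGITS_TO_NINE = str.maketrans("0123456789", "9" * 10)

def picture_index(picture: str) -> int:
    """Return the position of picture in PICTURES, or -1 if it is not expected."""
    i = bisect_left(PICTURES, picture)
    return i if i < len(PICTURES) and PICTURES[i] == picture else -1

def fast_convert(text: str) -> Decimal:
    """Stand-in for a format-specific statement; assumes digits.digits input."""
    whole, frac = text.split(".")
    return Decimal(int(whole + frac)) / (10 ** len(frac))

def convert(text: str):
    """Return (value, route), where route names the path that was taken."""
    if picture_index(text.translate(DIGITS_TO_NINE)) >= 0:
        return fast_convert(text), "fast"
    try:
        # Fall through to the general routine for valid but uncommon formats.
        return Decimal(text), "library"
    except InvalidOperation:
        # Invalid input is never placed into the output.
        return None, "rejected"

print(convert("0000020.00"))
print(convert("20."))
print(convert("34AB449.12"))
```

In this sketch the expected format takes the fast route, the uncommon but valid spelling reaches the general routine, and the invalid input is rejected by it.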
Referring now to
Referring now to
After the start 305 of the process 300, process block 310 converts the text representation of the number into a description of the number's format, and process block 315 maps the description of the number's format to an index. Thereafter, decision block 320 determines if a valid index, hash, or search result was produced by the mapping of the description of the number's format to an index. If a valid index, hash, or search result was produced by the mapping of the description of the number's format to an index, then process block 325 determines a sequence of conversion code corresponding to the description of the number's format by use of the index. Process block 330 transfers control to the sequence of conversion code corresponding to the description of the number's format. Process block 335 converts the text representation of the number into the numeric representation of the number by use of the sequence of conversion code. Process block 340 returns the numeric representation of the number. The process ends at process block 345.
Returning now to decision block 320, if a valid index, hash, or search result was not produced by the mapping of the description of the number's format to an index, then control passes to process block 350 which returns an error indicating that no conversion was performed, and the process ends at process block 345.
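The flow of process 300 may be sketched as follows, in Python for illustration only, with comments tying each line to the block numbers above. The tables INDEX_OF and CONVERSION_CODE, the helper _hundredths, and the exception NoConversion are hypothetical. Hashing, by way of a dictionary lookup, is assumed for the mapping, and a single helper stands in for what would be distinct format-specific sequences of conversion code.

```python
DIGITS_TO_NINE = str.maketrans("0123456789", "9" * 10)

def _hundredths(text: str) -> int:
    """Hypothetical conversion code for pictures with two decimal digits."""
    whole, frac = text.split(".")
    return int(whole) * 100 + int(frac)

# Hypothetical tables: picture string -> index, and index -> conversion code.
INDEX_OF = {"99.99": 0, "9999.99": 1, "9999999.99": 2}
CONVERSION_CODE = [_hundredths, _hundredths, _hundredths]

class NoConversion(Exception):
    """Indicates that no conversion was performed (block 350)."""

def process_300(text: str) -> int:
    picture = text.translate(DIGITS_TO_NINE)  # block 310: describe the format
    index = INDEX_OF.get(picture)             # block 315: map format to index
    if index is None:                         # block 320: valid index?
        raise NoConversion(text)              # block 350: report the error
    code = CONVERSION_CODE[index]             # block 325: select conversion code
    return code(text)                         # blocks 330-340: run it and return

print(process_300("0000020.00"))  # value in hundredths
```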
With reference now to the figures, and in particular with reference to
Using the foregoing specification, the invention may be implemented using standard programming and/or engineering techniques using computer programming software, firmware, hardware or any combination or sub-combination thereof. Any such resulting program(s), having computer readable program code means, may be embodied within one or more computer usable media such as fixed (hard) drives, disk, diskettes, optical disks, magnetic tape, semiconductor memories such as Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), etc., or any memory or transmitting device, thereby making a computer program product, i.e., an article of manufacture, according to the invention. The article of manufacture containing the computer programming code may be made and/or used by executing the code directly or indirectly from one medium, by copying the code from one medium to another medium, or by transmitting the code over a network. An apparatus for making, using, or selling the invention may be one or more processing systems including, but not limited to, central processing unit (CPU), memory, storage devices, communication links, communication devices, servers, input/output (I/O) devices, or any sub-components or individual parts of one or more processing systems, including software, firmware, hardware or any combination or sub-combination thereof, which embody the invention as set forth in the claims. User input may be received from the keyboard, mouse, pen, voice, touch screen, or any other means by which a human can input data to a computer, including through other programs such as application programs, databases, data sets, or files.
One skilled in the art of computer science will easily be able to combine the software created as described with appropriate general purpose or special purpose computer hardware to create a computer system and/or computer sub-components embodying the invention and to create a computer system and/or computer sub-components for carrying out the method of the invention. Although the present invention has been particularly shown and described with reference to a preferred embodiment, it should be apparent that modifications and adaptations to that embodiment may occur to one skilled in the art without departing from the spirit or scope of the present invention as set forth in the following claims.