As an example of operation of the invention, consider a representation of the 14-character string
S0=Ab4,97.21−kge, (1)
The first two alpha characters (“Ab”) correspond to standard ASCII decimal numbers 65 and 98, and the last five alpha characters (“−kge,”) correspond to numbers 45, 107, 103, 101 and 44. Optionally, the delimiter “−” after the numeral “1” could be (but is not here) added to the numerical string, NS=4,97.21, in the string S0, which includes the delimiters “,” and “.” and “−”.
required to represent the number LN(m) of numerals in NSS(m) and the number LN(m)+1 of delimiter locations in NSS(m) in binary format, respectively, are determined, where int(K) is the largest integer ≦ the real number K.
A null delimiter ND0, indicating that no (other) delimiter is present in the numerical sub-string NS(m), may be represented as a longer-than-normal string of Q consecutive binary zeroes or binary ones, where Q is to be determined by other considerations.
In step 15, a set DS of different delimiters in the set {NS(m)}m of numerical sub-strings is identified, with each different delimiter indexed d=1, . . . , D, and
D1=−int{log2(1/D)} (4)
being the number of bits required to represent the number D in binary format. One may choose Q=D1 +q (q≧1).
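As a rough illustration of Eq. (4), the quantity −int{log2(1/D)} equals the ceiling of log2(D) for any positive integer D. The sketch below evaluates it with integer arithmetic; the function name `bits_required` is a hypothetical helper introduced only for this illustration.

```python
def bits_required(count: int) -> int:
    """Return -int{log2(1/count)}, i.e. ceil(log2(count)), for count >= 1.

    Hypothetical helper: the name and signature are assumptions made
    for illustration, not part of the described system.
    """
    if count < 1:
        raise ValueError("count must be a positive integer")
    # For positive integers, (count - 1).bit_length() == ceil(log2(count)).
    return (count - 1).bit_length()


if __name__ == "__main__":
    for d in (1, 2, 5, 6, 8):
        print(f"D={d}: D1={bits_required(d)}")
```

The same helper can stand in for the analogous bit counts defined elsewhere in the text, for example DP1(m) in Eq. (5).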
In step 16, for each m=1, . . . , M, a set DS(m) of all delimiters from DS in the numerical sub-string NS(m) is identified by a delimiter index, d(m′;m) (m′=1, . . . , DP(m); d(m′;m)=d=1, . . . , D), that corresponds to one index value d for the set DS, where DP(m) is the number of (not necessarily distinct) delimiters from the set DS(m), for each m, and the number of bits
DP1(m)=−int{log2(1/DP(m))}. (5)
required for binary format representation is optionally computed. Where no delimiter occurs in NS(m) for a particular m, the set DS(m) contains only the distinctive null delimiter ND0.
In step 17, a numerical position or location in the numeral sub-string NSS(m), numbered NSDP(m′;m) (m′=1, . . . , DP(m)), for the delimiter corresponding to the index value d(m′;m), is identified, for each m. Where DS(m) contains only ND0, {NSDP(m′;m)}m is an empty set. The pair {NSDP(m′;m), d(m′;m)} refers to a particular location NSDP(m′;m) within the numeral sub-string NSS(m) and to the corresponding delimiter in DS(m), having the index value d(m′;m) and positioned at this location. Alternatively, NSDP(m′;m) can be replaced by ΔNSDP(m′;m)=NSDP(m′;m)−NSDP(m′−1;m), the distance from one delimiter location to the next location. The pair {NSDP(m′;m), d(m′;m)} is represented in binary format as LN2(m)+D1 bits, independent of the value of m′.
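Steps 16 and 17 can be sketched as follows, under stated assumptions: delimiter locations are numbered 1 through LN(m)+1, with location 1 preceding the first numeral, and delimiter indices d run from 1 to D in the order the delimiters appear in an assumed delimiter set. The function name and argument names are hypothetical.

```python
from typing import List, Tuple


def split_numerical_substring(ns: str, delimiter_set: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Split NS(m) into NSS(m) and a list of (location, delimiter index) pairs.

    Assumed conventions (not confirmed by the text):
      * location is 1-based among the LN(m)+1 possible delimiter locations,
        so a delimiter preceded by k numerals sits at location k+1;
      * the delimiter index d is the 1-based position in delimiter_set.
    """
    numerals: List[str] = []
    pairs: List[Tuple[int, int]] = []
    for ch in ns:
        if ch in "0123456789":
            numerals.append(ch)
        elif ch in delimiter_set:
            location = len(numerals) + 1
            pairs.append((location, delimiter_set.index(ch) + 1))
        else:
            raise ValueError(f"unexpected character {ch!r} in numerical sub-string")
    return "".join(numerals), pairs


if __name__ == "__main__":
    nss, pairs = split_numerical_substring("4,97.21", ",.")
    print(nss, pairs)
```

An empty list of pairs corresponds to the case in which DS(m) contains only the null delimiter ND0.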
In step 18, the system provides a first ordered array
Arr1(m)=NSS(m)/(null)/{{NSDP(m′;m), d(m′;m)} | m′=1, . . . , DP(m)}, (6)
for each numeral sub-string NSS(m), where the first ordered array for one or more values m consists of only NSS(m)/(null)/ND0, when no delimiter occurs within the numerical sub-string NS(m). Here, (null) is a distinguishable group of zeroes and/or ones in binary format. The first ordered array Arr1(m) is represented as
L{Arr1(m)}=LN1(m)+{LN2(m)+D1}DP(m)+(null) (7)
bits, in binary format.
In step 19, the system represents S(total) as a second ordered array
Arr2=S(total)′=AC(m=1)/(null)/Arr1(m=1)/(null)/AC(m=2)/(null)/Arr1(m=2)/ . . . /(null)/Arr1(m=M)/(null)/AC(m=M+1). (8)
If an alpha character string, such as AC(m=1) and/or AC(m=M+1), is not present in the representation of S(total), this string is also absent in the representation S(total)′. The total binary length of the modified string S(total)′ is
L{Arr2}=Σm {AC(m)+LN1(m)+{LN2(m)+D1}·DP(m)+(null bits)}. (9)
The result is a modified binary format string S(total)′, in which alpha characters, numerals and delimiters that are part of and modify the numeral sub-string(s) are presented in modified binary format and are processed substantially uniformly in a search for the total string S(total), after the numerical sub-string delimiters and their respective locations are identified. Appendix 1 illustrates application of the invention to an example, S(total)=S0.
In an alternative approach, steps 18-19 are replaced by steps 28, 29 and 30, with steps 21-26 being the same as steps 11-16, respectively. In step 28, the numbers LN1(m) and LN2(m) and DP1(m) are used to determine the number of bits in each of the expressions NSS(m) and {delimiter position NSDP(m′;m)} and {delimiter index d(m′;m)} for m′=1, . . . , DP(m). In step 29, the system provides a modified first array
Arr1′(m)=NSS(m)/{NSDP(m′;m) plus delimiter no. d(m′;m)/DS(m)/m′=1, . . . , DP(m)}, (10)
for each numeral sub-string NSS(m), where the modified first array for one or more values m consists of only the null delimiter ND0 when no delimiter occurs within the numerical sub-string NS(m). In the modified first array Arr1′(m), no binary string representing a “null” occurs because the numbers of binary positions for each of the components in this array are known. This approach reduces the binary size from that required for Arr1(m), at a cost of requiring determination and storage elsewhere of the bit sizes of the individual components. In step 30, the system represents S(total) as a modified second ordered array
Arr2′=S(total)″=AC(m=1)/Arr1′(m=1)/AC(m=2)/Arr1′(m=2)/ . . . /Arr1′(m=M)/AC(m=M+1). (11)
A floating point representation of a number is a special case of this general representation, with m=M=1 delimiter present and the particular delimiter being “.” or “,” depending upon what symbol is used to represent the decimal point, or the null delimiter ND0.
When a numeral string S is transmitted, the binary base to be used (e.g., base 2^p=32 (p=5) or base 2^p=64 (p=6) or base 2^p=128 (p=7) or base 2^p=256 (p=8)) can be optimized to minimize the bit count needed to specify the string. Surprisingly, the optimum base may change, depending upon the magnitude M(S) of the string S. Consider a numerical string S, consisting of a string of numerals plus a signum bit (±) plus, optionally, one or more delimiters, such as a decimal point for floating point format. The signum bit and the bits representing a delimiter, if any, will not change, no matter what base is used, so that, in any comparison, the presence of these bits can be ignored. Assume that the numeral string S (without delimiters), expressed in decimal format, satisfies
where int{K} is largest integer (positive, negative or zero) that is ≦ K and the magnitude exponent b1 may be positive, negative or zero. Express the numeral string S in binary format in two alternative forms, as
S=2^b1·f1(S)=2^b1·{1.a1 a2 . . . aN}, (14A)
S=2^(b1+1)·f2(S)=2^(b1+1)·{0.1 a′1 a′2 . . . a′K}, (14B)
where the fractional functions f1(S) and f2(S) are expressed in binary format and satisfy 1≦f1(S)<2 and 0.5≦f2(S)<1 and f2(S)=0.5f1(S) (optional), and the numeral an=a′n+1 (n=2, . . . , N) is either 0 or 1 and is the bit coefficient for the value 2^−n (for f1(S)) or the bit coefficient for 2^−(n+1) (for f2(S)). The coefficient aN, with N dependent upon the string S, is the last non-zero bit coefficient in the binary expression for f1(S), so that
an=0 for n=N+1, N+2, . . . (15)
Consider expression of S in a base B=2^p=32 (p=5), and let E(b1;base 32) represent the exponent of the base (a power of 2^5=32) required to represent the numeral string S as in Eq. (14A). Table 1 sets forth the exponent E(b1;base 2^p), with p=5, 6, 7 or 8, for each of a sequence of magnitude exponents b1.
The two columns corresponding to base B1=32 and to base B1=64 have the same exponent E(b1;base B1) for b1=00, 01, 02, 03, 04 (E=0); have the same exponent E(b1;base B1) for b1=06, 07, 08, 09 (E=1); have the same exponent E(b1;base B1) for b1=12, 13, 14 (E=2); have the same exponent E(b1;base B1) for b1=18, 19 (E=3); and have the same exponent E(b1;base B1) for b1=24 (E=4).
The two columns corresponding to base B2=64 and to base B2=128 have the same exponent E(b1;base B2) for b1=00, 01, 02, 03, 04, 05 (E=0); have the same exponent E(b1;base B2) for b1=07, 08, 09, 10, 11 (E=1); have the same exponent E(b1;base B2) for b1=14, 15, 16, 17 (E=2); have the same exponent E(b1;base B2) for b1=21, 22, 23 (E=3); have the same exponent E(b1;base B2) for b1=28, 29 (E=4); and have the same exponent E(b1;base B2) for b1=35 (E=5).
The two columns for base B3=128 and base B3=256 have the same exponent E(b1;base B3) for b1=00, 01, 02, 03, 04, 05, 06 (E=0); have the same exponent E(b1;base B3) for b1=08, 09, 10, 11, 12, 13 (E=1); have the same exponent E(b1;base B3) for b1=16, 17, 18, 19, 20 (E=2); have the same exponent E(b1;base B3) for b1=24, 25, 26, 27 (E=3); have the same exponent E(b1;base B3) for b1=32, 33, 34 (E=4); have the same exponent E(b1;base B3) for b1=40, 41 (E=5); and have the same exponent E(b1;base B3) for b1=48 (E=6).
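The groupings described in the three preceding paragraphs are consistent with the exponent taking the form E(b1;base 2^p)=int(b1/p). That closed form is an assumption inferred from the listed values, not a definition stated in the text. Under that assumption, a minimal sketch that regenerates the groupings for b1=00-50 is given below; the function names are hypothetical.

```python
from typing import Dict, List


def base_exponent(b1: int, p: int) -> int:
    """Assumed form of E(b1; base 2^p): floor(b1 / p).

    Inferred from the groupings listed in the text, not stated there.
    """
    return b1 // p


def shared_exponent_groups(p: int, b1_max: int = 50) -> Dict[int, List[int]]:
    """Group the b1 values for which bases 2^p and 2^(p+1) share E."""
    groups: Dict[int, List[int]] = {}
    for b1 in range(b1_max + 1):
        e = base_exponent(b1, p)
        if e == base_exponent(b1, p + 1):
            groups.setdefault(e, []).append(b1)
    return groups


if __name__ == "__main__":
    for p in (5, 6, 7):
        print(f"base {2 ** p} vs base {2 ** (p + 1)}:")
        for e, values in shared_exponent_groups(p).items():
            print(f"  E={e}: b1={values}")
```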
Conventionally, the base exponent numbers p=5, 6, 7 and 8 are expressed in binary format as the ordered sequences (1,0,1), (1,1,0), (1,1,1) and (1,0,0,0), respectively. However, by adopting a different numbering convention, such as {p}={5, 6, 7, 8}<−>{(1,0,0), (1,0,1), (1,1,0), (1,1,1)}, or {p}={5, 6, 7, 8}<−>{(0,0), (0,1), (1,0), (1,1)}, all base exponents can be specified using the same number of bits (3 or 2, respectively); this modified convention is adopted here, to simplify the comparisons of numbers of bits required. Where a standard convention is used, requiring three and four bits to express base exponents p, comparison of bases 128 and 256 will favor the smaller base for all values of b1.
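The two-bit numbering convention {5, 6, 7, 8}<−>{(0,0), (0,1), (1,0), (1,1)} amounts to encoding the offset p−5 in a fixed-width field. A small sketch follows; the function names are hypothetical and the restriction to p=5, . . . , 8 simply mirrors the range discussed in the text.

```python
def encode_base_exponent(p: int) -> str:
    """Encode base exponent p (5..8) as a fixed two-bit string for p - 5."""
    if not 5 <= p <= 8:
        raise ValueError("this sketch only covers p = 5, 6, 7, 8")
    return format(p - 5, "02b")


def decode_base_exponent(bits: str) -> int:
    """Invert encode_base_exponent: two bits back to p in 5..8."""
    if len(bits) != 2 or any(b not in "01" for b in bits):
        raise ValueError("expected exactly two binary digits")
    return int(bits, 2) + 5


if __name__ == "__main__":
    for p in (5, 6, 7, 8):
        print(p, encode_base_exponent(p))
```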
Consider, as an example, the grouping corresponding to base B1=2^p=32 (p=5) and base B1=2^p=64 (p=6) for b1=06, 07, 08, 09, with exponent E=1. With reference to Eqs. (14A) and (14B), the fractional functions f1(S) and f2(S) require N+1 bit coefficients and N+2 bit coefficients, respectively, to express (not including the signum bit and the delimiter bits, which are the same for each choice of base). Using the modified base exponent numbering convention of the preceding paragraph, specification of each base B1 requires the same number of bit coefficients (e.g., 2 or 3 in the preceding example). With a two-bit base specification, the total bit count for a number with E=1 and b1=06, 07, 08, 09 is thus N+3 and N+4 for the respective bases B1=32 and B1=64. The base B1=32 for b1=06, 07, 08, 09 and exponent E=1 can be expressed with (at least) one fewer bit coefficient than the base B1=64. In transmission of a numeral string S, expressible as in Eqs. (14A) and (14B), with b1=06, 07, 08 or 09, the choice of base B1=32 (Eq. (14A)) is preferred over the choice of base B1=64 (Eq. (14B)), because (at least) one fewer bit coefficient is required for the choice of Eq. (14A). This choice of the smaller of the two bases is preferred for each of the following base pairs:
b1=00, 01, 02, 03, 04 base 32 and base 64
b1=06, 07, 08, 09 base 32 and base 64
b1=12, 13, 14 base 32 and base 64
b1=18, 19 base 32 and base 64
b1=24 base 32 and base 64
b1=00, 01, 02, 03, 04, 05 base 64 and base 128
b1=07, 08, 09, 10, 11 base 64 and base 128
b1=14, 15, 16, 17 base 64 and base 128
b1=21, 22, 23 base 64 and base 128
b1=28, 29 base 64 and base 128
b1=35 base 64 and base 128
b1=00, 01, 02, 03, 04, 05, 06 base 128 and base 256
b1=08, 09, 10, 11, 12, 13 base 128 and base 256
b1=16, 17, 18, 19, 20 base 128 and base 256
b1=24, 25, 26, 27 base 128 and base 256
b1=32, 33, 34 base 128 and base 256
b1=40, 41 base 128 and base 256
b1=48 base 128 and base 256
With all other magnitude exponents in a range b1=00-50, the larger of the bases (32 vs. 64 vs. 128 vs. 256) is often preferred, but either base can be used.
The optimal choice of (smaller) base for each of the situations set forth in the preceding list can be expressed in a single algorithm. Where the bases B=2^p and B=2^(p+1) (p=5, 6, 7) are considered for transmission of a numeral string S, the smaller base, B=2^p, should be used for a magnitude M(S) of the string S expressible as 2^b1·f(S), where 1≦f(S)<2 and b1 is an integer satisfying
b1=m·p+r, with r=m, m+1, . . . , p−1 (m=0, 1, . . . , p−1). (16)
Thus, the optimal base chosen (32, 64, 128, 256), which allows expression of the numeral string S in the smallest number of bits, will vary with the magnitude of the numeral and with the exponent b1 required to express the numeral. In many instances, the optimal base will allow expression of the numeral with one or two fewer bits than will any other choice of base, expressed as a power of 2.
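As an illustrative check, Eq. (16), read as b1=m·p+r with r=m, . . . , p−1 and m=0, . . . , p−1, can be enumerated directly and compared with the list of magnitude exponents given for each base pair. The sketch below does this; the function name is hypothetical.

```python
from typing import List


def smaller_base_exponents(p: int) -> List[int]:
    """Enumerate b1 = m*p + r for m = 0..p-1 and r = m..p-1 (Eq. (16)).

    These are the magnitude exponents for which base 2^p is preferred
    over base 2^(p+1) according to the text.
    """
    return [m * p + r for m in range(p) for r in range(m, p)]


if __name__ == "__main__":
    for p in (5, 6, 7):
        print(f"p={p} (base {2 ** p} vs base {2 ** (p + 1)}):", smaller_base_exponents(p))
```

For p=5 the enumeration yields 00-04, 06-09, 12-14, 18, 19 and 24, matching the entries listed for base 32 and base 64.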
M1(S)=2^b1·f1(S)=2^b1·{1.a1 a2 . . . aN},
and
M2(S)=2^(b1+1)·f2(S)=2^(b1+1)·{0.1 a′1 a′2 . . . a′N·1},
where 1≦f1(S)<2 and 0.5≦f2(S)<1, and an and a′n are binary numerals (0 or 1) satisfying an=a′n+1 (n=1, . . . , N), and satisfying an=a′n+1=0 (n>N), and where a magnitude exponent b1 is chosen from among a set of integers including b1=pm+r, with a base exponent p≧5 and m=0, 1, 2, 3, 4, . . . , p−1 and 0≦r≦p−1. In step 34, where the magnitude of S can be expressed in the format M1(S), with base exponent p, and in the format M2(S), with base exponent p+1, for each of m=0, . . . , p−1 and r=m, m+1, . . . , p−1, the magnitude of S is transmitted in the format M1(S). This is a first subset of possible combinations of r and m. In step 35, where the magnitude of S can be expressed in the format M1(S), with base exponent p, and in the format M2(S), with the base exponent p+1, and m and r do not simultaneously satisfy m=0, . . . , p−1 and r=m, m+1, . . . , p−1, the magnitude of S is transmitted in the format M1(S) or M2(S). This is a subset including the remaining combinations of r and m.
Where a numeral string S1 is received in decimal format (using numerals 0, 1, . . . , 9), rather than binary format (using numerals 0 and 1 ), as an ordered sequence of D* (decimal) digits, the computer may be further programmed:
to convert the numeral string S1 to a binary format numeral S2, where S2 is expressed as an ordered sequence of D bits, and where D=−int{log2(1/DB)}; and
to provide S2 as the numeral string S*, expressed in binary format.
This embodiment is illustrated for base exponents p=5, 6, 7 and 8 but can be expanded to lower values or higher values of the base exponent p (2≦p≦4 and/or p≧9) by analogy with the present analysis.
The 14-character string
S0=Ab4,97.21−kge,
is decomposed into the following binary components in 7-bit ASCII format:
AC(m=1)=Ab=1000001/1100010,
AC(m=2)=−kge,=0101101/1101011/1100111/1100101/0101100,
NS(m=1)=4,97.21=0110100/0101100/0111001/0110111/0101110/0110010/0110001,
NSS(m=1)=49721=0110100/0111001/0110111/0110010/0110001.
where a slash (/) indicates a break between successive string characters, expressed in ASCII format. Other relevant parameters for S0 are:
M=1,
LN(m=1)=5,
LN1(m=1)=LN2(m=1)=3,
DS={, .},
D=2,
D1=1,
DP(m=1)=2,
DP1(m=1)=1,
NSDP(m′=1;m=1)=2 (decimal)=010 (binary), d(m′=1;m=1)=0,
NSDP(m′=2;m=1)=4 (decimal)=100 (binary), d(m′=2;m=1)=1,
ND0=11111111 (not used here),
Arr1(m=1)=0110100/0111001/0110111/0110010/0110001/(null)/{010 0}/(null)/{100 1},
S0(total)′=1000001/1100010/(null)/0110100/0111001/0110111/0110010/0110001/(null)/{010 0}/(null)/{100 1}/(null)/0101101/1101011/1100111/1100101/0101100.
Note that AC(m=2) contains two delimiters, one of which also appears in NS(m=1). The total number of bits for the expression of S0(total)′ is (7)(12)+(2)(3+1)+(4)(null bit length)=92+(4)(null bit length).
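The 7-bit ASCII expansions used in this appendix can be reproduced with a short sketch such as the one below. It assumes that the “−” character in S0 is the ASCII hyphen-minus (decimal 45), as stated in the discussion of Eq. (1); the function name `ascii7` is hypothetical.

```python
def ascii7(text: str) -> str:
    """Return the 7-bit ASCII codes of text, separated by slashes."""
    codes = []
    for ch in text:
        code = ord(ch)
        if code > 127:
            raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
        codes.append(format(code, "07b"))
    return "/".join(codes)


if __name__ == "__main__":
    # ASCII hyphen-minus is used here for the "−" delimiter (assumption).
    for label, part in (("AC(m=1)", "Ab"), ("NS(m=1)", "4,97.21"),
                        ("NSS(m=1)", "49721"), ("AC(m=2)", "-kge,")):
        print(f"{label}={part}={ascii7(part)}")
```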
Number | Date | Country | |
---|---|---|---|
60820081 | Jul 2006 | US |