The present invention relates to the field of data representation and has applications in data compression, transmission, and storage. More specifically, the invention provides a new method for numerically encoding data that is directly applicable to the encoding of numerical data and indirectly applicable to the encoding of any and all other data, provided that the data symbols or characters are translated to an equivalent set of numerical codes.
The Matched Representation Space (MRS) method of data representation and thinning is a method that came out of efforts to overcome limitations in the Alternative Bases Processing (ABP) method of data compression and signal processing. The ABP method centers on the use of a non-orthogonal transformation intended to provide dimensionality reduction while maintaining full signal energy through the compression process. Detailed analysis of the ABP process identified issues arising from the introduction of unintended distortions and errors by the non-orthogonal transform. The discovery and invention of MRS is a direct result of considering the basic properties of image data while in search of ways to mitigate the distortions and errors within ABP. Specifically, we began to consider questions on the accuracy and precision required to reproduce a data set to a specified finite error, or a specified finite relative error. Simply put, distortions and errors are not necessarily a problem as long as they are smaller than the precision (absolute or relative) that the data are rounded to. We thus began considering the fundamental properties of signals, signal data, their representation as numerical and binary data, and their properties as they are manipulated by image, signal, and data processing steps. The key concepts that came together to produce MRS are those of significant and non-significant digits, and the binary series representation of integer data.
The prior art in the representation of numerical data is best understood through the example of binary data. Consider the N-bit binary representation of an integer I. The N-bits bn are the coefficients of the binary power series representation of that integer I:
I = Σ_{n=0}^{N−1} b_n 2^n (1)
The binary power series is an exact representation of the integer I, as indicated by the equal sign in equation (1), and this is a significant piece of implicit a priori information underlying conventional digital data representation, storage, transmission, and processing. The fact that the lower limit of the series is zero and that the base for the power series is two constitutes implicit a priori knowledge in any use of a binary integer representation. There is no implicit upper limit on the power series, so the number of bits N, corresponding to the number of terms in the series, is required as explicit a priori information. The coefficients can be gathered as bits into a binary “word,” a sequence of N total ones and zeros, typically in an order following the powers of the expansion. The bit sequence is typically ordered from high to low in accordance with the norms of positional notation. Thus the left-most bit in the word corresponds to the coefficient of the highest-order term in the expansion, and the right-most bit corresponds to the coefficient of the lowest-order term. These two bits are commonly referred to as the “most significant bit” (msb) and the “least significant bit” (lsb) respectively, based on their positional significance in the representation scheme. However, this common usage violates the definition of significant figures, which the Merriam-Webster dictionary gives as “figures of a number that begin with the first figure to the left that is not zero and that end with the last figure to the right that is not zero or is a zero that is considered to be correct.”
The common usage of the “most significant bit” and “least significant bit” terminology embodies a significant technical error that has placed a conceptual barrier on the prior art. The prior art has required the use of a fixed length binary data field, with the explicit transmission of a binary character for every element in the data field, without regard to significance or to whether the element is actually a non-significant place-holder.
Referring now to the drawing, the 8-bit binary representation of the decimal integer 21 provides a simple example:

21 = 16 + 4 + 1 = 2^4 + 2^2 + 2^0 (2)

The binary series expansion is shown as item 2 in the drawing.
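The expansion of 21 can be checked mechanically. The following Python sketch (illustrative only, not part of the claimed method) recovers the nonzero powers of an N=8 bit word:

```python
# Illustrative check of the binary power series (Eq. 1) for the integer 21.
I, N = 21, 8
bits = [(I >> n) & 1 for n in range(N)]     # coefficients b_0 .. b_7
powers = [n for n in range(N) if bits[n]]   # terms with nonzero coefficients
assert powers == [0, 2, 4]                  # 2^0 + 2^2 + 2^4 = 1 + 4 + 16 = 21
assert sum(2**n for n in powers) == I
```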
Addressing the conceptual error associated with the common usage of the “most significant bit” and “least significant bit” terminology has been the key to unlocking a new and improved data representation. The new method as disclosed here overcomes these disadvantages and limitations by properly recognizing significance, thus leading to an improved data structure that separates the numerical data into a scale header and an additional precision packet. The scale header and additional precision packet form separate and separable data structures in the MRS method that are represented differently, can be stored and transmitted independently, and processed independently or jointly.
Accordingly, the present invention provides a new method for numerically encoding data. The Matched Representation Space (MRS) method is based upon the concept of significant figures and separates the representation into a scale header and an additional precision packet. The new method provides options for both exact and approximate data representations that afford advantages in data compression, transmission, and storage.
According to one embodiment, the present invention includes a method for representing and numerically encoding data that includes providing a representation of data and separating the representation into a scale header and an additional precision packet. Separating the representation includes identifying the location of the highest-order non-zero bit and encoding the location of the highest-order non-zero bit to form the scale header. The balance of the bits following the highest-order non-zero bit, or a truncated set of these bits, is encoded to form the precision packet.
The scale header and additional precision packet form separate and separable data structures in the MRS method that are represented differently, can be stored and transmitted independently, and processed independently or jointly.
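The separation into a scale header and an additional precision packet can be sketched in code. In the following Python illustration the function name and bit conventions are assumptions made for clarity, not the normative encoding; it reproduces the worked N=8 example in which 21 yields scale header 0100 and additional precision packet 0101:

```python
import math

def mrs_encode(value, N=8):
    """Split an N-bit integer into an MRS scale header and additional
    precision packet. Illustrative sketch only; the name and bit
    conventions are assumptions, not the patent's normative encoding."""
    header_bits = math.ceil(math.log2(N + 1))   # N+1 cases to encode
    word = format(value, f'0{N}b')
    pos = word.find('1')                        # highest-order non-zero bit
    if pos == -1:
        return '0' * header_bits, ''            # all-zero word
    exponent = N - 1 - pos                      # power of two of the leading bit
    header = format(exponent, f'0{header_bits}b')
    return header, word[pos + 1:]               # packet: bits after the leading one

# 21 = 00010101 as an 8-bit word: leading one at 2^4, remaining bits 0101
assert mrs_encode(21) == ('0100', '0101')
assert mrs_encode(0) == ('0000', '')
```

Note that the value of the leading one is never stored: its position in the header implies it, which is the source of the one-symbol saving discussed later.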
Other objects, advantages and salient features of the invention will become apparent from the following detailed description, which, taken in conjunction with the annexed drawing, discloses a preferred embodiment of the present invention.
A more complete appreciation of the invention and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawing.
The disadvantages of the prior art can be overcome by using a new method for forming the data representation that separates the numerical data into a scale header and an additional precision packet. The new method provides options for both exact and approximate data representations that afford advantages in data compression, transmission, and storage as will be made clear in the description that follows and as illustrated in the Figures.
The new method centers on the concept of significant figures, and this is directly related to the concepts of accuracy and precision as defined by the Merriam-Webster dictionary.
We take accuracy as defined by correctness and precision as defined by exactness, for consistency with the definition of significant figures. The significant figures correctly represent a datum at the corresponding level of precision, i.e., every figure that is significant is deemed to be accurate. The MRS representation conceptually divides the data representation into accuracy (scale) and precision components, and assures that the representation is correct to within a definable level of exactness. A scale header represents the correct value of the “most significant bit,” where the “significant bits” are defined as the bits of a binary number that begin with the first bit to the left that is not zero and that end with the last bit to the right that is considered to be correct.
In the MRS representation scheme, we recognize that any leading zeros in the binary word are placeholders and are not significant figures in a scientific or technical usage of the data. We also recognize that smaller figures to the right are non-significant when they cannot be considered accurate. The partition into a scale header and an additional precision packet allows for a simple means to accurately represent the data to within a stated level of precision, and to add successively higher precision if required. This partition reflects a fundamental difference in the meaning and interpretation of binary and other numerical data represented using the MRS method. Whereas a conventional binary word communicates a mathematical equality as defined by Eq. (1), corresponding MRS data structures communicate a bounded region of uncertainty where the uncertainty has a maximum possible value and can be reduced to zero by adding precision. Mathematically, for any integer with true value β represented using the MRS method and binary symbols, the simplest MRS representation μ is established solely by the scale header and bounds the true value β from below as
μ ≦ β < 2μ (3)
The uncertainty range is reduced by one half for every additional bit of precision encoded in the additional precision packet, up until the point that the MRS representation forms an exact representation of the true value with zero uncertainty. Thus, whereas a conventional binary word only communicates an exact value, an MRS data structure communicates both an exact lower bound to a possible value (via the scale header) and a specified uncertainty range (via the size of the additional precision packet). The representation of exact values using MRS is limited only by the number of symbols encoded in the additional precision packet.
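The bound μ ≦ β < 2μ and the halving of uncertainty with each precision bit can be checked numerically. An illustrative Python sketch, using the worked value 21:

```python
beta = 21                                  # true value of the datum
mu = 1 << (beta.bit_length() - 1)          # scale-header-only value: 2^4 = 16
assert mu <= beta < 2 * mu                 # lower-bound property of the header
value, step = mu, mu
for bit in '0101':                         # additional precision packet for 21
    step //= 2                             # each bit halves the uncertainty
    if bit == '1':
        value += step
assert value == beta                       # the full packet recovers 21 exactly
```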
Referring again to the drawing, for binary integers the accuracy of the MRS representation is guaranteed for even the smallest possible representation by the encoding of the most significant bit. The smallest possible MRS representation will always be within a factor of two of the true value, which it bounds from below, and this is significantly better than the simplest conventional decimal approximation that would capture the correct decimal order of magnitude. The MRS additional precision packet improves the exactness of the representation, with the maximum potential representation error decreasing by one half for each additional precision bit. Note that the rate at which precision improves is constant over all of the significant bits in the MRS binary representation.
Preferred and Alternate Embodiments: The MRS method can be implemented in a number of ways depending upon the specific requirements of the application. Four options are available with respect to using fixed or variable length data structures, and whether the data are represented at full precision or at some designated lesser precision:
1) Full precision MRS representation with fixed length data structures. This option balances certain advantages in the MRS representation scheme against a fractional increase in the size of the representation when maintaining absolute fidelity to the binary truth data. This option increases the total number of symbols used to represent the data as compared to simply using the original binary truth data, as shown in Table I. Inflation occurs in this scheme since the data structure length is sized to accommodate the largest possible scale header and the largest possible precision packet at the same time, whereas these occur as opposites in the truth data (i.e., a binary word with no leading zeros has N precision bits; a binary word with all leading zeros requires no precision bits; for any case in between the sum of the leading zeros and precision bits totals to the N-bit length of the conventional binary word). Note, however, that this inflation occurs prior to any entropy encoding, and that the entropy encoding can be expected to recover some or all of that inflation.
In the example of an N=8 bit binary integer given here, the scale header 0100 and additional bits of precision represented by the symbol sequence 0101 would be padded with three additional trailing zeros to fill out the field of seven total additional precision bits yielding a full precision MRS fixed length data structure composed of the scale header 0100 and the additional precision packet 0101000 with a total size of eleven symbols.
2) Full precision MRS representation with variable length data structures. The number of scale header bits is explicit a priori information that also establishes the number of additional bits of precision following the scale header, and this makes the use of variable length data structures simple to implement. The use of variable length data structures can be expected to change the size of the representation, and also impacts the approaches available for entropy encoding. The MRS data may use more, fewer, or the same number of symbols to represent the data depending on the particular integer(s) being represented. The longest full precision variable length MRS binary representations are equal in size to the corresponding full precision fixed length MRS binary words. Thus, when considering any data set composed of multiple data elements, the increase in MRS binary representation size as shown in Table I also establishes the upper limit on the increase in size of the data representation when using full precision variable length MRS binary representations. The smallest full precision variable length MRS binary representations are those that encode all zeros. These are analyzed in Table II for the same cases as considered in Table I. Taken together, Tables I and II bound the possible range of expansion or compression that can be achieved for an arbitrary set of conventional binary words when converted to full precision MRS representation with variable length data structures. Depending on the size (in number of bits N) and distribution of the conventional binary words, substantial lossless compression is at least theoretically possible.
In the example of an N=8 bit binary integer given here, the scale header 0100 and additional bits of precision represented by the symbol sequence 0101 would be represented as-is without padding any additional trailing zeros, thus yielding a full precision MRS variable length data structure of scale header 0100 and additional precision packet 0101 with an overall data structure length of eight symbols.
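The fixed and variable length full precision structures described above can be compared directly. A minimal sketch, assuming the worked N=8 example:

```python
N = 8
header, precision = '0100', '0101'            # MRS parts of 21 as an 8-bit word
fixed = header + precision.ljust(N - 1, '0')  # pad the packet to N-1 = 7 bits
variable = header + precision                 # no padding
assert fixed == '01000101000' and len(fixed) == 11
assert variable == '01000101' and len(variable) == 8
```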
3) Adjusted precision MRS with fixed length data structures. This option involves an explicit size for the fixed data structure resulting in an implicit size for the precision packet, or an explicit size of the precision packet resulting in an implicit size of the data structure. In either case, the size of the precision packet defines the limiting relative precision of the data representation.
The smallest possible fixed length representation is simply the scale header. The scale header bounds the true data value from below, and is always within a factor of two of the full precision true data value. The precision packet is padded with trailing zeros as needed when the size of the precision packet exceeds the size of the true precision data available. Each additional precision bit reduces by one half the maximum possible representation errors, up to the point that the data are represented at full precision. As with the scale header, the true data are bounded from below in the MRS representation. We observe that in appending additional precision bits, the MRS representation refines each data element at the same rate of increasing relative precision, up to the point that full precision is reached for that data element or the maximum data structure length is reached.
We designate the order of the precision P as the number of symbols in the original data sequence that are coded exactly in the MRS(P) representation. Clearly, the upper limit is P=N, meaning that every bit in the N-bit conventional word is represented exactly using MRS(P=N). Since the header encodes the location of the highest-order non-zero bit, this is designated as MRS(1). Appending one precision bit yields MRS(2), appending two precision bits yields MRS(3), etc. In the example of an N=8 bit binary integer given here, the scale header 0100 constitutes the complete MRS(1) data structure of a fixed four symbol length. The MRS(2) representation would expand the data structure to include an additional precision packet composed of the next precision bit for a data structure of scale header 0100 and additional precision packet 0 for a fixed five symbol length. The MRS(3) representation would append another precision bit to the additional precision packet for a data structure of scale header 0100 and additional precision packet 01, etc. Finally, the MRS(8) representation that would guarantee full precision in representing the data would pad trailing zeros, yielding the data structure of scale header 0100 and additional precision packet 0101000 for a fixed eleven symbol length. The characteristics of several cases of interest are shown in Tables III-VI.
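The MRS(P) truncation rule can be sketched as follows (illustrative Python; the function name is an assumption), reproducing the MRS(1) through MRS(8) structures of the worked example:

```python
def mrs_p(header, precision, P):
    """Truncate a full precision packet to order P: the header alone is
    MRS(1); each appended packet bit raises the order by one. Sketch only."""
    return header, precision[:P - 1]

h, p = '0100', '0101000'                      # full precision structure for 21, N=8
assert mrs_p(h, p, 1) == ('0100', '')         # MRS(1): scale header only
assert mrs_p(h, p, 2) == ('0100', '0')        # MRS(2)
assert mrs_p(h, p, 3) == ('0100', '01')       # MRS(3)
assert mrs_p(h, p, 8) == ('0100', '0101000')  # MRS(8): full precision
```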
4) Adjusted precision MRS with variable length data structures. The number of scale header bits is explicit a priori information that also establishes the number of available additional bits of precision following the scale header. Depending upon the nature of the data being considered, there may be reason to truncate the precision packet according to some rule set established as explicit a priori information regarding the representation scheme. This option has properties similar to those of using variable length data structures with full precision as regards the dependence of the representation size on the size(s) of the data to be represented: the MRS data may use more, fewer, or the same number of symbols to represent the data. Under any rule set, the longest possible adjusted precision variable length MRS binary representations are equal in size to the corresponding full precision variable length MRS binary words, thus Table I also establishes the upper limit on the increase in size of the data representation when using adjusted precision variable length MRS binary data structures. The smallest adjusted precision variable length MRS binary representations are those that encode only the scale header as MRS(1). The limiting cases of adjusted precision with variable length data structures for common conventional binary word sizes are summarized in Table VII. Taken together, Tables I and VII bound the possible range of expansion or compression that can be achieved for an arbitrary set of conventional binary words when converted to adjusted precision MRS representation with variable length data structures. Depending on the size (in number of bits N) and distribution of the conventional binary words, substantial lossless compression is at least theoretically possible.
In the example of an N=8 bit binary integer given here, the scale header 0100 constitutes the smallest possible adjusted precision variable length data structure, composed solely of the scale header 0100. Similarly, the scale header 0100 and corresponding full additional precision packet 0101 forms the largest possible adjusted precision variable length data structure.
Additional Implementation Considerations: All four implementation options entail several additional considerations.
1) Scale Header—The scale header needs to encode the position of the highest-order non-zero bit within the N-bit binary sequence, within a scheme that can accommodate values as large as N and also accommodate the case of no non-zero bits. Thus there are N+1 cases that need to be encoded. The minimum length for the MRS representation is then the number of bits required to fully encode the header as
Minimum MRS word length = ⌈log2(N+1)⌉ (4)

where ⌈ ⌉ denotes the ceiling function, i.e., the smallest integer greater than or equal to the argument of the function.
The fact that there are actually N+1 possible cases for the location of the highest-order non-zero bit in the conventional binary word leads to inefficient coding for the most common cases where N itself is an integer power of the base two (N=4, 8, 16, 32, 64, etc.). As an engineering expedient in reducing the size of the representation, the N+1 cases can be reduced to N cases through several schemes, such as by encoding the number of leading zeros, and treating the two possible cases of N−1 leading zeros as a single special case. Alternatively, the N+1 cases can be reduced to N cases by unambiguously interpreting the cases that specify the location of the highest-order non-zero bit up to the N−1 position, and ambiguously interpreting a ‘1’ in the N position as a special case. Both of these schemes involve ambiguity in the least significant bit that can be resolved explicitly by an additional precision packet composed of a single precision bit. As an alternative, the value of the least significant bit can be interpolated on a statistical basis.
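The header sizing rule above can be checked numerically. A minimal Python sketch (illustrative only):

```python
import math

def header_length(N):
    """Bits needed to encode the N+1 scale header cases: the N possible
    positions of the leading one, plus the all-zeros word."""
    return math.ceil(math.log2(N + 1))

assert [header_length(N) for N in (4, 8, 16, 32, 64)] == [3, 4, 5, 6, 7]
# Reducing the N+1 cases to N (e.g., by merging the two shortest words into
# one special case) would allow exactly log2(N) bits when N is a power of two.
```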
2) Interpolation—The MRS representation scheme provides straightforward means to track which of the MRS symbols are an exact representation of the conventional binary data, and which are otherwise. Significantly, the MRS representation bounds the true data from below to within an interval defined by the step size of the lowest-order exactly represented bit. This is a very desirable property in that it allows interpolation of the next one or two smaller bits, if desired at point of use, as a statistical approach to reducing the representation error.
The MRS method distinguishes those symbols that represent ‘true’ values from those that do not. The smallest ‘true’ value is the least significant reported bit, which is not necessarily the least significant bit overall as some thinned precision (less than full precision) may be in use. Depending upon the nature of the data, adjusted precision combined with interpolation may provide a means for ‘statistically lossless’ compression, where the data recovered after compression have the same mean and variance as the full precision true data, although individual data elements may differ from their corresponding full precision true values. The ‘statistically lossless’ option is desirable as a means for achieving greater compression where the data are intended for processes whose outputs depend on the statistics of the data set, as compared to processes whose outputs depend on specific element by element values. The MRS method allows for three options in handling the representation.
a) Assign zero to all symbols beyond the least significant reported bit. This option maintains the MRS representation as a lower bound to the full precision value. For uniformly distributed data, the average representation error is approximately the size of the least significant reported bit.
b) Assign one to all symbols beyond the least significant reported bit. This option maintains the MRS representation as an upper bound to the full precision value. For uniformly distributed data, the average representation error in this case is also approximately the size of the least significant reported bit.
c) Interpolate the next smaller bit following the least significant reported bit. The interpolation can be done blindly or based on an analysis of the statistics of the next smaller bit.
i.) In the blind case, the statistics of the data are unknown. This option assigns a symbol one to the next smaller bit and a symbol zero to any further smaller bits. The advantage in doing so is that, for uniformly distributed data, the average representation error is reduced to approximately one half the size of the least significant reported bit. The disadvantage in doing so is that the interpolated value represents a statistical mean value and not a crisp lower or upper bound to the full precision value.
ii.) If the statistics of the data are known, then the representation errors can be minimized by interpolating using a longer sequence of symbols corresponding to the mean value of the smaller symbols.
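Options a) through c) can be expressed as a single reconstruction routine. The following Python sketch is illustrative; the function and argument names are assumptions:

```python
def reconstruct(header_exp, packet, fill='zeros'):
    """Rebuild an integer from a thinned MRS structure, filling the
    unreported lower-order bits per option a) zeros, b) ones, or
    c) blind interpolation. Illustrative sketch; names are assumptions."""
    missing = header_exp - len(packet)          # unreported lower-order bits
    if fill == 'zeros':
        tail = '0' * missing                    # a) lower bound
    elif fill == 'ones':
        tail = '1' * missing                    # b) upper bound
    else:
        tail = '1' + '0' * (missing - 1) if missing else ''  # c) blind interpolation
    return int('1' + packet + tail, 2)

# 21 thinned to MRS(2): header exponent 4, one reported precision bit '0'
assert reconstruct(4, '0', fill='zeros') == 16   # a) lower bound
assert reconstruct(4, '0', fill='ones') == 23    # b) upper bound
assert reconstruct(4, '0', fill='interp') == 20  # c) statistical midpoint
```

Note that the blind interpolation of option c) lands midway between the two bounds, which is the source of the halved average error for uniformly distributed data.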
3) Entropy Encoding—The sequence of symbols used in the MRS representation can be packed more compactly using any entropy encoding scheme. For arrays of data, the MRS representations can be encoded directly by serial encoding of the symbol stream, or indirectly using a bit-plane approach. The MRS representation is fully compatible with standard entropy encoding techniques such as arithmetic coding, run length coding, and L-Z or L-Z-W coding. The details of the entropy encoding will vary depending upon the nature of the original data and the type of MRS representation being used. Some example cases are as follows:
a) Using fixed length MRS methods to represent an array of numerical data. The entropy encoding can be applied serially on an element by element basis. The scale header data and additional precision packet data can be encoded as two distinct structures, or on an interleaved basis as a larger meta-structure. Similarly, the scale header data and additional precision packet data can be encoded as two distinct structures, packed in parallel by decomposing the data arrays into a series of bit planes, or on an interleaved bit-plane basis as a larger meta-structure. A two dimensional array of numbers can be thought of as defining a three dimensional surface with the magnitudes of the numbers indicating the height of the surface above the coordinate plane of the array. In this sense, the MRS scale header bit planes encode the most significant components of that surface. Each subsequent additional precision bit plane then encodes an increased precision in the accuracy and exactness of the representation of the surface. The MRS data structure left aligns all of the additional precision bits and right pads any non-significant bits as zeros. In the bit plane decomposition, that results in each plane adding precision at the same rate for all data elements, until the limiting precision of any particular data element is reached. That also results in the sparseness of the higher-order conventional bit planes being transferred to the lower additional precision packet bit planes in the fixed length MRS representation. Under those conditions, entropy coding, data storage, and data transmission are likely to be more efficient when performed on a bit plane basis.
b) Using variable length MRS methods to represent an array of numerical data. Variable length MRS employs a fixed length scale header and a corresponding variable length additional precision packet. The variable length makes bit plane representation of the additional precision packets difficult. The scale header can be encoded as bit planes, and then the additional precision packets can be encoded serially on an element by element basis.
Known techniques for data compression include a variety of transforms on and component decompositions of arrays of data represented as matrices. There is no apparent fundamental incompatibility between MRS and the use of transforms and other techniques as applied to data compression. However, there may be practical issues that will lead to a preferred order of operations when joining MRS with other techniques.
Several properties of the Matched Representation Space method of data representation bear additional consideration.
1) Generality of the Matched Representation Space (MRS) Method—The MRS method defines a general approach to the representation of numerical data for data with positional significance. Decimal numbers are the most common example of numerical data with positional significance, and use a positional notation where digits are understood to be in the ones column, the tens column, the hundreds column, etc. The significance of a decimal number is understood to be its order of magnitude, and that is determined directly by the position of the highest-order non-zero digit. The MRS method takes advantage of positional significance to provide a compact representation of the significance of a numerical datum. The case of a binary series representation provides a particularly useful characteristic in that the highest-order term in the binary series, taken by itself, is greater than the sum of all other terms in the series. Thus knowledge of the highest-order term in the binary series representation guarantees that the MRS representation of the datum is bounded from below to within a factor of two. The binary series has the additional advantage that there is no ambiguity in the value of any non-zero symbol.
In order to apply the MRS method to other than binary numbers, it is necessary to first know the type of numbers and set of symbols being used. For example, with decimal integers this involves knowing the set of decimal integer symbols {0, 1, 2, . . . , 9}, along with the number of columns being considered and their position relative to the decimal point separating the whole part from the fractional part of the number. The MRS decimal representation would then code the location (column number) of the highest-order non-zero digit as a scale header. The precision packet would begin with the specific decimal value of the highest-order non-zero-digit, and then follow in order with the balance of the digits. The MRS representation would be guaranteed to bound the datum from below to within an order of magnitude.
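A decimal analogue of the binary scheme can be sketched as follows (illustrative Python; the function name, column convention, and all-zero handling are assumptions, not the patent's normative encoding):

```python
def mrs_decimal(value, columns=5):
    """Sketch of MRS applied to decimal integers: the header is the column
    (power of ten) of the highest-order nonzero digit; the packet begins
    with that digit and continues in order with the remaining digits."""
    s = str(value).rjust(columns, '0')
    pos = next((i for i, c in enumerate(s) if c != '0'), None)
    if pos is None:
        return None, ''                     # all-zero datum: special case
    return columns - 1 - pos, s[pos:]

assert mrs_decimal(4075) == (3, '4075')     # leading digit in the 10^3 column
assert 10**3 <= 4075 < 10**4                # bounded from below within a decade
```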
The generalization to other numerical data types hinges on the positional significance of the underlying data representation. Both decimal and binary integers are examples of power series representations that are natural candidates for positional notation. Other series with known structure can be represented using the MRS method, but the utility of the method is clearly limited in cases where the positional significance is low or lacking.
The MRS method can also be generalized to the case of non-numerical data. In that case, the non-numerical symbols are first converted to an equivalent numerical code such as those for the ASCII and Unicode character sets, and the numerical codes are then represented using the MRS method.
2) Data Thinning—The decimal equivalent of a given length binary word and vice versa is readily calculable using equation (1). These are shown in Table VIII for various cases of interest. The equivalence is important to understand in sensibly representing data and the results of calculations relative to the number of significant figures. The most precisely measurable quantity is time/frequency with a relative precision of about 15 significant figures achieved by the best national standards laboratories. Other quantities are measured with far less precision, and many practical measurements are made at the level of two to three significant figures. In consequence, conventional fixed length binary representations are substantially inefficient in representing data that have a limited number of significant figures. The MRS representation can be significantly smaller than the conventional binary representation, particularly for large binary word lengths and small numbers of significant figures.
As an example, consider encoding the data stream generated by a high dynamic range measurement system using a 64-bit A/D converter. The 64-bit dynamic range of the A/D is equivalent to a 19 decimal digit data field. While the data may take any value within the dynamic range of the A/D, measurement noise and other processes will limit the accuracy and precision of the reported value. As an arbitrary example, consider the case of data having an accuracy and precision of one percent, in which case only 2 of the 19 decimal figures are significant. The data could be represented with as few as 7 bits under the proper representation format by leveraging its explicit and implicit a priori information. With slightly lesser efficiency, the 64-bit conventional binary integer data can be represented at the required precision using a fixed MRS(7) representation requiring 13 symbols. Because MRS includes precision within its basic design, that precision is carried forward and provided as a feature of the data when converted to MRS format.
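The arithmetic of this example can be verified directly (an illustrative Python sketch):

```python
import math

N = 64                                      # conventional binary word length
header_bits = math.ceil(math.log2(N + 1))   # 7 bits locate the leading one
P = 7                                       # 2^7 = 128 levels resolve one percent
precision_bits = P - 1                      # the leading one needs no packet bit
assert header_bits == 7
assert header_bits + precision_bits == 13   # fixed MRS(7) structure size
```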
Note that the MRS approach, in separating the scale header and additional precision packet, provides a simple scheme for data thinning exclusively by truncation of the additional precision packet. Data thinning based on retaining only the significant figures, or equivalently significant bits, is accomplished simply by setting to zero all bits of the additional precision packet that represent values smaller than the specified relative precision. This type of data thinning can be performed independently of the particular values of the scale header data, or as a specified function of those values. The thinning can be applied both to measured and reference data, and to the results of numerical analyses, as determined by the mathematical operations performed.
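A minimal sketch of this truncation-based thinning, assuming non-negative integer data (`thin` is our hypothetical helper name): given a target number of significant bits, all lower-order bits are simply zeroed, with no dependence on the scale header value.

```python
def thin(value, sig_bits):
    """Keep the top sig_bits significant bits of a non-negative
    integer and zero everything below them (truncation of the
    additional precision packet)."""
    if value == 0:
        return 0
    drop = max(value.bit_length() - sig_bits, 0)
    return (value >> drop) << drop

print(thin(200, 3))  # 192: binary 11001000 thinned to 11000000
```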
3) Data Compression Properties of the MRS Representation—The MRS method provides a number of options for reducing the number of symbols required to encode numerical data via data compression and data thinning. Data compression using MRS can be obtained in several ways, including the following:
Original data→MRS representation of the data→entropy encoding of the MRS representation
Original data→transform encoding→MRS representation of the transform coefficients→entropy encoding of the MRS representation
Original data→MRS representation of the data→transform encoding of the MRS representation→entropy encoding of the transform coefficients
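The first of these chains can be sketched end to end as follows. Per-datum thinning stands in for the MRS representation step and zlib stands in for the entropy coder; both are our simplifications, not the method's prescribed components.

```python
import zlib

def pipeline(pixels, sig_bits):
    """Original data -> thinned (MRS-style) representation ->
    entropy encoding. Pixels are assumed to be 8-bit integers."""
    def thin(v):
        drop = max(v.bit_length() - sig_bits, 0) if v else 0
        return (v >> drop) << drop
    return zlib.compress(bytes(thin(v) for v in pixels))

# Thinning leaves long runs of repeated values for the entropy
# coder to exploit.
ramp = list(range(256))
print(len(pipeline(ramp, 2)) < len(zlib.compress(bytes(ramp))))  # True
```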
Some important properties with respect to data compression are as follows.
a) Compression as a by-product of MRS structure. In any data representation, knowledge of the data structure provides explicit a priori knowledge that reduces the number of bits to be stored or transmitted. In the case of the MRS method applied to binary data, the a priori knowledge includes that the first non-zero symbol is a one. The location of that symbol must be specified, but not its value. Thus the explicit a priori structural information reduces by one the number of symbols that must be stored or transmitted to represent the full precision of a given datum.
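The structural saving can be made concrete with a small sketch (`mrs_encode` and `mrs_decode` are our illustrative names, not the patent's field layout). The scale header records only the location of the highest-order non-zero bit; because that bit is known a priori to be a one, the precision packet needs to carry only the P-1 bits below it:

```python
def mrs_encode(value, p):
    """Encode a positive integer as (scale, packet): the scale
    header locates the highest non-zero bit, and the packet holds
    the p-1 bits just below it. The leading one itself is implied
    by its location and is never stored."""
    scale = value.bit_length() - 1
    kept = min(p - 1, scale)
    packet = (value >> (scale - kept)) & ((1 << kept) - 1)
    return scale, packet

def mrs_decode(scale, packet, p):
    """Rebuild the (possibly truncated) value: re-insert the
    implied leading one, then zero-pad back to the right scale."""
    kept = min(p - 1, scale)
    return ((1 << kept) | packet) << (scale - kept)

s, pk = mrs_encode(200, 3)      # 200 = 0b11001000
print(s, pk)                    # 7 2: leading bit at position 7, next bits 10
print(mrs_decode(s, pk, 3))     # 192: exact top-3-bit approximation
```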
The MRS representation provides for tracking of the most precise ‘true’ data bit. Consequently, interpolation for increased accuracy and precision is always available to the user of the data at the time of use, with no cost in storage and transmission. The caveat on the use of interpolation is, of course, that it is valid only on a statistical basis and is therefore only justified for data of a corresponding statistical nature.
Finally, the location of the highest-order non-zero bit coded in the MRS scale header provides implicit information on the maximum length of the corresponding precision packet, and that leads to a simple approach to using variable-length data structures for a more compact representation.
b) Compression via adjusted precision and data thinning. The MRS method provides a simple means to adjust the relative precision of the data representation by truncating the length of the precision packet appended to the scale header. The truncation length and resultant precision can be managed based on any suitable criterion established by user requirements for accuracy and precision, traded off against the size of the representation. Compression via adjusted precision is illustrated in the accompanying drawings.
c) MRS and compression metrics. Even the least precise MRS representation provides an exact representation of the most significant term of the underlying series representation of the true data. While the mathematics have yet to be formalized, we anticipate excellent correlation coefficients and other similarity measures between the MRS representation and the ‘true’ data in all cases, including for limited precision under MRS(P). The reason is simple: the MRS representation is simply an alternate but exact representation of the same series expansion. The binary series expansion adds precision in order of significance and can never differ from the ‘true’ value by more than a factor of two, and that limiting maximum difference will follow through in the correlation coefficient or other similarity measures. We thus anticipate a correlation coefficient of 0.5 or more under normal circumstances.
A significant corollary arises in arrays of data, such as the array of pixel intensities in a digital image. Since every individual datum is accurately represented to within a factor of two, bounded from below, the ratio between any two data points is always correct to within a factor of two: at most twice as large, and at least half as small. It is simply impossible with a binary implementation of MRS to even begin to approach order-of-magnitude errors in either the absolute data or their relative variations across a data array.
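The factor-of-two bound and its ratio corollary are easy to verify numerically; a minimal sketch, with `mrs1` as our name for the minimum-precision value:

```python
def mrs1(value):
    """MRS(1) approximation of a positive integer: only the highest
    non-zero bit is kept, i.e. the value is rounded down to a
    power of two."""
    return 1 << (value.bit_length() - 1)

# Absolute bound: value/2 < mrs1(value) <= value for every positive value.
for v in range(1, 10_000):
    assert v / 2 < mrs1(v) <= v

# Ratio corollary: the ratio of any two data is off by less than a
# factor of two in either direction.
for a in range(1, 200):
    for b in range(1, 200):
        r = (mrs1(a) / mrs1(b)) / (a / b)
        assert 0.5 < r < 2.0
```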
d) Joint use of MRS and transform or other compression approaches. Known techniques for data compression include a variety of transforms on and component decompositions of arrays of data represented as matrices. There is no apparent fundamental incompatibility between MRS and the use of transforms and other techniques as applied to data compression. However, there may be practical issues that will lead to a preferred order of operations when joining MRS with other techniques.
Applications and Considerations in Productization:
The MRS method is directly applicable to the representation of real integers through use of a binary series expansion. The method is readily extensible to complex and non-integer numerical data through obvious extensions of the MRS data structure and its explicit a priori information. The MRS method is further extensible to the representation of non-numerical character data through the use of numerical codes as designations for the character set(s) of interest, such as the ASCII and Unicode character sets.
The first usage of the MRS method has been in the compression of real-valued synthetic aperture radar (SAR) imagery represented as a two-dimensional array of integer pixel intensities, with good results. An extension to complex SAR imagery is being developed by the MacB ATG for evaluation. MRS experimentation using a variety of publicly available color and grayscale images, including medical images, is ongoing, with mixed results. Additional work on acoustic and other data types is also in progress. To date, there does not appear to be any substantial limitation to the potential application space for the MRS method. Additional applications have been proposed in hyperspectral imagery, RF signals, and even financial data.
The MRS method yields at least three significant advantages for image compression, storage, and transmission:
1) The MRS method in and of itself does not use a transform or other complicated mathematics to produce the MRS representation. The MRS method executes very quickly since the representation is formed with only a small number of operations required to find the location of the highest-order non-zero bit and encode its location, followed by appending the additional precision packet and any zero padding to the right.
2) The structure of the MRS scale header paired with a corresponding additional precision packet eliminates the need to create, store, and transmit separate “low” and “high” resolution images. The lowest possible resolution MRS(1) image is just the scale header, with the characteristics noted above: all pixel intensities are within a factor of two of their true values, bounded from below, and the ratio of any two pixel intensities is correct to within a factor of two in either direction. Adding precision is accomplished by adding additional precision bits to the MRS representation. For images stored as two-dimensional intensity arrays, the data are conveniently handled as bit planes. The MRS(1) image can be increased in precision up to the MRS(N) limit (full fidelity) by over-writing one bit plane at a time, as needed. The MRS symbols on hand are always correct symbols, and the precision of the image is refined without any need to recalculate the image.
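Progressive refinement from MRS(1) toward full fidelity can be sketched for an intensity array as follows. Plain lists stand in for image bit planes, and `mrs_view` is our hypothetical helper; raising P over-writes additional precision bits without recomputing anything already held.

```python
def mrs_view(image, p):
    """MRS(P) view of an integer image: each pixel keeps its top p
    significant bits. Raising p over-writes additional precision
    bits; existing symbols are never recalculated."""
    def top(v):
        if v == 0:
            return 0
        drop = max(v.bit_length() - p, 0)
        return (v >> drop) << drop
    return [[top(v) for v in row] for row in image]

image = [[200, 5], [0, 255]]
print(mrs_view(image, 1))  # [[128, 4], [0, 128]]: screening image
print(mrs_view(image, 8))  # [[200, 5], [0, 255]]: full fidelity
```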
3) Especially for large images, the MRS structure enables the transmission and display of the MRS(1) minimum precision image for use as a preview or screening image, and the rapid back-fill with increased precision in designated areas of the image. The additional precision is simply added into the existing data fields by over-writing the additional precision bits for a particular data element. It is a simple matter to designate a region and request only the additional precision packets for that specified region. Thus any number of regions in the MRS(1) screening image can be designated for increased precision without requiring any calculations. All that is required is to read the data, and over-write the additional precision bits as the data are re-stored. Further, that operation can be performed incrementally as MRS(1) to MRS(2), then to MRS(3), etc., up to MRS(N), or directly to MRS(N) or any other precision depending upon the user's particular interests.
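Region back-fill then amounts to over-writing precision bits only where requested; a minimal sketch, in which the function name and the row/column form of the region are our assumptions:

```python
def backfill(screen, full, rows, cols, p):
    """Over-write the additional precision bits of a screening
    image inside a designated region, leaving the rest at MRS(1).
    No recalculation is performed; bits are only read and stored."""
    out = [row[:] for row in screen]
    for r in rows:
        for c in cols:
            v = full[r][c]
            if v:
                drop = max(v.bit_length() - p, 0)
                out[r][c] = (v >> drop) << drop
    return out

screen = [[128, 4], [0, 128]]          # MRS(1) screening image
full = [[200, 5], [0, 255]]            # full-precision source
print(backfill(screen, full, [0], [0, 1], 8))  # [[200, 5], [0, 128]]
```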
While particular embodiments have been chosen to illustrate the invention, it will be understood by those skilled in the art that various changes and modifications can be made therein without departing from the scope of the invention as defined in the appended claims.
This application claims priority to U.S. Provisional Patent Application No. 61/712,641, filed Oct. 11, 2012, the disclosure of which is hereby incorporated by reference herein.