Erasing-Based Lossless Compression and Decompression Methods for Floating-Point Data

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese Patent Application No. 202310068186.3 filed on Jan. 16, 2023 and Chinese Patent Application No. 202310070527.0 filed on Jan. 16, 2023 with the China National Intellectual Property Administration (CNIPA), the entire disclosures of which are incorporated herein by reference in their entirety.

TECHNICAL FIELD

Embodiments of the present disclosure relate to the field of computer technology, particularly to the field of lossless data compression, and more particularly to erasing-based lossless compression and decompression methods for floating-point data.

BACKGROUND

The advance of sensing devices and the Internet of Things has brought about an explosion of time series data. A significant portion of time series data are floating-point values produced at an unprecedentedly high rate in a streaming fashion. If these huge volumes of floating-point time series data (abbreviated as time series or time series data below) are transmitted and stored in their original format, they take up a lot of network bandwidth and storage space, which not only causes expensive overhead but also reduces system efficiency and further affects the usability of some critical applications. Therefore, when processing or storing floating-point data, it is necessary to first compress the data according to a certain algorithm while meeting certain accuracy requirements. Compressed floating-point data occupy less storage space and consume fewer computing and transmission resources.

Normally, compression methods designed specifically for floating-point time series fall into two categories: lossy compression algorithms and lossless compression algorithms. The former lose some information and are thus unsuitable for scientific computing, data management and other critical scenarios, in which any error could lead to disastrous consequences. For this reason, lossless floating-point time series compression has attracted extensive interest for decades. One representative family of lossless algorithms is based on the XOR operation.

As shown in FIG. 1, given a time series of double-precision floating-point values, suppose the current value and its previous one are 3.17 and 3.25, respectively. If not compressed, each value occupies 64 bits in its underlying storage. When compressing, the XOR-based compression algorithm performs an XOR operation on 3.17 and 3.25, i.e., Δ = 3.17 ⊕ 3.25. When decompressing, it recovers 3.17 through another XOR operation, i.e., 3.17 = Δ ⊕ 3.25. Because two consecutive values in a time series tend to be similar, the underlying representation of Δ is expected to contain many leading zeros (and possibly many trailing zeros). Therefore, Δ can be recorded by storing its center bits along with the numbers of leading zeros and trailing zeros, which usually takes fewer than 64 bits. Floating-point values are thus compressed by not storing the leading and trailing zeros explicitly.
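The XOR step and the leading/trailing-zero counts for the FIG. 1 example can be checked with a short Python sketch. The helper names `to_bits`, `leading_zeros` and `trailing_zeros` are illustrative assumptions, not part of any cited implementation; only the standard `struct` module is used to read the IEEE 754 bit pattern.

```python
import struct


def to_bits(v: float) -> int:
    """Return the 64-bit IEEE 754 pattern of a double as an unsigned integer."""
    return struct.unpack(">Q", struct.pack(">d", v))[0]


def leading_zeros(x: int) -> int:
    """Number of leading zero bits in a 64-bit word."""
    return 64 - x.bit_length()


def trailing_zeros(x: int) -> int:
    """Number of trailing zero bits in a 64-bit word (64 when x == 0)."""
    return 64 if x == 0 else (x & -x).bit_length() - 1


delta = to_bits(3.17) ^ to_bits(3.25)    # Δ = 3.17 ⊕ 3.25
lead, trail = leading_zeros(delta), trailing_zeros(delta)
center = 64 - lead - trail               # bits that must actually be stored
print(f"{delta:064b}")
print(lead, trail, center)               # 14 2 48
```

Here Δ has 14 leading zeros but only 2 trailing zeros, so 48 center bits still have to be stored.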

Gorilla (see Pelkonen, T., Franklin, S., Teller, J., Cavallaro, P., Huang, Q., Meza, J., Veeraraghavan, K.: Gorilla: A fast, scalable, in-memory time series database. Proceedings of the VLDB Endowment 8(12), 1816-1827 (2015)) and Chimp (see Liakos, P., Papakonstantinopoulou, K., Kotidis, Y.: Chimp: efficient lossless floating point compression for time series databases. Proceedings of the VLDB Endowment 15(11), 3058-3070 (2022)) are two state-of-the-art XOR-based lossless floating-point compression methods. Gorilla assumes that the XORed result of two consecutive floating-point values is likely to have many leading zeros as well as many trailing zeros. However, in most cases the XORed result actually has very few trailing zeros. As shown in FIG. 2, if each value is XORed with its immediately preceding value (as Gorilla and Chimp do), as many as 95% of the XORed results contain no more than 5 trailing zeros. Instead of always using the immediately preceding value, the Chimp work proposes Chimp128, which selects, from the previous 128 values, the one whose XORed result has the most trailing zeros. However, as shown in FIG. 2, the trailing-zero distribution of the XORed results produced by Chimp128 shows that up to 60% of them still have no more than 5 trailing zeros.

Yet increasing the number of trailing zeros of the XORed results plays a significant role in improving the compression ratio for time series.

SUMMARY

Embodiments of the present disclosure propose an erasing-based lossless compression method for floating-point values, an erasing-based lossless decompression method for floating-point values, an electronic device, and a non-transitory computer readable storage medium.

In a first aspect, some embodiments of the present disclosure provide an erasing-based lossless compression method for floating-point values. The method includes: acquiring a floating-point value, and calculating a decimal place count of the floating-point value; transforming the floating-point value into a binary format, where the floating-point value in the binary format is composed of a digit on a sign bit, digits on exponent bits, and digits on mantissa bits; determining, in the mantissa bits, a reference mantissa bit based on the decimal place count and the digits on the exponent bits; performing erasing operation on bits following the reference mantissa bit by setting corresponding digits on the bits following the reference mantissa bit to be zero, to obtain a value in the binary format, and using the value in the binary format obtained by the erasing operation as a mantissa prefix number of the floating-point value; inputting the mantissa prefix number of the floating-point value into an eXclusive OR (XOR) based compressor, to obtain an XORed result, and storing the XORed result.

In a second aspect, some embodiments of the present disclosure provide an erasing-based lossless decompression method for floating-point values. The method includes: acquiring an XORed result and a modified decimal significand count of a floating-point value, where the XORed result is obtained during compression of the floating-point value by performing an XOR operation on a mantissa prefix number of the floating-point value and a mantissa prefix number of a previous floating-point value; performing an XOR operation on the XORed result and the mantissa prefix number of the previous floating-point value, to obtain the mantissa prefix number of the floating-point value; calculating a decimal place count of the floating-point value based on the modified decimal significand count of the floating-point value; and recovering the floating-point value based on the mantissa prefix number of the floating-point value and the decimal place count of the floating-point value.

In a third aspect, some embodiments of the present disclosure provide an electronic device. The electronic device includes at least one processor and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform the method described in any one of the embodiments of the first and second aspects.

In a fourth aspect, some embodiments of the present disclosure provide a non-transitory computer readable storage medium storing computer instructions thereon, where the computer instructions are used to cause a computer to perform the method described in any one of the embodiments of the first and second aspects.

BRIEF DESCRIPTION OF THE DRAWINGS

After reading detailed descriptions of non-limiting embodiments given with reference to the following accompanying drawings, other features, objectives and advantages of the present disclosure will be more apparent:

FIG. 1 is an example of an XOR-based compression method;

FIG. 2 shows the distribution of trailing-zero counts of the XORed results in existing XOR-based compression methods;

FIG. 3 is a flowchart of an erasing-based lossless compression method for floating-point values according to an embodiment of the present disclosure;

FIG. 4 is the data structure of double-precision floating-point format;

FIG. 5 is a flowchart of an erasing-based lossless decompression method for floating-point values according to an embodiment of the present disclosure;

FIG. 6 is an example for explaining the intuition of the Erasing-based Lossless Floating-Point (Elf) eraser;

FIG. 7 is a schematic diagram of an application scenario of the Elf compression and decompression methods according to an embodiment of the present disclosure;

FIG. 8 illustrates examples of mantissa prefix numbers;

FIG. 9a is a schematic diagram of an encoding strategy for the significand count according to an embodiment of the present disclosure;

FIG. 9b is a schematic diagram of an encoding strategy for the significand count according to another embodiment of the present disclosure;

FIG. 9c is a schematic diagram of an encoding strategy for the significand count according to yet another embodiment of the present disclosure;

FIG. 10a is a schematic diagram of an encoding strategy for the XORed result according to an embodiment of the present disclosure;

FIG. 10b is a schematic diagram of an encoding strategy for the XORed result according to another embodiment of the present disclosure;

FIG. 10c is a schematic diagram of an encoding strategy for the XORed result according to yet another embodiment of the present disclosure;

FIG. 11 is a schematic structural diagram of an erasing-based lossless compression apparatus for floating-point values according to an embodiment of the present disclosure;

FIG. 12 is a schematic structural diagram of an erasing-based lossless decompression apparatus for floating-point values according to an embodiment of the present disclosure.

DETAILED DESCRIPTION OF EMBODIMENTS

Embodiments of the present disclosure are further described below in detail in combination with the accompanying drawings. It may be appreciated that the specific embodiments described herein are merely used for explaining the relevant disclosure, rather than limiting the disclosure. In addition, it should be noted that, for ease of description, only the parts related to the relevant disclosure are shown in the accompanying drawings.

It should be noted that, in the specification, the expressions such as “first,” “second” and “third” are only used to distinguish one feature from another, rather than represent any limitations to the features. It should be further understood that the terms “comprise,” “comprising,” “having,” “include” and/or “including,” when used in the specification, specify the presence of stated features, elements and/or components, but do not exclude the presence or addition of one or more other features, elements, components and/or combinations thereof. In addition, expressions such as “at least one of,” when preceding a list of listed features, modify the entire list of features rather than an individual element in the list. Further, the use of “may,” when describing the implementations of the present disclosure, relates to “one or more implementations of the present disclosure”. Also, the term “exemplary” is intended to refer to an example or illustration.

It should be noted that embodiments in the present disclosure and the features in the embodiments may be combined with each other on a non-conflict basis. Embodiments of the present disclosure will be described below in detail with reference to the accompanying drawings.

FIG. 3 illustrates the flowchart of an erasing-based lossless compression method for floating-point values according to an embodiment of the present disclosure. The erasing-based lossless compression method for floating-point values comprises the following steps.

S1: acquiring a floating-point value, and calculating a decimal place count of the floating-point value.

In the embodiment, the floating-point value may be a single-precision or a double-precision floating-point value. The decimal place count of the floating-point value is the number of decimal places of the value in decimal format. For example, the decimal place count of 3.17 is 2, that of −0.0314 is 4, and that of 314.0 is 1.
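One way to compute the decimal place count in Python is sketched below. It assumes, as a simplification, that the decimal format of a double is the shortest string that round-trips to it, which is what Python's `repr` returns; `decimal_place_count` is a hypothetical helper name, and only finite values are handled.

```python
from decimal import Decimal


def decimal_place_count(v: float) -> int:
    """Decimal place count of a finite double, based on its shortest repr."""
    # repr() gives e.g. "3.17", "-0.0314", "314.0" or "1e-05".
    exponent = Decimal(repr(v)).as_tuple().exponent
    # Integral values are written with one decimal place (314.0 -> 1).
    return max(1, -exponent)


for v in (3.17, -0.0314, 314.0):
    print(v, decimal_place_count(v))
```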

S2: transforming the floating-point value into a binary format, wherein the floating-point value in the binary format is composed of a digit on a sign bit, digits on exponent bits, and digits on mantissa bits.

In the embodiment, the floating-point value is transformed from decimal format into binary format. The floating-point value in binary format may be a double-precision value occupying 64 bits, comprising 1 sign bit, 11 exponent bits and 52 mantissa bits, as illustrated in FIG. 4. Alternatively, it may be a single-precision value occupying 32 bits, comprising 1 sign bit, 8 exponent bits and 23 mantissa bits. The floating-point value in binary format is composed of a digit on the sign bit, digits on the exponent bits, and digits on the mantissa bits. For example, the floating-point value 3.17 may be transformed into its binary format “0 10000000000 1001010111000010100011110101110000101000111101011100”, which is composed of one digit “0” on the sign bit, 11 digits “10000000000” on the exponent bits, and 52 digits “1001010111000010100011110101110000101000111101011100” on the mantissa bits.
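The field split described above can be reproduced with the standard `struct` module. This is a sketch for the 64-bit case only; `split_double` and `binary_format` are illustrative names, and a 32-bit variant would pack with `">f"`/`">I"` and use 8 exponent and 23 mantissa bits instead.

```python
import struct


def split_double(v: float) -> tuple[int, int, int]:
    """Return the (sign, exponent, mantissa) fields of a 64-bit double."""
    bits = struct.unpack(">Q", struct.pack(">d", v))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)     # 52 mantissa bits
    return sign, exponent, mantissa


def binary_format(v: float) -> str:
    """Render the fields as "sign exponent mantissa", as in the text."""
    s, e, m = split_double(v)
    return f"{s} {e:011b} {m:052b}"


print(binary_format(3.17))
```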

S3: determining, in the mantissa bits, a reference mantissa bit based on the decimal place count and the digits on the exponent bits.

In the embodiment, a reference mantissa bit (the mantissa bits after this reference mantissa bit will be erased) is determined in the mantissa bits of the floating-point value, based on the decimal place count and the digits on the exponent bits.

In an alternative implementation of the embodiment, the floating-point value in the binary format may be a floating-point value occupying 64 bits, and then the reference mantissa bit may be determined by:

$g(\alpha) = \lceil \alpha \times \log_{2} 10 \rceil + e - 1023$

$e = (e_{1} e_{2} \ldots e_{11})_{2} = \sum_{i=1}^{11} e_{i} \times 2^{11-i}$

where α denotes the decimal place count of the floating-point value, g(α) denotes the position of the reference mantissa bit in the mantissa bits of the floating-point value, and $e_{i}$ denotes the digit on the i-th exponent bit of the floating-point value. The operator ⌈x⌉ rounds x up. That is, the digit on the reference mantissa bit is $m_{g(\alpha)}$, and the digits $\langle m_{g(\alpha)+1}, \ldots, m_{52} \rangle$ on the mantissa bits after the reference mantissa bit g(α) are set to zero, so that the mantissa bits after the reference mantissa bit are erased.
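A direct transcription of the 64-bit formula for g(α) is sketched below in Python, assuming α has already been computed; the function name `reference_bit` is hypothetical. Because α × log₂10 is never an integer for α ≥ 1, evaluating the ceiling in double precision is safe for the small α values that occur here.

```python
import math
import struct

LOG2_10 = math.log2(10)


def reference_bit(v: float, alpha: int) -> int:
    """g(alpha) = ceil(alpha * log2(10)) + e - 1023 for a 64-bit double v."""
    bits = struct.unpack(">Q", struct.pack(">d", v))[0]
    e = (bits >> 52) & 0x7FF              # biased exponent (e_1 ... e_11)_2
    return math.ceil(alpha * LOG2_10) + e - 1023


print(reference_bit(3.17, 2))             # 8
```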

In an alternative implementation of the embodiment, the floating-point value in the binary format may be a floating-point value occupying 32 bits, and then the reference mantissa bit is determined by:

$g(\alpha) = \lceil \alpha \times \log_{2} 10 \rceil + e - 127$

$e = (e_{1} e_{2} \ldots e_{8})_{2} = \sum_{i=1}^{8} e_{i} \times 2^{8-i}$

where α denotes the decimal place count of the floating-point value, g(α) denotes the position of the reference mantissa bit in the mantissa bits of the floating-point value, and $e_{i}$ denotes the digit on the i-th exponent bit of the floating-point value. Therefore, the digit on the reference mantissa bit is $m_{g(\alpha)}$, and the digits $\langle m_{g(\alpha)+1}, \ldots, m_{23} \rangle$ on the mantissa bits after the reference mantissa bit g(α) are set to zero, so that the mantissa bits after the reference mantissa bit are erased.
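For the 32-bit case, the same computation can be combined with the erasing step. The sketch below is an illustration only: `erase_float32` is a hypothetical name, the input is first rounded to single precision by `struct`, and keeping all 23 mantissa bits when g(α) ≥ 23 is an assumption the text does not spell out.

```python
import math
import struct


def erase_float32(v: float, alpha: int) -> float:
    """Zero the single-precision mantissa bits after position g(alpha)."""
    bits = struct.unpack(">I", struct.pack(">f", v))[0]
    e = (bits >> 23) & 0xFF                       # biased exponent (e_1 ... e_8)_2
    g = math.ceil(alpha * math.log2(10)) + e - 127
    keep = min(max(g, 0), 23)                     # assumption: clamp to 0..23
    mask = ~((1 << (23 - keep)) - 1) & 0xFFFFFFFF
    return struct.unpack(">f", struct.pack(">I", bits & mask))[0]


print(erase_float32(3.17, 2))                     # 3.1640625
```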

S4: performing erasing operation on bits following the reference mantissa bit by setting corresponding digits on the bits following the reference mantissa bit to be zero, to obtain a value in the binary format, and using the value in the binary format obtained by the erasing operation as a mantissa prefix number of the floating-point value.

In the embodiment, the bits following the reference mantissa bit g(α) are erased by setting their digits to zero, and the resulting binary value is used as the mantissa prefix number of the floating-point value. For example, given the floating-point value 3.17, its decimal place count is α = 2 and its binary format is “0 10000000000 1001010111000010100011110101110000101000111101011100”. It may then be calculated that $e = (e_{1} e_{2} \ldots e_{11})_{2} = \sum_{i=1}^{11} e_{i} \times 2^{11-i} = 1 \times 2^{10} = 1024$ and $g(\alpha) = \lceil \alpha \times \log_{2} 10 \rceil + e - 1023 = 7 + 1024 - 1023 = 8$, which indicates that the 8th mantissa bit is the reference mantissa bit. The mantissa bits after the 8th mantissa bit are then erased from the binary format of 3.17 to obtain “0 10000000000 1001010100000000000000000000000000000000000000000000”, which may be used as the mantissa prefix number of the floating-point value 3.17.
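The erasing step S4 for 64-bit values can be sketched as a bit mask. This is a minimal illustration, not the reference implementation: `mantissa_prefix_number` is a hypothetical name, α is passed in by the caller, and clamping g(α) to 0..52 is an assumption for inputs the text does not discuss.

```python
import math
import struct


def mantissa_prefix_number(v: float, alpha: int) -> float:
    """Erase the mantissa bits of v after the reference bit g(alpha)."""
    bits = struct.unpack(">Q", struct.pack(">d", v))[0]
    e = (bits >> 52) & 0x7FF
    g = math.ceil(alpha * math.log2(10)) + e - 1023
    keep = min(max(g, 0), 52)                    # keep m_1 ... m_g
    mask = ~((1 << (52 - keep)) - 1) & 0xFFFFFFFFFFFFFFFF
    erased = bits & mask                         # m_{g+1} ... m_52 set to 0
    return struct.unpack(">d", struct.pack(">Q", erased))[0]


v_prime = mantissa_prefix_number(3.17, 2)
print(v_prime)                                   # 3.1640625
```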

S5: inputting the mantissa prefix number of the floating-point value into an eXclusive OR (XOR) based compressor, to obtain an XORed result, and storing the XORed result.

In the embodiment, the mantissa prefix number of the floating-point value is input into an eXclusive OR (XOR) based compressor to obtain an XORed result. In an alternative implementation, the mantissa prefix number of the floating-point value may be input into the XOR-based compressor, which performs an XOR operation on the mantissa prefix number of the floating-point value and the mantissa prefix number of a previous floating-point value to obtain the XORed result. For example, as illustrated in FIG. 6, the floating-point value may be 3.17 and the previous floating-point value may be 3.25; their mantissa prefix numbers are “0 10000000000 1001010100000000000000000000000000000000000000000000” and “0 10000000000 1010000000000000000000000000000000000000000000000000”, respectively, and the XOR operation on them yields the XORed result “0 00000000000 0011010100000000000000000000000000000000000000000000” (denoted by Δ′). Δ′ has 14 leading zeros and 44 trailing zeros, far more trailing zeros than the 2 of the unerased result Δ = 3.17 ⊕ 3.25. The XORed result is stored for recovering the original floating-point value later, for example, by recording its center bits together with the numbers of leading zeros and trailing zeros.
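The effect of erasing on the XORed result can be checked by XORing the two mantissa prefix numbers from the example. The sketch assumes those prefix numbers are already known (3.1640625 for 3.17 and 3.25 for 3.25, both at α = 2); `to_bits` is an illustrative helper, not an API from the source.

```python
import struct


def to_bits(v: float) -> int:
    """64-bit IEEE 754 pattern of a double."""
    return struct.unpack(">Q", struct.pack(">d", v))[0]


prev_prefix, cur_prefix = 3.25, 3.1640625           # MPNs of 3.25 and 3.17, α = 2
xor = to_bits(cur_prefix) ^ to_bits(prev_prefix)    # Δ′
lead = 64 - xor.bit_length()
trail = (xor & -xor).bit_length() - 1               # xor is non-zero here
print(f"{xor:064b}")
print(lead, trail, 64 - lead - trail)               # 14 44 6
```

Only 6 center bits remain, compared with 48 for the unerased pair 3.17 and 3.25. Decompression undoes the step with one more XOR, since `xor ^ to_bits(prev_prefix)` equals `to_bits(cur_prefix)`.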

The compression method transforms a floating-point value into another value with more trailing zeros while keeping the difference within a guaranteed bound, so it can potentially improve the compression ratio of most XOR-based compression methods considerably.

In an alternative implementation, the decimal place count of the floating-point value may be also stored, so that the XORed result and the decimal place count of the floating-point value form the lossless compressed data for recovering the floating-point value.

In an alternative implementation, a modified decimal significand count of the floating-point value may be calculated and then also be stored, so that the XORed result and the modified decimal significand count of the floating-point value form the lossless compressed data for recovering the floating-point value. The modified decimal significand count of the floating-point value may be used for later recovering the decimal place count of the floating-point value.

In an alternative implementation, the modified decimal significand count of the floating-point value may be calculated by:

$\beta^{*} = DS^{*}(v) = \begin{cases} 0, & v = 10^{-i},\ i > 0 \\ \beta, & \text{otherwise} \end{cases}$

where v denotes the floating-point value, β* denotes the modified decimal significand count of the floating-point value, and β denotes the decimal significand count of the floating-point value. The decimal significand count of a floating-point value is the number of significant digits in its decimal format; e.g., the decimal significand count of 3.17 is 3, that of −0.0314 is 3, and that of 314.0 is 4. For example, for the floating-point value 3.17, the modified decimal significand count equals the decimal significand count, which is 3. Since the decimal significand count β of a double value is not greater than 17, only a few bits are needed to store β.
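β and β* can be computed from the decimal format as sketched below. As in the other examples, the decimal format is assumed to be Python's shortest round-trip `repr`, and the helper names are hypothetical. The sketch uses DS(v) = SP(v) + 1 + DP(v), i.e., the relation DS(v) = SP(v) + 1 − l from Definition 2 with l = −DP(v).

```python
from decimal import Decimal


def significand_count(v: float) -> int:
    """Decimal significand count β = DS(v)."""
    d = Decimal(repr(v))
    if d == 0:
        return 0
    dp = max(1, -d.as_tuple().exponent)   # decimal place count
    return d.adjusted() + 1 + dp          # SP(v) + 1 + DP(v)


def modified_significand_count(v: float) -> int:
    """β* = 0 when v = 10^-i (i > 0), otherwise β."""
    t = Decimal(repr(abs(v))).as_tuple()
    if t.digits == (1,) and t.exponent < 0:
        return 0
    return significand_count(v)


for v in (3.17, -0.0314, 314.0, 0.001):
    print(v, significand_count(v), modified_significand_count(v))
```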

According to the embodiment of the present disclosure, a reference mantissa bit in the mantissa bits is determined based on the decimal place count and the digits on the exponent bits, the bits following the reference mantissa bit are erased (i.e., the corresponding digits are set to zero), and the erased floating-point value is input into the XOR-based compressor for the XOR operation. On the one hand, erasing the mantissa bits following the reference mantissa bit sets many trailing mantissa bits to zero, so that the XORed result of the erased floating-point value and its neighboring floating-point value has many trailing zeros. On the other hand, because the reference mantissa bit is determined based on the decimal place count and the digits on the exponent bits, only the mantissa bits following it are erased, and neither the sign bit nor the exponent bits are erased; the XORed result therefore has many trailing zeros while the original value can still be recovered exactly, which improves the compression of floating-point values. This provides a new approach to compressing floating-point values without any precision loss.

In addition, embodiments of the present disclosure use the XORed result together with either the decimal place count or the modified decimal significand count of the floating-point value to form the lossless compressed data for recovering the floating-point value. During later decompression, the XORed result is decompressed to obtain the mantissa prefix number, and the original floating-point value is recovered based on the mantissa prefix number and the decimal place count (which is either read from storage or recovered from the stored modified decimal significand count), further improving the compression ratio and the decompression efficiency.

FIG. 5 illustrates the flowchart of an erasing-based lossless decompression method for floating-point values according to an embodiment of the present disclosure. The decompression method described herein may operate on data compressed and stored by the compression method described above. The erasing-based lossless decompression method for floating-point values comprises the following steps.

Step 1: acquiring the stored XORed result and the modified decimal significand count of a floating-point value; and Step 2: performing XOR operation on the XORed result and the mantissa prefix number of the previous floating-point value, to obtain the mantissa prefix number of the floating-point value.

In the embodiment, the stored XORed result and the modified decimal significand count of the floating-point value are obtained from the storage in which they were stored. Then an XOR operation is performed on the XORed result and the mantissa prefix number of the previous floating-point value. For example, the stored XORed result Δ′ is “0 00000000000 0011010100000000000000000000000000000000000000000000”, and the mantissa prefix number of the previous floating-point value 3.25 is “0 10000000000 1010000000000000000000000000000000000000000000000000”; performing the XOR operation on them yields “0 10000000000 1001010100000000000000000000000000000000000000000000”, which is the mantissa prefix number of the original floating-point value 3.17.

Step 3: obtaining the decimal place count of the floating-point value.

In the embodiment, the decimal place count of the floating-point value is obtained, for recovering the original floating-point value.

In an alternative implementation of the embodiment, the decimal place count of the floating-point value may be obtained directly when the decimal place count was stored during the compression. Alternatively, the decimal place count of the floating-point value may be recovered from the modified decimal significand count which was stored during the compression.

In an alternative implementation of the embodiment, recovering the decimal place count based on the modified decimal significand count may comprise: in response to determining that β* equals zero, determining that v = 10⁻ⁱ with i = −(SP(v′) + 1); in response to determining that β* does not equal zero, assigning β = β* and recovering the decimal place count of the floating-point value by:

$\alpha = \begin{cases} \beta - (SP(v') + 1), & v \neq 10^{-i},\ i > 0 \\ \beta - (SP(v') + 2), & v = 10^{-i},\ i > 0 \end{cases}$

where α denotes the decimal place count of the floating-point value, v denotes the floating-point value, v′ denotes the mantissa prefix number of v, and SP(v′) is the start decimal significand position of the mantissa prefix number. In an alternative implementation, SP(v′) = ⌊log₁₀|v′|⌋, where the operator ⌊x⌋ rounds x down. For example, for the original floating-point value 3.17, the stored modified decimal significand count is β* = 3, so β = β* = 3, and the mantissa prefix number v′ is 3.1640625; the decimal place count of the original floating-point value is then calculated as α = β − (SP(v′) + 1) = 3 − (⌊log₁₀ 3.1640625⌋ + 1) = 2. In this way, the decimal place count of the original floating-point value is recovered.
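The recovery of α can be sketched as follows. The sketch computes SP(v′) = ⌊log₁₀|v′|⌋ exactly with `Decimal(v_prime).adjusted()`, which avoids the rounding issues of a floating-point logarithm; the function name is hypothetical, and for β* = 0 it returns i, since the decimal place count of 10⁻ⁱ is i.

```python
from decimal import Decimal


def recover_decimal_place_count(beta_star: int, v_prime: float) -> int:
    """Recover α from β* and the mantissa prefix number v′ (v′ > 0)."""
    # Decimal(float) is exact, so adjusted() equals floor(log10(v_prime)).
    sp = Decimal(v_prime).adjusted()
    if beta_star == 0:
        # v = 10^-i with i = -(SP(v′) + 1); its decimal place count is i.
        return -(sp + 1)
    return beta_star - (sp + 1)           # β = β* and v ≠ 10^-i


print(recover_decimal_place_count(3, 3.1640625))   # 2
```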

Step 4: recovering the floating-point value based on the mantissa prefix number of the floating-point value and the decimal place count of the floating-point value.

In the embodiment, the original floating-point value may be recovered based on the mantissa prefix number of the floating-point value and the decimal place count of the floating-point value.

In an alternative implementation of the embodiment, Step 4 further comprises: transforming the mantissa prefix number of the floating-point value into decimal format; and recovering the floating-point value by:

$v = \mathrm{Leaveout}(v', \alpha) + 10^{-\alpha}$

where $\mathrm{Leaveout}(v', \alpha) = (d'_{h'-1} d'_{h'-2} \ldots d'_{0}.d'_{-1} d'_{-2} \ldots d'_{-\alpha})_{10}$ is the operation that leaves out the digits after $d'_{-\alpha}$ in $DF(v') = (d'_{h'-1} d'_{h'-2} \ldots d'_{0}.d'_{-1} d'_{-2} \ldots d'_{-\alpha} d'_{-(\alpha+1)} \ldots d'_{l'})_{10}$; v denotes the floating-point value, v′ denotes the mantissa prefix number of the floating-point value, DF(v′) is the mantissa prefix number in decimal format, and $d'_{i}$ denotes the digit at the i-th place of the mantissa prefix number in decimal format.

For example, the mantissa prefix number “0 10000000000 1001010100000000000000000000000000000000000000000000” of the original floating-point value 3.17 is transformed into decimal format to obtain 3.1640625; with the recovered decimal place count α = 2, v = Leaveout(v′, α) + 10^{−α} = 3.16 + 10⁻² = 3.17. The original floating-point value is recovered without any loss of precision.

In an alternative implementation of the embodiment, the equation v = Leaveout(v′, α) + 10^{−α} may be implemented as v = RoundUp(v′, α), where RoundUp(v′, α) is the operation that rounds v′ up to α decimal places.
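Both forms of the restore step can be written with Python's `decimal` module, which converts a double to its exact decimal expansion. This is a sketch for positive v′ with hypothetical function names; `quantize` with `ROUND_DOWN` plays the role of Leaveout, and `ROUND_CEILING` that of RoundUp.

```python
from decimal import ROUND_CEILING, ROUND_DOWN, Decimal


def restore_leaveout(v_prime: float, alpha: int) -> float:
    """v = Leaveout(v′, α) + 10^-α."""
    step = Decimal(1).scaleb(-alpha)                  # exactly 10^-α
    kept = Decimal(v_prime).quantize(step, rounding=ROUND_DOWN)
    return float(kept + step)


def restore_roundup(v_prime: float, alpha: int) -> float:
    """v = RoundUp(v′, α)."""
    step = Decimal(1).scaleb(-alpha)
    return float(Decimal(v_prime).quantize(step, rounding=ROUND_CEILING))


print(restore_leaveout(3.1640625, 2), restore_roundup(3.1640625, 2))   # 3.17 3.17
```

The two forms agree whenever DF(v′) has a nonzero digit after d′₋α, as for 3.1640625. If erasing left a value unchanged (for instance 3.25 with α = 2), only the round-up form returns the original value; the Leaveout form would give 3.26.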

According to the embodiment of the present disclosure, the stored XORed result is acquired and then the mantissa prefix number of the original floating-point value is recovered therefrom, and then the decimal place count of the original floating-point value is recovered, and the original floating-point value is recovered based on the mantissa prefix number and decimal place count of the original floating-point value, without any precision loss. The lossless decompression for the floating-point value is realized.

FIG. 7 is a schematic diagram of an application scenario of the Elf compression and decompression methods according to an embodiment of the present disclosure. As illustrated in FIG. 7, the erasing-based lossless compression and decompression methods for floating-point values described above may be applied to time series data. For the first value v₁, we use ⌈log₂ 65⌉ = 7 bits to record the number of trailing zeros, trail, of v₁′ (note that trail can take 65 values, from 0 to 64), and store the non-trailing bits of v₁′ with 64 − trail bits. In total, 71 − trail bits are used to record the first value, which is usually fewer than 64 bits. For each value v_t′ with t > 1, we store xor_t = v_t′ ⊕ v_{t−1}′. The storage of xor_t is described in detail in the following embodiments.
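The bit budget for the first value can be computed as below. This sketch only counts bits (it does not emit a bit stream), `first_value_cost` is a hypothetical name, and the input is assumed to be the mantissa prefix number v₁′.

```python
import math
import struct


def first_value_cost(v1_prime: float) -> int:
    """Bits needed for v₁′: a trail-count header plus the non-trailing bits."""
    bits = struct.unpack(">Q", struct.pack(">d", v1_prime))[0]
    trail = 64 if bits == 0 else (bits & -bits).bit_length() - 1
    header = math.ceil(math.log2(65))     # 7 bits encode trail in 0..64
    return header + (64 - trail)          # = 71 - trail


print(first_value_cost(3.1640625))        # 27
```

For the erased value 3.1640625 (trail = 44) the cost is 27 bits, whereas the unerased 3.17 (trail = 2) would need 69 bits.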

In the following, double-precision floating-point values occupying 64 bits are taken as an example to explain the embodiments of the present disclosure in detail. Single-precision floating-point values are processed similarly.

Definitions

For ease of explanation, the following definitions are provided.

Definition 1: Decimal Format and Binary Format. The decimal format of a double value v is $DF(v) = \pm(d_{h-1} d_{h-2} \ldots d_{0}.d_{-1} d_{-2} \ldots d_{l})_{10}$, where $d_{i} \in \{0, 1, \ldots, 9\}$ for $l \le i \le h-1$, $d_{h-1} \neq 0$ unless $h = 1$, and $d_{l} \neq 0$ unless $l = -1$. That is, DF(v) does not start with “0” unless h = 1, and does not end with “0” unless l = −1. Similarly, the binary format of v is $BF(v) = \pm(b_{\bar{h}-1} b_{\bar{h}-2} \ldots b_{0}.b_{-1} b_{-2} \ldots b_{\bar{l}})_{2}$, where $b_{j} \in \{0, 1\}$ for $\bar{l} \le j \le \bar{h}-1$. The following relation holds:

$v = \pm \sum_{i=l}^{h-1} d_{i} \times 10^{i} = \pm \sum_{j=\bar{l}}^{\bar{h}-1} b_{j} \times 2^{j}$

where the “±” (which means “+” or “−”) is the sign of v. If v≥0, “+” is usually omitted. For example, DF(0)=(0.0)₁₀, DF(5.2)=(5.2)₁₀, BF(−3.125)=−(11.001)₂.

Definition 2: Decimal Place Count, Decimal Significand Count and Start Decimal Significand Position. Given v with its decimal format $DF(v) = \pm(d_{h-1} d_{h-2} \ldots d_{0}.d_{-1} d_{-2} \ldots d_{l})_{10}$, DP(v) = |l| is called its decimal place count. If $d_{i} = 0$ for all $n \le i \le h-1$ (where $l < n \le h$) but $d_{n-1} \neq 0$ (i.e., $d_{n-1}$ is the first nonzero digit), then SP(v) = n − 1 is called the start decimal significand position, and DS(v) = SP(v) + 1 − l is called the decimal significand count. For v = 0, we let DS(v) = 0 and leave SP(v) undefined.

For example, DP(3.14)=2, DS(3.14)=3, and SP(3.14)=0; DP(−0.0314)=4, DS(−0.0314)=3, and SP(−0.0314)=−2; DP(314.0)=1, DS(314.0)=4, and SP(314.0)=2.
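Definitions 1 and 2 can be exercised on the examples above with a small sketch. It assumes that DF(v) is the shortest round-trip decimal string produced by Python's `repr`; `dp`, `sp` and `ds` are hypothetical helper names, and `ds` uses DS(v) = SP(v) + 1 − l with l = −DP(v).

```python
from decimal import Decimal


def dp(v: float) -> int:
    """Decimal place count DP(v) = |l|."""
    return max(1, -Decimal(repr(v)).as_tuple().exponent)


def sp(v: float) -> int:
    """Start decimal significand position SP(v), undefined for v = 0."""
    if v == 0:
        raise ValueError("SP(0) is undefined")
    return Decimal(repr(v)).adjusted()


def ds(v: float) -> int:
    """Decimal significand count DS(v) = SP(v) + 1 - l."""
    return 0 if v == 0 else sp(v) + 1 + dp(v)


for v in (3.14, -0.0314, 314.0):
    print(v, dp(v), ds(v), sp(v))
```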

As illustrated in FIG. 4, in accordance with the IEEE 754 standard, a double value v is stored with 64 binary bits, where 1 bit is for the sign s, 11 bits for the exponent $\vec{e} = \langle e_{1}, e_{2}, \ldots, e_{11} \rangle$, and 52 bits for the mantissa $\vec{m} = \langle m_{1}, m_{2}, \ldots, m_{52} \rangle$. Most values in time series data are normal numbers. If v is a normal number, its value satisfies:

$v = {(- 1)}^{s} \times 2^{e - 1023} \times (1 + \sum_{i = 1}^{52} m_{i} \times 2^{- i})$

$e = (e_{1} e_{2} \ldots e_{11})_{2} = \sum_{i=1}^{11} e_{i} \times 2^{11-i}$

where v denotes the floating-point value, $m_{i}$ denotes the digit on the i-th mantissa bit of the floating-point value, and $e_{i}$ denotes the digit on the i-th exponent bit of the floating-point value. If we let $m_{0} = 1$ and $BF(v) = \pm(b_{\bar{h}-1} b_{\bar{h}-2} \ldots b_{0}.b_{-1} b_{-2} \ldots b_{\bar{l}})_{2}$, we have $b_{e-1023-i} = m_{i}$ for $i \ge 0$.
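The normal-number formula can be checked by rebuilding a double from its three fields. The sketch below is illustrative (the name `decode_normal` is hypothetical) and rejects zero, subnormal, infinite and NaN encodings, which the formula does not cover.

```python
import math
import struct


def decode_normal(bits: int) -> float:
    """Evaluate (-1)^s * 2^(e-1023) * (1 + sum m_i 2^-i) for a normal double."""
    s = bits >> 63
    e = (bits >> 52) & 0x7FF
    m = bits & ((1 << 52) - 1)
    if not 0 < e < 0x7FF:
        raise ValueError("not a normal number")
    significand = 1 + m / 2**52           # 1 + sum_{i=1}^{52} m_i * 2^-i, exact
    return (-1) ** s * math.ldexp(significand, e - 1023)


bits = struct.unpack(">Q", struct.pack(">d", 3.17))[0]
print(decode_normal(bits) == 3.17)        # True
```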

As illustrated in FIG. 4, in the mantissa m⃗=<m₁, m₂, …, m₅₂> of a double value v, m_i is more significant than m_j for 1≤i<j≤52, since m_i contributes more to the value of v than m_j.

The main idea of the Erasing-based Lossless Floating-point (Elf) compression described herein is to erase some less significant mantissa bits (i.e., set them to zeros) of a double value v. As a result, both v itself and the XORed result of v with its previous value are expected to have many trailing zeros. Note that v and its negation −v have identical double-precision floating-point formats except for their sign bits. That is to say, the compression process for −v can be converted into that for v by reversing only its sign bit, and vice versa. Therefore, in the rest of the disclosure, unless specified otherwise, v is assumed to be positive for convenience of description. Before introducing the details of Elf compression, we first give the definition of mantissa prefix number.

Definition 3: Mantissa Prefix Number. Given a double value v with mantissa m⃗=<m₁, m₂, …, m₅₂>, the double value v′ with the same sign and exponent as v and with mantissa m⃗′=<m′₁, m′₂, …, m′₅₂> is called the mantissa prefix number of v if and only if there exists a number n∈{1, 2, …, 51} such that m′_i=m_i for 1≤i≤n and m′_j=0 for n+1≤j≤52, denoted as v′=MPN(v, n).

The definition of Mantissa Prefix Number is first proposed in embodiments of the present disclosure.

For example, FIG. 8 gives four mantissa prefix numbers of 3.17: 3.17=MPN(3.17,50), 3.169999837875366=MPN(3.17,23), 3.1640625=MPN(3.17,8), and 3.125=MPN(3.17,4).
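Definition 3 amounts to a bit mask over the raw 64-bit pattern of v. A minimal Java sketch, with the hypothetical class name `MantissaPrefix`:

```java
final class MantissaPrefix {
    // MPN(v, n): keep the sign, the exponent and m_1..m_n; set m_{n+1}..m_52 to 0.
    static double mpn(double v, int n) {
        if (n < 1 || n > 51) throw new IllegalArgumentException("n must be in 1..51");
        long mask = -1L << (52 - n); // ones in the top 12 + n bit positions
        return Double.longBitsToDouble(Double.doubleToRawLongBits(v) & mask);
    }
}
```

Here mpn(3.17, 4) and mpn(3.17, 8) return 3.125 and 3.1640625, in line with FIG. 8.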

Observation:

The erasing-based lossless compression method for floating-point values described in embodiments of the present disclosure is based on the following observation: given a double value v with its decimal format DF(v)=(d_{h−1}d_{h−2}…d₀.d₋₁d₋₂…d_l)₁₀, we can find one of its mantissa prefix numbers v′ and a small double value δ, 0≤δ<10^l, such that v′=v−δ. If v′ and the information about δ (namely the decimal place count α=DP(v)=−l) are retained, v can be recovered without losing any precision. The parameter δ is introduced herein only to explain the compression and decompression methods, and its exact value never needs to be calculated. To recover the original value v (when δ>0), we just need to round v′ up to α decimal places. For example, when α=DP(3.17)=2 and v′=3.1640625, v=RoundUp(v′, α)=(3.16)₁₀+10⁻²=3.17. In an example, RoundUp(v′, α) can be implemented as LeaveOut(v′, α)+10^{−α}, where LeaveOut(v′, α)=(d′_{h′−1}d′_{h′−2}…d′₀.d′₋₁d′₋₂…d′_{−α})₁₀ leaves out the digits after d′_{−α} in DF(v′)=(d′_{h′−1}d′_{h′−2}…d′₀.d′₋₁d′₋₂…d′_{−α}d′_{−(α+1)}…d′_{l′})₁₀.
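The recovery rule can be checked with the following Java sketch. `RoundUpRestore`, `leaveOut` and `roundUp` are hypothetical names; the sketch uses `java.math.BigDecimal` to truncate the exact binary value of v′ and assumes v′>0 and 0<δ<10^{−α}, the situation in which the text applies RoundUp.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class RoundUpRestore {
    // LeaveOut(v', alpha): the exact value of v' truncated to alpha decimal places.
    static BigDecimal leaveOut(double vPrime, int alpha) {
        return new BigDecimal(vPrime).setScale(alpha, RoundingMode.DOWN);
    }

    // RoundUp(v', alpha) = LeaveOut(v', alpha) + 10^-alpha; assumes v' > 0 and 0 < delta < 10^-alpha.
    static double roundUp(double vPrime, int alpha) {
        return leaveOut(vPrime, alpha).add(BigDecimal.ONE.scaleByPowerOfTen(-alpha)).doubleValue();
    }
}
```

Since v has exactly α decimal places and v−10^{−α}<v′<v, truncating v′ yields v−10^{−α}, so adding 10^{−α} restores v; for instance, roundUp(3.1640625, 2) returns 3.17.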

FIG. 8 illustrates examples of mantissa prefix numbers. Among the mantissa prefix numbers in FIG. 8 that satisfy 0≤δ<10⁻², 3.1640625 has more trailing zeros than 3.169999837875366 and 3.17 (3.125 is excluded because 3.17−3.125=0.045>10⁻²), so the mantissa prefix number 3.1640625 is the most suitable v′.

There are two problems here. Problem I: how can the best mantissa prefix number v′ of v be found with minimum effort? Problem II: how can the decimal place count α be stored with minimum storage cost?

Mantissa Prefix Number Search:

For problem I: iteratively checking each mantissa prefix number v′ until δ=v−v′ becomes greater than 10^{−α} is time-consuming, since up to 52 mantissa prefix numbers may have to be verified in the worst case. A novel mantissa prefix number search method is therefore proposed herein.

The following theorems are proposed to explain the mantissa prefix number search method.

Theorem 1: Given a double value v with its decimal place count DP(v)=α and binary format BF(v)=(b_{h̄−1}b_{h̄−2}…b₀.b₋₁b₋₂…b_{l̄})₂, δ=(0.0…0b_{−(f(α)+1)}b_{−(f(α)+2)}…b_{l̄})₂ is smaller than 10^{−α}, where f(α)=⌈|log₂10^{−α}|⌉=⌈α×log₂10⌉.

Proof:

$\begin{aligned} \delta &= \sum_{i = f(\alpha) + 1}^{|\bar{l}|} b_{-i} \times 2^{-i} \leq \sum_{i = f(\alpha) + 1}^{|\bar{l}|} 2^{-i} < \sum_{i = f(\alpha) + 1}^{+\infty} 2^{-i} \\ &= 2^{-f(\alpha)} = 2^{-\lceil \alpha \times \log_{2} 10 \rceil} \leq 2^{-\alpha \times \log_{2} 10} \\ &= \left(2^{\log_{2} 10}\right)^{-\alpha} = 10^{-\alpha} \end{aligned}$

Here, f(α)=⌈|log₂10^{−α}|⌉ is the smallest number of binary fraction bits whose resolution 2^{−f(α)} does not exceed 10^{−α}. If δ is obtained as in Theorem 1, v−δ can be regarded as erasing the bits after b_{−f(α)} in v's binary format. In accordance with the IEEE 754 Standard, and recalling that b_{−i}=m_{i+e−1023} in BF(v) for i>0 as described above, the bit b_{−f(α)} corresponds to the mantissa bit m_{f(α)+e−1023}. Consequently, v−δ can be further deemed as erasing the mantissa bits after m_{g(α)} in v's underlying floating-point format, in which g(α) is defined as:

$g(\alpha) = f(\alpha) + e - 1023 = \lceil \alpha \times \log_{2} 10 \rceil + e - 1023$

where α=DP(v) and e=(e₁e₂…e₁₁)₂=Σ_{i=1}^{11} e_i×2^{11−i}.

As a result, we can directly calculate the best mantissa prefix number v′ by simply erasing the mantissa bits after m_{g(α)} of v, which takes only O(1) time.
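Because g(α) depends only on α and the exponent, the search reduces to a single mask, as in this Java sketch. `BestPrefix` and its methods are hypothetical, only normal numbers are handled, and computing f(α) with `Math.log` is adequate for the small α of practical data but is an assumption rather than the disclosed implementation.

```java
final class BestPrefix {
    // f(alpha) = ceil(alpha * log2(10)): binary fraction bits needed for precision 10^-alpha.
    static int f(int alpha) {
        return (int) Math.ceil(alpha * (Math.log(10) / Math.log(2)));
    }

    // g(alpha) = f(alpha) + e - 1023, where e is the biased exponent of v.
    static int g(double v, int alpha) {
        int e = (int) ((Double.doubleToRawLongBits(v) >>> 52) & 0x7FFL);
        return f(alpha) + e - 1023;
    }

    // Best mantissa prefix number: zero the mantissa bits after m_g(alpha), in O(1).
    static double erase(double v, int alpha) {
        int erased = 52 - g(v, alpha);
        if (erased <= 0) return v;     // no mantissa bit lies after m_g(alpha)
        erased = Math.min(erased, 52); // never touch the exponent or sign bits
        long mask = -1L << erased;
        return Double.longBitsToDouble(Double.doubleToRawLongBits(v) & mask);
    }
}
```

For 3.17 (α=2), f(2)=7 and e−1023=1, so g(2)=8 and erase returns 3.1640625, the best v′ identified in FIG. 8.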

Decimal Place Count Calculation:

For problem II, if the decimal place count α were stored directly, it would require ⌈log₂α_max⌉ bits, where α_max is the maximum possible decimal place count. The minimum positive double-precision floating-point value is about 4.9×10⁻³²⁴, so α_max=324 and ⌈log₂α_max⌉=9, i.e., as many as 9 bits would be needed to store α for each double value during compression. Thus, to further reduce the storage cost and improve the compression ratio, we may store the modified decimal significand count of the floating-point value instead.

Given v with its decimal format DF(v)=(d_{h−1}d_{h−2}…d₀.d₋₁d₋₂…d_l)₁₀, we notice that its decimal place count α=DP(v) can be calculated from its decimal significand count β=DS(v). Since the decimal significand count of a double value is never greater than 17 under the IEEE 754 Standard, far fewer bits are needed to store β. According to Definition 2, α=DP(v)=|l|=−l and β=DS(v)=SP(v)+1−l, so we have:

$α = β - (SP (v) + 1)$

Next, we discuss how to obtain SP(v) without knowing v itself. For this purpose, two additional theorems, derived from the structure of double-precision floating-point values, are proposed.

Theorem 2: Given a double value v and its best mantissa prefix number v′, if v≠10⁻ⁱ, i>0, then SP(v)=SP(v′).

Theorem 3: Given a double value v=10⁻ⁱ, i>0, and its best mantissa prefix number v′, we have SP(v)=SP(v′)+1.

According to Theorem 2 and Theorem 3, we have:

$\alpha = \begin{cases} \beta - (SP(v') + 1), & v \neq 10^{-i},\ i > 0 \\ \beta - (SP(v') + 2), & v = 10^{-i},\ i > 0 \end{cases}$

For any normal number v, its decimal significand count β is never zero. Besides, if we know that v=10⁻ⁱ (i>0), i.e., v=10^{SP(v)}, then by Theorem 3 we can easily get v from v′ by the following equation:

$v = 10^{SP(v') + 1}$

To this end, we can record a modified decimal significand count β* for the calculation of α, using the otherwise unused value 0 to mark the case v=10⁻ⁱ:

$\beta^{*} = DS^{*}(v) = \begin{cases} 0, & v = 10^{-i},\ i > 0 \\ \beta, & \text{otherwise} \end{cases}$

where β* denotes the modified decimal significand count of the floating-point value, β denotes its decimal significand count, and SP(v′) denotes the start decimal significand position of v′.
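The relations above can be sketched in Java as follows. `ModifiedSignificandCount` and its methods are hypothetical; β* is computed from the shortest decimal of v via `BigDecimal.valueOf` (assuming `Double.toString` returns the shortest decimal), while SP(v′) is computed from the exact binary value of v′, which is all the decompressor has.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class ModifiedSignificandCount {
    // Shortest decimal of |v| without trailing zeros.
    private static BigDecimal shortest(double v) {
        return BigDecimal.valueOf(Math.abs(v)).stripTrailingZeros();
    }

    // SP of an exact decimal: position of its first non-zero digit.
    private static int sp(BigDecimal d) {
        return d.precision() - d.scale() - 1;
    }

    // beta* = DS*(v): 0 when v = 10^-i (i > 0), otherwise DS(v); zero is never erased and DS(0) = 0.
    static int betaStar(double v) {
        if (v == 0) return 0;
        BigDecimal d = shortest(v);
        if (d.unscaledValue().equals(BigInteger.ONE) && d.scale() > 0) return 0; // v = 10^-i
        int dp = Math.max(d.scale(), 1);
        return sp(d) + 1 + dp;
    }

    // alpha = beta* - (SP(v') + 1), for beta* != 0.
    static int alpha(int betaStar, double vPrime) {
        return betaStar - (sp(new BigDecimal(Math.abs(vPrime)).stripTrailingZeros()) + 1);
    }

    // For beta* = 0 (v = 10^-i): v = 10^(SP(v') + 1).
    static double powerOfTenFrom(double vPrime) {
        int spPrime = sp(new BigDecimal(Math.abs(vPrime)).stripTrailingZeros());
        return BigDecimal.ONE.scaleByPowerOfTen(spPrime + 1).doubleValue();
    }
}
```

For v=3.17, β*=3 and v′=3.1640625 give α=3−(0+1)=2; for v=0.1, β*=0 and v′=0.0625 (SP(v′)=−2) give v=10^{−1}.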

Although β* has 18 possible values, i.e., β*∈{0, 1, 2, …, 17}, we do not consider the cases β*=16 or 17, because in these two cases only a small number of bits can be erased while more bits are needed to record β*. For example, given v=3.141592653589792 with β=16, only one bit can be erased. Thus, the erasing operation may be performed only when it is determined that β*<16.

In an alternative implementation, since 4 bits are used to record β* for 0≤β*<16, the erasing operation is performed only when 52−g(α)>4. When 52−g(α)≤4, at most 4 mantissa bits would be erased, which cannot offset the 4 bits spent on β*, so the erasing operation may be skipped.

In an alternative implementation, δ=0 indicates that v itself already has long trailing zeros, so the erasing operation may be skipped. We may obtain δ by extracting the least significant 52−g(α) mantissa bits of v and checking whether δ=0.

By storing the modified decimal significand count β* instead of the decimal place count, implementations of the present disclosure greatly reduce the required storage space compared with storing the decimal place count directly.

Normal Numbers and Special Numbers:

Normal numbers make up most of time series data, and the erasing operation in the compression and decompression methods described above applies to normal numbers. However, the erasing operation needs to be tailored for special numbers.

There are four types of special numbers:

- (1) Zero. The digits on the exponent bits and the digits on the mantissa bits are all “0”.
- (2) Infinity. The digits on the exponent bits are all “1” and the digits on the mantissa bits are all “0”.
- (3) NaN. The digits on the exponent bits are all “1” and the digits on the mantissa bits are not all “0”.
- (4) Subnormal Number. The digits on the exponent bits are all “0” and the digits on the mantissa bits are not all “0”. In this case, the following equation holds:

$\begin{aligned} v &= (-1)^{s} \times 2^{-1022} \times (0.m_{1} m_{2} \dots m_{52})_{2} \\ &= (-1)^{s} \times 2^{-1022} \times \sum_{i = 1}^{52} m_{i} \times 2^{-i} \end{aligned}$

The erasing operation is tailored for special numbers as follows:

- (1) for Zero and Infinity: if v is zero or infinity, the erasing operation is not performed on v because all its mantissa bits are already zero.
- (2) for NaN: if v is a NaN, to make its trailing zeros as many as possible, we perform the NaN_norm operation on it, which sets m₁=1 and m_i=0 for i∈{2,3,…,52}, i.e.,

$v' = \mathrm{NaN}_{norm}(v) = 0?ff8000000000000L \;\&\; v$

(? indicates text missing or illegible when filed)

- (3) for Subnormal Number: subnormal numbers can be regarded as special cases of normal numbers by setting e=1 and m₀=0. As a result, subnormal numbers can be compressed in the same way as normal numbers.
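A small Java sketch of the classification and of the tailoring rules follows; `SpecialNumbers` and its methods are hypothetical. Because part of the NaN_norm constant is illegible in the filed equation, `nanNorm` here simply produces the bit pattern described in the prose (exponent bits all 1, m₁=1, remaining mantissa bits 0) and keeps the sign bit, which is an assumption.

```java
final class SpecialNumbers {
    enum Kind { ZERO, INFINITY, NAN, SUBNORMAL, NORMAL }

    static Kind classify(double v) {
        long bits = Double.doubleToRawLongBits(v);
        int e = (int) ((bits >>> 52) & 0x7FFL);
        long m = bits & 0xFFFFFFFFFFFFFL;
        if (e == 0) return m == 0 ? Kind.ZERO : Kind.SUBNORMAL;   // exponent bits all "0"
        if (e == 0x7FF) return m == 0 ? Kind.INFINITY : Kind.NAN; // exponent bits all "1"
        return Kind.NORMAL;
    }

    // Hypothetical NaN_norm: exponent all ones, m1 = 1, m2..m52 = 0, sign bit kept (assumption).
    static long nanNorm(long bits) {
        return (bits & 0x8000000000000000L) | 0x7FF8000000000000L;
    }

    // Exponent used by the erasing step: a subnormal is treated as e = 1 (with m0 = 0).
    static int effectiveExponent(double v) {
        int e = (int) ((Double.doubleToRawLongBits(v) >>> 52) & 0x7FFL);
        return e == 0 ? 1 : e;
    }
}
```

Using e=1 for subnormals keeps g(α)=f(α)+e−1023 well defined, so the same erasing code path serves both normal and subnormal numbers.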

Significand Count Encoding Strategy:

According to yet another embodiment of the present disclosure, a method for storing the modified decimal significand count of the floating-point value is provided. The method includes: in response to determining that the condition C₁ is satisfied, writing a first flag code (e.g., one bit of “1”) to indicate performing the erasing operation, and writing 4 bits of β* following the first flag code; in response to determining that the condition C₁ is not satisfied, writing a second flag code (e.g., one bit of “0”) to indicate not performing the erasing operation. The condition C₁ is satisfied when δ≠0 (i.e., a digit on a mantissa bit following the reference mantissa bit m_{g(α)} is not zero), and/or β*<16, and/or 52−g(α)>4. For example, C₁ may be satisfied when it is determined that δ≠0; when δ≠0 and β*<16; when δ≠0 and 52−g(α)>4; or when δ≠0, β*<16 and 52−g(α)>4 all hold. An alternative implementation of storing the modified decimal significand count β* is illustrated in FIG. 9a. When the erasing operation is performed only in response to δ≠0 and 52−g(α)>4, or only in response to δ≠0 and β*<16, a positive gain in compression ratio may be ensured while the compression of the floating-point value remains lossless.

In an alternative implementation, given a floating-point value v, when it is determined that the above condition C₁ is satisfied, the output stream writes a first flag code (e.g., one bit of “1”) to indicate that v should be transformed to v′ by erasing the least significant 52−g(α) mantissa bits of v, followed by 4 bits of β* for the recovery of v. Otherwise, the output stream writes a second flag code (e.g., one bit of “0”), and v′ is assigned v without any modification. Finally, the obtained v′ is passed to the XOR-based compressor (i.e., XORcmp illustrated in FIG. 9a) together with the first or second flag code for further compression. The encoding strategy illustrated in FIG. 9a is merely an example; any other encoding strategy that achieves the same effect (such as using the flag code “0” for the case “C₁” and the flag code “1” for the case “Not C₁”) could also be used herein.

In an alternative implementation, when it is determined that δ≠0, β*<16 and 52−g(α)>4 hold simultaneously, the output stream writes a first flag code (e.g., one bit of “1”) to indicate that v should be transformed to v′ by erasing the least significant 52−g(α) mantissa bits of v, followed by 4 bits of β* for the recovery of v. Otherwise, the output stream writes a second flag code (e.g., one bit of “0”), and v′ is assigned v without any modification. Finally, the obtained v′ is passed to the XOR-based compressor together with the first or second flag code for further compression.

The values in a time series usually have similar significand counts, and therefore similar modified significand counts. In the method described above, whenever a value v is erased, four bits are always used to record its β*, which consumes storage space. An embodiment of the present disclosure makes full use of the modified significand count of the previous value, β*_pre; this approach suits streaming scenarios, adapts to dynamic significand counts, and retains lossless compression. The intuition is that the modified significand count of each value in a time series is likely to be exactly the same as that of the previous value. An alternative implementation of storing β* by making use of β*_pre is illustrated in FIG. 9b.

As illustrated in FIG. 9b, an example of storing the modified decimal significand count of the floating-point value may comprise: in response to determining that the condition C₁ is satisfied and β*=β*_pre, writing a third flag code (e.g., one bit of “1” indicating that C₁ is satisfied, followed by one bit of “0” indicating β*=β*_pre) to indicate performing the erasing operation and that β* is identical to β*_pre; in response to determining that C₁ is satisfied and β*≠β*_pre, writing a fourth flag code (e.g., one bit of “1” indicating that C₁ is satisfied, followed by one bit of “1” indicating β*≠β*_pre) to indicate performing the erasing operation and that β* differs from β*_pre, and writing 4 bits of β* following the flag code; in response to determining that C₁ is not satisfied, writing a second flag code (e.g., one bit of “0” indicating v′=v) to indicate not performing the erasing operation. Here, the condition C₁ is, for example, satisfied when it is determined that β*<16, 52−g(α)>4 and δ≠0. β* denotes the modified decimal significand count of the floating-point value, and β*_pre denotes the modified decimal significand count of the previous floating-point value. Finally, the obtained v′ is passed to the XOR-based compressor together with the second, third or fourth flag code for further compression. The encoding strategy illustrated in FIG. 9b is merely an example; any other encoding strategy that achieves the same effect (such as using the flag code “01” for the case “C₁ and β*=β*_pre”, the flag code “1” for the case “Not C₁”, and the flag code “00” for the case “C₁ and β*≠β*_pre”) could also be used herein.

We notice that the case “C₁ and β*=β*_pre” accounts for the largest proportion among the three cases illustrated in FIG. 9b for almost all datasets, yet it is represented with 2 bits (e.g., the flag code ‘10’). To encode the more frequent case with fewer bits, we propose herein to swap the flag codes (e.g., ‘10’ and ‘0’) of the case “C₁ and β*=β*_pre” and the case “Not C₁” in FIG. 9b. The resulting alternative implementation of storing the modified decimal significand count β* is shown in FIG. 9c.

As illustrated in FIG. 9c, an example of storing the modified decimal significand count β* of the floating-point value may comprise: in response to determining that the condition C₁ is satisfied and β*=β*_pre, writing a second flag code (e.g., one bit of ‘0’) to indicate performing the erasing operation and that β* is identical to β*_pre; in response to determining that C₁ is satisfied and β*≠β*_pre, writing a fourth flag code (e.g., two bits of ‘11’) to indicate performing the erasing operation and that β* differs from β*_pre, and writing 4 bits of β* following the flag code; in response to determining that C₁ is not satisfied, writing a third flag code (e.g., two bits of ‘10’) to indicate not performing the erasing operation. Here, the condition C₁ is, for example, satisfied when it is determined that β*<16, 52−g(α)>4 and δ≠0. β* denotes the modified decimal significand count of the floating-point value, and β*_pre denotes the modified decimal significand count of the previous floating-point value. Finally, the obtained v′ is passed to the XOR-based compressor together with the second, third or fourth flag code for further compression. The encoding strategy illustrated in FIG. 9c is merely an example; any other prefix-free encoding strategy that achieves the same effect (such as using the flag code “1” for the case “C₁ and β*=β*_pre”, the flag code “01” for the case “Not C₁”, and the flag code “00” for the case “C₁ and β*≠β*_pre”) could also be used herein.
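The three flag layouts of FIG. 9a, FIG. 9b and FIG. 9c can be compared side by side with this Java sketch. The enum, the method names and the use of a string of '0'/'1' characters as the bit output are illustrative assumptions; `c1` stands for the outcome of the condition C₁.

```java
final class BetaStarFlags {
    enum Strategy { FIG_9A, FIG_9B, FIG_9C }

    // beta* in 4 bits, most significant bit first (0 <= beta* < 16).
    static String fourBits(int betaStar) {
        StringBuilder sb = new StringBuilder();
        for (int i = 3; i >= 0; i--) sb.append((betaStar >>> i) & 1);
        return sb.toString();
    }

    // Flag code (plus the 4 bits of beta* when they are needed) for one value.
    static String encode(Strategy s, boolean c1, int betaStar, int betaStarPre) {
        switch (s) {
            case FIG_9A:
                return c1 ? "1" + fourBits(betaStar) : "0";
            case FIG_9B:
                if (!c1) return "0";
                return betaStar == betaStarPre ? "10" : "11" + fourBits(betaStar);
            default: // FIG_9C: the frequent "C1 and beta* = beta*_pre" case gets the 1-bit code
                if (!c1) return "10";
                return betaStar == betaStarPre ? "0" : "11" + fourBits(betaStar);
        }
    }
}
```

For an erased value whose β* repeats the previous one, the cost drops from 5 bits (FIG. 9a) to 2 bits (FIG. 9b) and then to 1 bit (FIG. 9c).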

An example algorithm for realizing the Elf+ compression corresponding to FIG. 9c is listed below.

Algorithm: ElfPlusEraser(v, out)

1   α ← DP(v); β* ← DS*(v);
2   δ ← ~(0xffffffffffffffffL << (52 − g(α))) & v;
3   if β* < 16 and δ ≠ 0 and 52 − g(α) > 4 then
4       if β* = β*_pre then
5           out.writeBit("0");
6       else
7           out.writeBit("11"); out.write(β*, 4);
8           β*_pre ← β*;
9       v′ ← (0xffffffffffffffffL << (52 − g(α))) & v;
10  else
11      out.writeBit("10"); v′ ← v;
12  XOR_cmp(v′, out);

The above algorithm presents the Elf+ compression method, which is similar to the Elf compression method except in two aspects. (1) When v is to be erased, we further check whether β*=β*_pre (Lines 4-9). If β*=β*_pre, we write only one bit of ‘0’. Otherwise, we write two bits of ‘11’ and four bits of β*, and assign β* to β*_pre for the compression of the next value (Line 8). (2) The flag codes differ from those in Elf compression. For example, Elf compression uses one bit of ‘0’ to indicate that v is not erased, whereas Elf+ compression uses two bits of ‘10’ for this case (Line 11).
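To tie the listing to concrete bit operations, here is a hedged Java sketch of the Elf+ erasing step. `ElfPlusEraserSketch`, `compress`, the `out` string builder (one character per bit) and the `toXorCmp` list (a stand-in for XOR_cmp) are hypothetical. DP(v) and DS*(v) are obtained with `BigDecimal` for brevity (assuming `Double.toString` returns the shortest decimal) rather than with the faster methods described later, and NaN normalization is left out.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class ElfPlusEraserSketch {
    final StringBuilder out = new StringBuilder();   // hypothetical bit sink: one '0'/'1' char per bit
    final List<Double> toXorCmp = new ArrayList<>(); // stand-in for XOR_cmp(v', out)
    private int betaStarPre = -1;                    // hypothetical initial value matching no beta*

    // f(alpha) = ceil(alpha * log2(10))
    private static int f(int alpha) {
        return (int) Math.ceil(alpha * (Math.log(10) / Math.log(2)));
    }

    private static String fourBits(int x) {
        StringBuilder sb = new StringBuilder();
        for (int i = 3; i >= 0; i--) sb.append((x >>> i) & 1);
        return sb.toString();
    }

    void compress(double v) {
        long bits = Double.doubleToRawLongBits(v);
        if (Double.isFinite(v) && v != 0) {
            BigDecimal d = BigDecimal.valueOf(Math.abs(v)).stripTrailingZeros();
            int alpha = Math.max(d.scale(), 1);                                // Line 1: DP(v)
            boolean pow10 = d.unscaledValue().equals(BigInteger.ONE) && d.scale() > 0;
            int betaStar = pow10 ? 0 : d.precision() - d.scale() + alpha;     // DS*(v) = SP + 1 + DP
            int e = (int) ((bits >>> 52) & 0x7FFL);
            int shift = 52 - (f(alpha) + Math.max(e, 1) - 1023);               // 52 - g(alpha)
            // Line 3: beta* < 16, 52 - g(alpha) > 4 and delta != 0 (low `shift` bits not all zero)
            if (betaStar < 16 && shift > 4 && shift <= 52 && (bits & ~(-1L << shift)) != 0) {
                if (betaStar == betaStarPre) {
                    out.append('0');                                           // Line 5
                } else {
                    out.append("11").append(fourBits(betaStar));               // Line 7
                    betaStarPre = betaStar;                                    // Line 8
                }
                toXorCmp.add(Double.longBitsToDouble(bits & (-1L << shift)));  // Line 9
                return;
            }
        }
        out.append("10");                                                      // Line 11
        toXorCmp.add(v);
    }
}
```

For example, compressing 3.17, 3.17 and 2.5 in sequence emits “110011”, “0” and “10”: the first value introduces β*=3, the second reuses it, and 2.5 is left unchanged because its δ is already 0.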

Here, each of the first, second, third, and fourth flag codes may occupy one or two bits.

When β* is stored according to the encoding strategy illustrated in FIG. 9a, an alternative implementation of recovering the original floating-point value may include: reading a flag code and determining whether the flag code is the first or second flag code; in response to determining that the flag code is the second flag code (e.g., one bit of ‘0’), assigning v=v′; in response to determining that the flag code is the first flag code (e.g., one bit of ‘1’), getting the β* by reading the 4 bits of β*; performing XOR operation on the XORed result and the mantissa prefix number of the previous floating-point value, to obtain the mantissa prefix number v′ of the floating-point value v; calculating the decimal place count of the floating-point value v based on the modified decimal significand count β* of the floating-point value v; and recovering the floating-point value based on the mantissa prefix number of the floating-point value and the decimal place count of the floating-point value.

When β* is stored according to the storing method illustrated in FIG. 9b, an alternative implementation of recovering the original floating-point value may include: reading a flag code and determining whether the flag code is the second, third, or fourth flag code; in response to determining that the flag code is the second flag code (e.g., one bit of ‘0’), assigning v=v′; in response to determining that the flag code is the third flag code (e.g., two bits of ‘10’), assigning β*=β*_pre; in response to determining that the flag code is the fourth flag code (e.g., two bits of ‘11’), getting β* by reading the 4 bits of β*; performing an XOR operation on the XORed result and the mantissa prefix number of the previous floating-point value to obtain the mantissa prefix number of the floating-point value; calculating the decimal place count of the floating-point value based on its modified decimal significand count; and recovering the floating-point value based on its mantissa prefix number and its decimal place count.

When β* is stored according to the storing method illustrated in FIG. 9c, an alternative implementation of recovering the original floating-point value may include: reading a flag code and determining whether the flag code is the second, third, or fourth flag code; in response to determining that the flag code is the third flag code (e.g., two bits of ‘10’), assigning v=v′; in response to determining that the flag code is the second flag code (e.g., one bit of ‘0’), assigning β*=β*_pre; in response to determining that the flag code is the fourth flag code (e.g., two bits of ‘11’), getting β* by reading the 4 bits of β*; performing an XOR operation on the XORed result and the mantissa prefix number of the previous floating-point value to obtain the mantissa prefix number of the floating-point value; calculating the decimal place count of the floating-point value based on its modified decimal significand count; and recovering the floating-point value based on its mantissa prefix number and its decimal place count.

An example algorithm for realizing recovering the original floating-point value corresponding to the Elf+ compression of FIG. 9c is listed below:

Algorithm: ElfPlusRestorer(in)

if in.read(1) = 0 then
    β* ← β*_pre; v′ ← XOR_dcmp(in);
    v ← restore(β*, v′);
else if in.read(1) = 0 then
    v ← XOR_dcmp(in);
else
    β* ← in.read(4); v′ ← XOR_dcmp(in);
    v ← restore(β*, v′); β*_pre ← β*;
return v;

Function restore(β*, v′)
    if β* = 0 then
        v ← 10^(SP(v′)+1);
    else
        α ← β* − (SP(v′) + 1);
        v ← LeaveOut(v′, α) + 10^(−α);
    return v;
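The restorer can be sketched in Java as follows. `ElfPlusRestorerSketch`, the string-based bit source and the iterator standing in for XOR_dcmp are hypothetical; SP(v′) is computed from the exact binary value of v′ with `BigDecimal`, and LeaveOut is implemented as truncation of that exact value.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Iterator;

final class ElfPlusRestorerSketch {
    private final String in;                // hypothetical bit source: one '0'/'1' char per bit
    private int pos = 0;
    private final Iterator<Double> xorDcmp; // stand-in for XOR_dcmp(in)
    private int betaStarPre = -1;           // hypothetical initial value, mirrors the eraser

    ElfPlusRestorerSketch(String in, Iterator<Double> xorDcmp) {
        this.in = in;
        this.xorDcmp = xorDcmp;
    }

    private int read(int n) {
        int x = 0;
        for (int i = 0; i < n; i++) x = (x << 1) | (in.charAt(pos++) - '0');
        return x;
    }

    double next() {
        if (read(1) == 0) return restore(betaStarPre, xorDcmp.next()); // flag '0'
        if (read(1) == 0) return xorDcmp.next();                       // flag '10': v = v'
        int betaStar = read(4);                                        // flag '11' + 4 bits
        betaStarPre = betaStar;
        return restore(betaStar, xorDcmp.next());
    }

    // SP of the exact binary value of v'.
    private static int sp(double x) {
        BigDecimal d = new BigDecimal(Math.abs(x)).stripTrailingZeros();
        return d.precision() - d.scale() - 1;
    }

    static double restore(int betaStar, double vPrime) {
        int sp = sp(vPrime);
        BigDecimal magnitude;
        if (betaStar == 0) {
            magnitude = BigDecimal.ONE.scaleByPowerOfTen(sp + 1);  // v = 10^(SP(v') + 1)
        } else {
            int alpha = betaStar - (sp + 1);
            magnitude = new BigDecimal(Math.abs(vPrime))
                    .setScale(alpha, RoundingMode.DOWN)             // LeaveOut(v', alpha)
                    .add(BigDecimal.ONE.scaleByPowerOfTen(-alpha)); // + 10^-alpha
        }
        return Math.copySign(magnitude.doubleValue(), vPrime);
    }
}
```

Feeding the bits “110011”, “0” and “10” together with the decompressed values 3.1640625, 3.1640625 and 2.5 recovers 3.17, 3.17 and 2.5.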

Significand Count Calculation:

The naive method for calculating the significand count of a floating-point value is to first transform the value into a string and then calculate its significand count by scanning the string. However, this method runs very slowly because the data type transformation is expensive. Other approaches, such as the BigDecimal class in the Java language, perform even worse because such high-level classes implement much complex logic that is unnecessary for calculating significand counts.

In an alternative implementation, a trial-and-error approach is proposed herein to calculate the significand count. In particular, for any one of the above described compression methods, we iteratively check whether the condition “v×10ⁱ=⌊v×10ⁱ⌋” holds (it holds only when v×10ⁱ has no fractional part), where i runs sequentially from sp* up to at most sp*+17 (note that the maximum significand count of a double value is 17). Here, sp* is calculated by:

$sp^{*} = \begin{cases} 1, & SP(v) \geq 0 \\ -SP(v), & SP(v) < 0 \end{cases}$

The value of i (denoted as i*) that first makes “v×10ⁱ=⌊v×10ⁱ⌋” hold is the decimal place count α. Finally, we get the significand count β=i*+SP(v)+1 from the equation α=β−(SP(v)+1).

The verification of the condition “v×10ⁱ=⌊v×10ⁱ⌋” is expected to take O(β) time. To expedite this process, we may take full advantage of the fact that most values in a time series have the same significand count, and start the verification at i=max(β*_pre−SP(v)−1, 1). There are two cases. Case 1: β*>β*_pre. In this case the condition does not hold at the starting i, so we repetitively increase i by 1 until the condition is satisfied. Case 2: β*≤β*_pre. In this case the condition already holds at the starting i, so we constantly decrease i until the condition “i>1 and v×10ⁱ⁻¹=⌊v×10ⁱ⁻¹⌋” no longer holds. Finally, the significand count is obtained from the resulting i according to the equation α=β−(SP(v)+1).
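A hedged Java sketch of the plain trial-and-error calculation and of the β*_pre-seeded variant follows. `SignificandCount` and its methods are hypothetical, and SP(v) is passed in as a parameter. Like the text, the sketch relies on floating-point multiplication, so a value whose product v×10ⁱ happens to round to an integer too early would be miscounted.

```java
final class SignificandCount {
    // True when v * 10^i has no fractional part (floating-point check, as in the text).
    static boolean integral(double v, int i) {
        double x = v * Math.pow(10, i);
        return x == Math.floor(x);
    }

    // Plain trial and error: the first i in [sp*, sp* + 17] that passes is alpha; beta = alpha + SP(v) + 1.
    static int beta(double v, int sp) {
        int spStar = sp >= 0 ? 1 : -sp;
        for (int i = spStar; i <= spStar + 17; i++) {
            if (integral(v, i)) return i + sp + 1;
        }
        throw new IllegalStateException("no decimal place count found");
    }

    // Seeded variant: start at i = max(beta*_pre - SP(v) - 1, 1) and move up or down.
    static int betaSeeded(double v, int sp, int betaStarPre) {
        int i = Math.max(betaStarPre - sp - 1, 1);
        if (integral(v, i)) {
            while (i > 1 && integral(v, i - 1)) i--;   // Case 2: beta* <= beta*_pre, move down
        } else {
            int limit = i + 17;
            while (!integral(v, i)) {                  // Case 1: beta* > beta*_pre, move up
                if (++i > limit) throw new IllegalStateException("no decimal place count found");
            }
        }
        return i + sp + 1;
    }
}
```

For example, beta(3.14, 0) returns 3, and betaSeeded(3.17, 0, 6) starts at i=5, moves down to i=2 and returns 3.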

Start Position Calculation:

In an alternative implementation, we may leverage two sorted exponential arrays, LogArr1={10⁰, 10¹, …, 10ⁱ, …} and LogArr2={10⁰, 10⁻¹, …, 10⁻ʲ, …}, to accelerate finding SP(v). Particularly, we first scan these two arrays sequentially. If v≥1 and 10ⁱ≤v<10ⁱ⁺¹, then SP(v)=i; if v<1 and 10⁻ʲ≤v<10^{−(j−1)}, then SP(v)=−j. In an alternative implementation, we may set |LogArr1|=|LogArr2|=10. If v≥10¹⁰ or v≤10⁻¹⁰, we may finally call ⌊log₁₀|v|⌋ to get SP(v) (i.e., SP(v)=⌊log₁₀|v|⌋). This alternative implementation reduces the time needed to calculate the start position SP(v).
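The array lookup can be sketched in Java as follows; `StartPosition`, `LOG_ARR1` and `LOG_ARR2` are hypothetical names, and the sketch assumes v≠0.

```java
final class StartPosition {
    // LogArr1 = {10^0, ..., 10^9} and LogArr2 = {10^0, 10^-1, ..., 10^-9}.
    private static final double[] LOG_ARR1 = {1e0, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9};
    private static final double[] LOG_ARR2 = {1e0, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8, 1e-9};

    // SP(v) for v != 0.
    static int sp(double v) {
        double a = Math.abs(v);
        if (a >= 1) {
            for (int i = 0; i + 1 < LOG_ARR1.length; i++) {
                if (a < LOG_ARR1[i + 1]) return i;   // 10^i <= a < 10^(i+1)
            }
        } else {
            for (int j = 1; j < LOG_ARR2.length; j++) {
                if (a >= LOG_ARR2[j]) return -j;     // 10^-j <= a < 10^-(j-1)
            }
        }
        return (int) Math.floor(Math.log10(a));      // fallback outside the arrays
    }
}
```

Only values beyond the array ranges pay for the `Math.log10` call; typical sensor readings are resolved by a few comparisons.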

XORcmp and XORdcmp:

Theoretically, any existing XOR-based compressor, such as Gorilla and Chimp mentioned above, can be utilized in Elf. Since the erased value v′ tends to contain long trailing zeros, to compress the time series compactly, we propose in this section a novel XOR-based compressor and the corresponding decompressor. In an embodiment, Elf and Elf+ use the same XORcmp and XORdcmp.

Elf XORcmp: existing XOR-based compressors store the first value v₁′ of a time series using 64 bits. However, after some insignificant mantissa bits are erased, v₁′ tends to have a large number of trailing zeros. As a result, we use ⌈log₂65⌉=7 bits to record the number of trailing zeros, trail, of v₁′ (note that trail can take 65 values, from 0 to 64), and store the 64−trail non-trailing bits of v₁′. In all, 71−trail bits are used to record the first value, which is usually fewer than 64 bits. For each subsequent value v_t′ (t>1), we compute and encode xor_t=v_t′⊕v′_{t−1}.
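The first-value encoding and the XOR step can be sketched in Java as follows; `FirstValueEncoding` and its methods are hypothetical, and the output is a string of '0'/'1' characters for readability.

```java
final class FirstValueEncoding {
    // The lowest n bits of x, most significant first.
    static String bits(long x, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = n - 1; i >= 0; i--) sb.append((x >>> i) & 1L);
        return sb.toString();
    }

    // 7 bits for trail (0..64), then the 64 - trail non-trailing bits: 71 - trail bits in total.
    static String encodeFirst(double v1Prime) {
        long raw = Double.doubleToRawLongBits(v1Prime);
        int trail = Long.numberOfTrailingZeros(raw); // 64 when raw == 0
        String head = bits(trail, 7);
        return trail == 64 ? head : head + bits(raw >>> trail, 64 - trail);
    }

    // xor_t = v_t' XOR v_{t-1}' on the raw 64-bit patterns.
    static long xor(double current, double previous) {
        return Double.doubleToRawLongBits(current) ^ Double.doubleToRawLongBits(previous);
    }
}
```

For instance, the erased value 3.1640625 has 44 trailing zeros, so it costs 71−44=27 bits instead of 64.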

Gorilla Compressor: the Gorilla compressor checks whether xor_t equals 0. If xor_t=0 (i.e., v_t′=v′_{t−1}), Gorilla writes one bit of “0”, saving many bits because v_t′ need not be stored. If xor_t≠0, Gorilla writes one bit of “1” and further checks whether the condition C₁ is satisfied. Here C₁ is “lead_t≥lead_{t−1} and trail_t≥trail_{t−1}”, meaning that the leading zeros count and trailing zeros count of xor_t are greater than or equal to those of xor_{t−1}, respectively. If C₁ does not hold, after writing a bit of “1”, Gorilla stores the leading zeros count and center bits count with 5 bits and 6 bits respectively, followed by the actual center bits. Otherwise, after writing a bit of “0”, xor_t shares the leading zeros count and center bits count of xor_{t−1}, which is expected to save some bits.

Leading Code Optimization: observing that the leading zeros count of an XORed value is rarely more than 30 or less than 8, only log₂8=3 bits may be used to represent up to 24 leading zeros. In particular, 8 exponentially decaying steps (i.e., 0, 8, 12, 16, 18, 20, 22, 24) may be used to approximately represent the leading zeros count: an actual count of 0 to 7 is approximated as 0, 8 to 11 as 8, 12 to 15 as 12, 16 to 17 as 16, 18 to 19 as 18, 20 to 21 as 20, 22 to 23 as 22, and 24 or more as 24. The condition C₁ is therefore converted into C₂, i.e., “lead_t=lead_{t−1} and trail_t≥trail_{t−1}”. By applying this optimization to the Gorilla compressor, we obtain the compressor shown in FIG. 10a. The encoding strategy illustrated in FIG. 10a (such as the flag code “0” for the case xor_t=0 and the flag code “1” for the case xor_t≠0) is merely an example; any other encoding strategy that achieves the same effect (such as switching the flag codes “0” and “1” for the cases xor_t=0 and xor_t≠0) could also be used herein.
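The step table can be written as a short lookup; in this Java sketch (hypothetical `LeadingCode`), `code` returns the 3-bit index of the largest step that does not exceed the actual leading zeros count, so the approximation never overstates the number of leading zeros and the remaining zeros simply become part of the center bits.

```java
final class LeadingCode {
    // The 8 steps 0, 8, 12, 16, 18, 20, 22, 24, indexed by a 3-bit code.
    private static final int[] STEPS = {0, 8, 12, 16, 18, 20, 22, 24};

    // 3-bit code of the largest step not exceeding the actual leading zeros count (lead >= 0).
    static int code(int lead) {
        int c = STEPS.length - 1;
        while (STEPS[c] > lead) c--;
        return c;
    }

    // Approximated leading zeros count.
    static int approximate(int lead) {
        return STEPS[code(lead)];
    }
}
```

For example, actual counts of 11 and 30 are approximated as 8 and 24, with codes 1 and 7.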

Center Code Optimization: both v_t′ and v′_{t−1} tend to have many trailing zeros, which results in an XORed value with long trailing zeros. Besides, v_t′ does not differ much from v′_{t−1} in most cases, which leads to long leading zeros in the XORed value. That is, the XORed value tends to have a small number of center bits (usually not more than 16). To this end, if the center bits count is less than or equal to 16, we use only log₂16=4 bits to encode it. Although one more flag bit is needed, this usually saves one bit compared with the original 6-bit encoding. After optimizing the center code, the example compressor shown in FIG. 10b is obtained. The encoding strategy illustrated in FIG. 10b is merely an example; any other encoding strategy that achieves the same effect (such as switching the flag codes “0” and “1” for the cases xor_t=0 and xor_t≠0, and/or using the flag code “1” for the case C₂, the flag code “01” for the case C₃, and the flag code “00” for other cases) could also be used herein.

Flag Code Reassignment: FIG. 10b shows that only 1 flag bit is used for the case xor_t=0, but 2 or 3 flag bits are used for the cases xor_t≠0. However, since identical consecutive values are not very frequent in floating-point time series, we may reassign the flag codes of the four cases to further improve the compression ratio, so that each case uses a 2-bit flag, as illustrated in FIG. 10c.

As illustrated in FIG. 10c, the compressor first checks whether xor_t equals 0. If xor_t=0 (i.e., v_t′=v′_{t−1}), the compressor writes two bits of “01”. If xor_t≠0, the compressor further checks whether C₂ is satisfied. If C₂ is satisfied, the compressor writes two bits of “00” followed by the actual center bits. If C₂ is not satisfied, the compressor writes one bit of “1” and then checks whether the number of center bits is greater than 16. If it is not greater than 16, the compressor writes one more bit of “0” (flag “10”) and stores the leading zeros count and center bits count with 3 bits and 4 bits respectively, followed by the actual center bits. If it is greater than 16, the compressor writes one more bit of “1” (flag “11”) and stores the leading zeros count and center bits count with 3 bits and 6 bits respectively, followed by the actual center bits. The encoding strategy illustrated in FIG. 10c is merely an example; any other encoding strategy that achieves the same effect could also be used herein.
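The FIG. 10c layout can be sketched in Java as follows. `ElfXorEncoderSketch`, the string-based bit output and the initial “no previous value” state are hypothetical. Writing a center count of 16 as `0000` and 64 as `000000` (keeping only the low 4 or 6 bits) is an assumption, since the text does not say how those boundary values are encoded.

```java
final class ElfXorEncoderSketch {
    private static final int[] STEPS = {0, 8, 12, 16, 18, 20, 22, 24};
    final StringBuilder out = new StringBuilder();   // hypothetical bit sink
    private int storedLead = -1, storedTrail = -1;   // hypothetical "no previous value" state

    private void write(long x, int n) {
        for (int i = n - 1; i >= 0; i--) out.append((x >>> i) & 1L);
    }

    void encode(long xor) {
        if (xor == 0) {                                   // v_t' = v_{t-1}'
            out.append("01");
            return;
        }
        int leadCode = STEPS.length - 1;
        while (STEPS[leadCode] > Long.numberOfLeadingZeros(xor)) leadCode--;
        int lead = STEPS[leadCode];                       // approximated leading zeros
        int trail = Long.numberOfTrailingZeros(xor);
        if (lead == storedLead && trail >= storedTrail) { // C2: reuse the stored lead and trail
            out.append("00");
            write(xor >>> storedTrail, 64 - storedLead - storedTrail);
            return;
        }
        int center = 64 - lead - trail;
        if (center <= 16) {
            out.append("10");
            write(leadCode, 3);
            write(center & 0xF, 4);                       // assumption: 16 is written as 0000
        } else {
            out.append("11");
            write(leadCode, 3);
            write(center & 0x3F, 6);                      // assumption: 64 is written as 000000
        }
        write(xor >>> trail, center);
        storedLead = lead;
        storedTrail = trail;
    }
}
```

Encoding the XORed values 0, 0x00F0000000000000 and 0x0030000000000000 yields “01”, then “10”+“001”+“0100”+“1111”, then “00”+“0011”: the third value reuses the stored lead of 8 and trail of 52.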

Experiments:

TABLE 1

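| Type | β category | Dataset | #Records | β (avg.) | Time span |
|---|---|---|---|---|---|
| Time series | Small β | City-temp (CT) | 2,905,887 | 3 | 25 years |
| Time series | Small β | IR-bio-temp (IR) | 380,817,839 | 3 | 7 years |
| Time series | Small β | Wind-speed (WS) | 199,570,396 | 2 | 6 years |
| Time series | Small β | PM10-dust (PM10) | 222,911 | 3 | 5 years |
| Time series | Medium β | Stocks-UK (SUK) | 115,146,731 | 5 | 1 year |
| Time series | Medium β | Stock-USA (SUSA) | 374,428,996 | 4 | 1 year |
| Time series | Medium β | Stocks-DE (SDE) | 45,403,710 | 6 | 1 year |
| Time series | Medium β | Dewpoint-temp (DT) | 5,413,914 | 4 | 3 years |
| Time series | Medium β | Air-pressure (AP) | 137,721,453 | 7 | 6 years |
| Time series | Medium β | Basel-wind (BW) | 124,079 | 8 | 14 years |
| Time series | Medium β | Basel-temp (BT) | 124,079 | 9 | 14 years |
| Time series | Medium β | Bitcoin-price (BP) | 2,741 | 9 | 1 month |
| Time series | Medium β | Bird-migration (BM) | 17,964 | 7 | 1 year |
| Time series | Large β | Air-sensor (AS) | 8,664 | 17 | 1 hour |
| Non time series | Small β | Food-price (FP) | 2,050,638 | 3 | N/A |
| Non time series | Small β | Vehicle-charge (VC) | 3,395 | 3 | N/A |
| Non time series | Medium β | Blockchain-tr (BTR) | 231,031 | 5 | N/A |
| Non time series | Medium β | SD-bench (SB) | 8,927 | 4 | N/A |
| Non time series | Medium β | City-lat (CLat) | 41,001 | 6 | N/A |
| Non time series | Medium β | City-lon (CLon) | 41,001 | 7 | N/A |
| Non time series | Large β | POI-lat (PLat) | 424,205 | 16 | N/A |
| Non time series | Large β | POI-lon (PLon) | 424,205 | 16 | N/A |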

Experiments are performed to verify the performance of the above described erasing-based lossless compression method for floating-point values and the erasing-based lossless decompression method for floating-point values.

1. Datasets: 22 datasets are used, including 14 time series datasets and 8 non-time series datasets. Each dataset is further assigned to one of three categories (small, medium or large β) according to its average decimal significand count β, as described in Table 1 above.

Baselines: we compare Elf with 9 existing compression methods, namely four floating-point compression methods (Gorilla, Chimp, Chimp₁₂₈ and FPC) and five general-purpose compression methods (Xz, Brotli, LZ4, Zstd and Snappy). The erasing-based lossless compression method for floating-point values described in the embodiments above is denoted as Elf, and the variant that further adopts the significand count optimization and the start position optimization is denoted as Elf+.

Metrics: we evaluate the methods in terms of three metrics: compression ratio, compression time and decompression time. Note that the compression ratio is defined as the ratio of the compressed data size to the original data size, so a lower compression ratio is better.

2. Settings: As Chimp did, we regard 1,000 records of each dataset as a block. Each compression method is executed on up to 100 blocks per dataset, and the average metrics of one block are finally reported. By default, we regard each value as a double value. All experiments are conducted on a personal computer equipped with Windows 11, 11th Gen Intel(R) Core(TM) i5-11400 @ 2.60 GHz CPU and 16 GB memory. The JDK (Java Development Kit) version is 1.8.
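The block-based evaluation protocol can be sketched in Python as below, assuming the dataset is already loaded as a list of doubles. Python's standard `lzma` module (which produces the Xz format) stands in for the compressor under test; Elf itself is not implemented here. Discarding a trailing partial block is an assumption, as is the name `average_block_ratio`; timing measurements are omitted.

```python
import lzma
import struct
from statistics import mean
from typing import Callable, Sequence, Tuple

BLOCK_SIZE = 1000  # records per block, as in the settings above
MAX_BLOCKS = 100   # at most 100 blocks per dataset


def average_block_ratio(
    values: Sequence[float],
    compress: Callable[[bytes], bytes] = lzma.compress,
) -> Tuple[float, int]:
    """Return (average compressed/original size ratio, number of blocks used)."""
    ratios = []
    # Only full blocks are evaluated (assumption); a trailing partial block is dropped.
    for start in range(0, len(values) - BLOCK_SIZE + 1, BLOCK_SIZE):
        if len(ratios) == MAX_BLOCKS:
            break
        block = values[start:start + BLOCK_SIZE]
        raw = struct.pack(f"<{len(block)}d", *block)  # 8 bytes per double
        ratios.append(len(compress(raw)) / len(raw))
    if not ratios:
        raise ValueError("need at least one full block of records")
    return mean(ratios), len(ratios)
```

Any function mapping bytes to bytes can be passed as `compress`, so the same harness could wrap other compressors.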

Performance: the performance of Elf, Elf+ and the baselines is listed in Table 2 below.

Compression ratio: as illustrated in Table 2 below, among all the floating-point compression methods, the erasing-based lossless compression method described in embodiments of the present disclosure (i.e., Elf) has the best compression ratio on almost all datasets. In particular, for the time series datasets, compared with Gorilla and FPC, Elf has an average relative improvement of about 51% (e.g., (0.76−0.37)/0.76≈51% over Gorilla). Thanks to the erasing technique and the elaborate XORcmp, Elf still achieves relative improvements of 47% and 12% over Chimp and Chimp₁₂₈ respectively on the time series datasets. For the non-time series datasets, Elf is also relatively (0.63−0.55)/0.63≈12.7% better than the best competitor Chimp₁₂₈. We notice that there are a few datasets on which Chimp₁₂₈ is slightly better than Elf in terms of compression ratio. For the datasets WS, SUSA and BT, we find that there are many duplicate values within 128 consecutive records; in this case, Chimp₁₂₈ can use only 9 bits to represent a repeated value. For the datasets AS, PLat and PLon, since they have large decimal significand counts, Elf does not perform erasing but still consumes some flag bits. As pointed out by Gorilla, real-world floating-point measurements often have a decimal place count of one or two, which usually results in a small or medium β. Therefore, Elf can achieve good performance in most real-world scenarios.
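The relative improvements quoted above follow directly from the Avg. columns of Table 2. A small Python check of the arithmetic (the function name is illustrative):

```python
def relative_improvement(baseline_ratio: float, elf_ratio: float) -> float:
    """Relative reduction of the compression ratio (a lower ratio is better)."""
    return (baseline_ratio - elf_ratio) / baseline_ratio


# Average compression ratios over the time series datasets (Avg. column of Table 2).
TIME_SERIES_AVG = {"Gorilla": 0.76, "FPC": 0.75, "Chimp": 0.70, "Chimp128": 0.42}
ELF_TIME_SERIES_AVG = 0.37

for name, ratio in TIME_SERIES_AVG.items():
    print(f"Elf vs {name}: {relative_improvement(ratio, ELF_TIME_SERIES_AVG):.0%}")
```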

As illustrated in Table 2 below, among the time series and non-time series datasets with small and medium β, Elf+ even outperforms the best competitor Chimp₁₂₈ on datasets WS and SUSA, on which Chimp₁₂₈ has a slightly better compression ratio than Elf. This is because Elf+ takes full advantage of the fact that most values in a time series have the same decimal significand count.

TABLE 2

Time series (Small β: CT, IR, WS, PM10; Medium β: SUK, SUSA, SDE, DT)

Compression ratio
Type	Method	CT	IR	WS	PM10	SUK	SUSA	SDE	DT
Floating	Gorilla	0.85	0.64	0.83	0.48	0.58	0.*8	0.72	0.83
Floating	Chimp	0.64	0.59	0.81	0.46	0.*2	0.64	0.07	0.77
Floating	Chimp₁₂₈	0.32	0.24	0.28	0.21	0.29	0.23	0.27	0.35
Floating	FPC	0.75	0.6*	0.85	0.50	0.74	0.70	0.73	0.82
Floating	Elf	0.25	0.21	0.25	0.1*	0.22	0.24	0.25	0.31
Floating	Elf+	0.22	0.1*	0.20	0.11	0.19	0.18	0.23	0.2*
General	Xz	0.18	0.16	0.15	0.11	0.16	0.17	0.19	0.27
General	Brotli	0.20	0.18	0.17	0.12	0.19	0.20	0.22	0.*2
General	LZ4	0.36	0.36	0.*7	0.27	0.39	0.39	0.41	0.52
General	Zstd	0.22	0.24	0.19	0.14	0.22	0.24	0.26	0.38
General	Snappy	0.29	0.30	0.27	0.21	0.32	0.32	0.35	0.51

Compression time (*)
Type	Method	CT	IR	WS	PM10	SUK	SUSA	SDE	DT
Floating	Gorilla	18	21	17	1*	17	17	17	18
Floating	Chimp	2*	21	22	18	2*	22	2*	24
Floating	Chimp₁₂₈	23	23	22	20	24	22	25	20
Floating	FPC	3*	40	40	40	28	28	28	31
Floating	Elf	*1	53	59	59	54	*6	58	*7
Floating	Elf+	34	35	58	*9	40	*9	4*	39
General	Xz	948	1106	810	1056	877	836	900	1045
General	Brotli	1039	1655	1567	1449	1384	1611	1693	17*2
General	LZ4	1082	1106	963	984	966	976	952	1091
General	Zstd	209	212	112	208	177	112	117	218
General	Snappy	195	236	52	214	169	56	172	195

Decompression time (*)
Type	Method	CT	IR	WS	PM10	SUK	SUSA	SDE	DT
Floating	Gorilla	16	18	17	2*	16	17	17	17
Floating	Chimp	24	22	24	19	22	24	24	5*
Floating	Chimp₁₂₈	17	16	16	15	18	16	18	18
Floating	FPC	28	28	26	29	*	24	25	25
Floating	Elf	38	44	46	4*	*7	45	44	45
Floating	Elf+	27	25	33	27	2*	29	31	30
General	Xz	161	147	114	125	156	133	14*	226
General	Brotli	61	58	3*	53	41	43	69	70
General	LZ4	40	35	18	37	19	19	18	42
General	Zstd	46	48	*	42	31	*	*	*
General	Snappy	38	54	2*	38	19	21	2*	39

Time series, continued (Medium β: AP, BW, BT, BP, BM; Large β: AS; Avg.: average over the time series datasets) and non-time series (Small β: FP, VC)

Compression ratio
Type	Method	AP	BW	BT	BP	BM	AS	Avg.	FP	VC
Floating	Gorilla	0.73	0.99	0.94	0.84	0.70	0.82	0.76	0.58	1.00
Floating	Chimp	0.65	0.88	0.85	0.77	0.72	0.77	0.70	0.47	0.86
Floating	Chimp₁₂₈	0.54	0.71	0.47	0.72	0.50	0.77	0.42	0.34	0.36
Floating	FPC	0.67	0.92	0.90	0.81	0.7*	0.82	0.75	0.02	0.91
Floating	Elf	0.31	0.59	0.58	0.56	0.*2	0.85	0.37	0.23	0.34
Floating	Elf+	0.25	0.56	0.52	0.50	0.38	0.86	0.*3	0.22	0.29
General	Xz	0.47	0.57	0.30	0.63	0.43	0.79	0.33	0.23	0.23
General	Brotli	0.31	0.61	0.39	0.71	0.47	0.85	0.37	0.26	0.28
General	LZ4	0.69	0.6*	0.54	0.87	0.*1	1.01	0.53	0.41	0.47
General	Zstd	0.58	0.61	0.41	0.75	0.*1	0.91	0.40	0.30	0.34
General	Snappy	0.73	0.75	0.54	0.99	0.51	1.00	0.51	0.39	0.*2

Compression time (*)
Type	Method	AP	BW	BT	BP	BM	AS	Avg.	FP	VC
Floating	Gorilla	20	21	20	19	18	20	18	16	19
Floating	Chimp	20	26	25	24	25	27	23	21	24
Floating	Chimp₁₂₈	38	47	3*	48	38	50	32	27	27
Floating	FPC	40	42	47	27	30	38	35	39	43
Floating	Elf	51	73	60	63	65	87	66	*2	55
Floating	Elf+	*9	72	54	42	51	82	48	41	42
General	Xz	1939	1*27	1100	1531	1444	2146	123*	*	1636
General	Brotli	2074	1792	1715	1729	1827	1798	1704	1741	1674
General	LZ4	12*	1013	1010	1001	1000	1026	1032	985	974
General	Zstd	317	259	291	271	250	277	217	211	227
General	Snappy	179	189	200	1*9	251	158	175	*88	250

Decompression time (*)
Type	Method	AP	BW	BT	BP	BM	AS	Avg.	FP	VC
Floating	Gorilla	18	23	18	16	17	20	18	16	18
Floating	Chimp	19	30	2*	27	25	2*	26	21	26
Floating	Chimp₁₂₈	22	28	21	26	22	25	20	18	19
Floating	FPC	32	27	31	24	26	34	28	28	29
Floating	Elf	41	58	53	48	48	29	44	33	44
Floating	Elf+	44	41	45	34	*6	35	33	30	33
General	Xz	435	427	2*4	479	345	629	27*	196	194
General	Brotli	109	97	79	93	*7	100	71	103	70
General	LZ4	36	42	38	40	38	44	35	36	37
General	Zstd	99	66	113	72	62	68	57	46	47
General	Snappy	49	40	42	41	46	48	37	40	39

Non-time series, continued (Medium β: BTR, SB, CLat, CLon; Large β: PLat, PLon; Avg.: average over the non-time series datasets)

Compression ratio
Type	Method	BTR	SB	CLat	CLon	PLat	PLon	Avg.
Floating	Gorilla	0.74	0.63	1.*3	1.03	1.03	1.03	0.88
Floating	Chimp	0.*7	0.*	0.92	0.98	0.90	0.99	0.79
Floating	Chimp₁₂₈	0.55	0.27	0.78	0.8*	0.90	0.99	0.63
Floating	FPC	0.09	0.59	0.96	1.00	0.96	1.00	0.84
Floating	Elf	0.36	0.27	0.56	0.63	0.96	1.06	0.55
Floating	Elf+	0.30	0.23	0.01	0.60	0.98	1.07	0.52
General	Xz	0.40	0.13	0.*0	0.63	0.93	0.96	0.51
General	Brotli	0.43	0.14	0.65	0.0*	0.94	0.90	0.54
General	LZ4	0.5*	0.3	0.79	0.82	1.00	1.90	0.67
General	Zstd	0.4	0.17	0.68	0.71	0.*4	0.96	0.57
General	Snappy	*.54	0.25	0.83	0.87	1.0*	1.00	0.66

Compression time (*)
Type	Method	BTR	SB	CLat	CLon	PLat	PLon	Avg.
Floating	Gorilla	18	16	19	19	19	19	18
Floating	Chimp	22	20	26	26	23	*6	23
Floating	Chimp₁₂₈	39	23	4*	48	45	46	38
Floating	FPC	43	41	42	38	40	48	43
Floating	Elf	62	48	64	70	71	72	62
Floating	Elf+	43	35	51	6*	48	66	49
General	Xz	1035	1040	1232	1*16	1476	1351	1276
General	Brotli	1755	1322	1692	1712	1628	1633	1669
General	LZ4	1089	976	9*8	986	9*	957	957
General	Zstd	231	202	236	245	206	113	211
General	Snappy	19*	200	207	2*	178	149	200

Decompression time (*)
Type	Method	BTR	SB	CLat	CLon	PLat	PLon	Avg.
Floating	Gorilla	17	16	17	17	17	17	17
Floating	Chimp	24	21	26	26	*4	26	24
Floating	Chimp₁₂₈	22	17	26	26	23	24	22
Floating	FPC	29	29	30	36	2*	35	31
Floating	Elf	49	*9	52	57	31	33	42
Floating	Elf+	*3	3*	41	49	33	36	36
General	Xz	312	126	434	461	664	663	381
General	Brotli	56	58	243	*	86	77	101
General	LZ4	39	37	38	37	35	10	35
General	Zstd	60	44	47	38	43	32	46
General	Snappy	39	36	42	37	32	43	38

* indicates data missing or illegible when filed.

Compression time and decompression time: Elf takes slightly more time than the other floating-point compression algorithms during both compression and decompression, because it adds an erasing step and a restoring step. However, the difference is small, since the running times of these floating-point algorithms are all of the same order of magnitude. For almost all datasets, Elf+ takes even less time than Elf during both compression and decompression.

In summary, Elf usually achieves a remarkable improvement in compression ratio for both time series and non-time series datasets, at an affordable cost in running time. Elf+ performs even better than Elf in terms of both compression ratio and running time.

In the following, a single-precision floating-point value occupying 32 bits is taken as an example to explain the embodiments of the present disclosure in detail.

A single-precision floating-point value (abbr. single value) has an underlying storage layout similar to that of a double value, but it takes up only 32 bits, where 1 bit is for the sign, 8 bits are for the exponent, and 23 bits are for the mantissa. To compress single values, we should make the following modifications.

A normal single value satisfies:

$v = {(- 1)}^{s} \times 2^{e - 127} \times (1 + \sum_{i = 1}^{23} m_{i} \times 2^{- i})$

$e = {(e_{1} e_{2} \dots e_{8})}_{2} = \sum_{i = 1}^{8} e_{i} \times 2^{8 - i}$

If we let $m_{0} = 1$ and $BF(v) = \pm(b_{h-1} b_{h-2} \dots b_{0} . b_{-1} b_{-2} \dots b_{l})_{2}$, we have $b_{-i} = m_{i+e-127}$ for $i > 0$.
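The following Python sketch (an illustration, not part of the disclosure) uses the standard `struct` module to split a single value into s, e and m₁…m₂₃, re-evaluates the normal-number formula above, and reads the fractional bit b₋ᵢ through b₋ᵢ = m_(i+e−127). It assumes normal numbers only; the helper names are hypothetical.

```python
import struct
from typing import List, Tuple


def float32_bits(v: float) -> int:
    """IEEE 754 single-precision pattern of v (v is rounded to float32 first)."""
    return struct.unpack(">I", struct.pack(">f", v))[0]


def decompose(v: float) -> Tuple[int, int, List[int]]:
    """Split a single value into sign s, biased exponent e and bits m_1..m_23."""
    bits = float32_bits(v)
    s = bits >> 31
    e = (bits >> 23) & 0xFF
    m = [(bits >> (23 - i)) & 1 for i in range(1, 24)]
    return s, e, m


def recompose(s: int, e: int, m: List[int]) -> float:
    """Evaluate v = (-1)^s * 2^(e-127) * (1 + sum m_i * 2^-i) for normal numbers."""
    frac = 1 + sum(mi * 2.0 ** -i for i, mi in enumerate(m, start=1))
    return (-1) ** s * 2.0 ** (e - 127) * frac


def frac_bit(v: float, i: int) -> int:
    """b_{-i} of |v| via b_{-i} = m_{i+e-127}, taking m_0 = 1 (normal numbers)."""
    _, e, m = decompose(v)
    j = i + e - 127
    if j < 0 or j > 23:
        return 0  # above the leading 1 or beyond the last mantissa bit
    return 1 if j == 0 else m[j - 1]
```

For instance, 0.375 = (0.011)₂, and `[frac_bit(0.375, i) for i in range(1, 5)]` yields `[0, 1, 1, 0]`.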

In an alternative implementation, for a single floating-point value, the reference mantissa bit may be determined by the following g(α), where α denotes the decimal place count of the floating-point value:

$g(\alpha) = \lceil \alpha \times \log_{2} 10 \rceil + e - 127$

$e = {(e_{1} e_{2} \dots e_{8})}_{2} = \sum_{i = 1}^{8} e_{i} \times 2^{8 - i}$
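As a worked example (illustrative only), the single value 3.14 is stored as the bit pattern 0x4048F5C3, so its exponent bits give e = (10000000)₂ = 128, and its decimal place count is α = 2. Then

$g(2) = \lceil 2 \times \log_{2} 10 \rceil + 128 - 127 = \lceil 6.644 \rceil + 1 = 8$

so the first 8 mantissa bits are kept, and the 23 − 8 = 15 mantissa bits after them are the candidates for erasing.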

In an alternative implementation, for a single floating-point value, δ=0 indicates that v itself already has long trailing zeros. Thus, the erasing operation may be performed in response to δ≠0, i.e., in response to determining that at least one digit on the mantissa bits following the reference mantissa bit is not zero.

In an alternative implementation, for a single floating-point value, the erasing operation may be performed in response to δ≠0 and β*<8. Performing the erasing operation under these conditions may ensure a positive gain in compression ratio while keeping the compression of the floating-point value lossless.

In an alternative implementation, for a single floating-point value, the erasing operation may be performed in response to δ≠0 and 23−g(α)>3. Performing the erasing operation under these conditions may likewise ensure a positive gain in compression ratio while keeping the compression of the floating-point value lossless.

In an alternative implementation, for a single floating-point value, the erasing operation is performed in response to β*<8, δ≠0 and 23−g(α)>3. The other processing operations, such as compression, decompression, and the encoding strategies for β* and the XORed result, are similar to those described for double values and are not repeated herein.
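Putting the conditions together, a minimal Python sketch of the single-value erasing step might look as follows. It checks δ≠0 and 23−g(α)>3 but leaves out the β*<8 check, since β* is defined in an earlier part of the disclosure; it also assumes α is already known and skips zeros, subnormals, infinities and NaNs. All function names are hypothetical.

```python
import math
import struct


def float32_bits(v: float) -> int:
    """IEEE 754 single-precision pattern of v (v is rounded to float32 first)."""
    return struct.unpack(">I", struct.pack(">f", v))[0]


def bits_to_float32(bits: int) -> float:
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def erase_single(v: float, alpha: int) -> float:
    """Zero the mantissa bits after the reference bit when erasing pays off.

    Checks delta != 0 and 23 - g(alpha) > 3; the beta* < 8 check is omitted
    because beta* is defined elsewhere in the disclosure.
    """
    bits = float32_bits(v)
    e = (bits >> 23) & 0xFF
    if e == 0 or e == 0xFF:
        return bits_to_float32(bits)  # zero/subnormal/inf/NaN: not handled here
    g = math.ceil(alpha * math.log2(10)) + e - 127
    n = 23 - g  # mantissa bits following the reference bit
    if g < 0 or n <= 3:
        return bits_to_float32(bits)  # 23 - g(alpha) > 3 fails (or g out of range)
    mask = (1 << n) - 1
    if (bits & mask) == 0:
        return bits_to_float32(bits)  # delta == 0: already has long trailing zeros
    return bits_to_float32(bits & ~mask)
```

Since each erased bit m_j (j > g(α)) has weight 2^(e−127−j), the erased amount is below 2^(e−127−g(α)) = 2^(−⌈α·log₂10⌉) ≤ 10^(−α). For 3.14 with α = 2, the sketch yields 3.1328125 (bit pattern 0x40488000), which is less than 0.01 below the stored value.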

According to another embodiment of the present disclosure, an erasing-based lossless compression apparatus for floating-point values is provided. FIG. 11 illustrates an erasing-based lossless compression apparatus 300. The apparatus may be provided in a computer.

As illustrated in FIG. 11, the apparatus 300 includes: a calculation unit 301, configured to acquire a floating-point value, and calculate a decimal place count of the floating-point value; a transformation unit 302, configured to transform the floating-point value into a binary format, where the floating-point value in the binary format is composed of a digit on a sign bit, digits on exponent bits, and digits on mantissa bits; a determination unit 303, configured to determine, in the mantissa bits, a reference mantissa bit based on the decimal place count and the digits on the exponent bits; an erasing unit 304, configured to perform an erasing operation on bits following the reference mantissa bit by setting corresponding digits on the bits following the reference mantissa bit to be zero, to obtain a value in the binary format, and use the value in the binary format obtained by the erasing operation as a mantissa prefix number of the floating-point value; a first XOR unit 305, configured to input the mantissa prefix number of the floating-point value into an XOR based compressor, to obtain an XORed result; and a storing unit, configured to store the XORed result.

It should be noted that the apparatus shown in FIG. 11 corresponds to the method illustrated in FIG. 3. The embodiments or implementations of the method illustrated in FIG. 3 are also applicable to the apparatus 300 and are not repeated herein.

According to another embodiment of the present disclosure, an erasing-based lossless decompression apparatus for floating-point values is provided. FIG. 12 illustrates an erasing-based lossless decompression apparatus 400. The apparatus may be provided in a computer.

As illustrated in FIG. 12, the apparatus 400 includes:

- an acquisition unit 401, configured to acquire an XORed result and a modified decimal significand count of a floating-point value;
- a second XOR unit 402, configured to perform XOR operation on the XORed result and the mantissa prefix number of the previous floating-point value, to obtain the mantissa prefix number of the floating-point value;
- a calculation unit 403, configured to calculate a decimal place count of the floating-point value based on the modified decimal significand count of the floating-point value; and
- a recovering unit 404, configured to recover the floating-point value based on the mantissa prefix number of the floating-point value and the decimal place count of the floating-point value.
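The XOR step shared by the first XOR unit 305 and the second XOR unit 402 can be sketched in Python as a running XOR over integer mantissa prefix numbers. This sketch assumes the first value is XORed with 0 and keeps the XORed results as plain integers instead of bit-packing them as the XOR-based compressor would; the function names are hypothetical. The decimal place count recovery performed by units 403 and 404 is not shown.

```python
from typing import Iterable, List


def xor_encode(prefixes: Iterable[int]) -> List[int]:
    """Compression side: XOR each mantissa prefix number with its predecessor."""
    out, prev = [], 0
    for p in prefixes:
        out.append(p ^ prev)
        prev = p
    return out


def xor_decode(xored: Iterable[int]) -> List[int]:
    """Decompression side (unit 402): XOR with the previously restored prefix."""
    out, prev = [], 0
    for x in xored:
        prev ^= x
        out.append(prev)
    return out
```

Because XOR is its own inverse, `xor_decode(xor_encode(seq))` returns `seq`, and a repeated prefix number encodes to 0, which is the xor_t=0 case handled by the flag codes.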

It should be noted that the apparatus shown in FIG. 12 corresponds to the method illustrated in FIG. 5. The embodiments or implementations of the method illustrated in FIG. 5 are also applicable to the apparatus 400 and are not repeated herein.

According to yet another embodiment of the present disclosure, a non-transitory computer readable medium is provided. The non-transitory computer readable medium stores a computer program thereon, the program, when executed by a processor, causing the processor to implement any one of the methods described above.

According to yet another embodiment of the present disclosure, an electronic device is provided. The electronic device includes one or more processors; a storage apparatus, storing one or more programs thereon, the one or more programs, when executed by the one or more processors, causing the one or more processors to implement any one of the methods described above.

It should be noted that in one or more of the above embodiments, the functions described in embodiments of the disclosure can be implemented by hardware, software, firmware, or any combination of them. When implemented by software, these functions can be stored in computer readable medium or transmitted as one or more instructions or codes on computer readable medium.

It should be understood that the various forms of processes shown above may be used to reorder, add, or delete steps. For example, the steps described in the present disclosure may be performed in parallel, sequentially, or in different orders. As long as the desired results of the technical solution disclosed in the present disclosure can be achieved, no limitation is made herein.

The above specific embodiments do not constitute limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement and improvement made within the spirit and principle of the present disclosure shall be included in the protection scope of the present disclosure.

Number	Date	Country	Kind
202310068186.3	Jan 2023	CN	national
202310070527.0	Jan 2023	CN	national
