This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2016-182090, filed on Sep. 16, 2016; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to a data processing apparatus and a data processing method.
As a lossless compression method for digital data, there is known a dictionary coder which compares compression target data and data held in a dictionary against each other, and which, in a case of data match, reduces the amount of data by using the position of matching data in the dictionary, the match length, and the like.
However, with the conventional technique, it is difficult to increase the throughput without reducing data compression efficiency.
According to an embodiment, a data processing apparatus includes a divider, a hash calculator, at least one hash memory, an access controller, and a compressor. The divider is configured to divide input data into a plurality of blocks. The hash calculator is configured to calculate hash values from the respective blocks. The at least one hash memory is configured to store pieces of first data that are based on the respective blocks. The access controller is configured to access the at least one hash memory by using the hash values, read one or some of the pieces of first data, each stored at an address indicated by each hash value, from the at least one hash memory, and write, at the addresses indicated by the hash values, pieces of first data that are determined based on the respective blocks. The compressor is configured to compress the input data into compressed data based on the input data and the read one or some of the pieces of first data.
Hereinafter, embodiments of a data processing apparatus and a data processing method will be described in detail with reference to the appended drawings.
First, a configuration of a data processing apparatus according to a first embodiment will be described.
Configuration of Data Processing Apparatus
In the following, the hash memory 11a and the hash memory 11b will be simply referred to as the hash memory(ies) 11 when there is no need to distinguish between the two.
The divider 1 divides input data into a plurality of blocks. Any method may be used to divide the input data into a plurality of blocks.
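As an illustration only, one simple division method is to cut the input data into fixed-length blocks. The function name `divide` and the block length of four bytes are assumptions made for this sketch; the embodiment allows any division method.

```python
def divide(data: bytes, block_size: int = 4) -> list[bytes]:
    """Split input data into consecutive blocks.

    Assumption: fixed-length blocks of block_size bytes; the last block
    may be shorter when the input length is not a multiple of block_size.
    """
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


blocks = divide(b"abcdefghij", block_size=4)
```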
Example Division Method
Referring back to
When a block is received from the divider 1, the hash calculator 2 calculates a hash value of the block. Any method may be used to calculate the hash value. For example, the hash calculator 2 may take one byte at the beginning of the block as the hash value. Also, the hash calculator 2 may take the number of ones or zeros in the block, which is represented by a bit sequence, as the hash value, for example. Moreover, the hash calculator 2 may calculate the hash value by using other different hash functions, for example.
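The two example hash calculations mentioned above may be sketched as follows. The function names are hypothetical, and a block is assumed to be a non-empty byte string.

```python
def hash_first_byte(block: bytes) -> int:
    """Take the one byte at the beginning of the block as the hash value."""
    return block[0]


def hash_count_ones(block: bytes) -> int:
    """Take the number of one bits in the block as the hash value."""
    return sum(bin(byte).count("1") for byte in block)
```

With the first method the hash value ranges over 0 to 255 regardless of the block length, whereas with the second method it ranges over 0 to the number of bits in the block.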
The hash calculator 2 inputs the hash value of each block to the access controller 3.
When the hash value of each block is received from the hash calculator 2, the access controller 3 accesses the hash memory 11a, the hash memory 11b, and the dictionary memory 12. Before describing operation of the access controller 3, an example of a memory structure according to the first embodiment will be described.
Example of Memory Structure
The index for the hash memory 11 is a hash value. Moreover, stored data in the hash memory 11 is first data (intermediate data), which is based on a block. The first data, which is based on a block, is arbitrary data that is specified by the block. For example, the first data, which is based on a block, is an address in the dictionary memory 12 where the block is stored.
In the description of the first embodiment, a case where the first data, which is based on a block, is the address of the block that is stored in the dictionary memory 12 will be described.
The dictionary memory 12 stores second data. The second data is two continuous blocks, for example. The second data is used as dictionary data in a compression process by the compressor 4.
First, the access controller 3 receives, from the hash calculator 2, a hash value K(a) of a block a, a hash value K(b) of a block b, a hash value K(c) of a block c, and a hash value K(d) of a block d. That is, in the example in
Next, the access controller 3 accesses the hash memory 11a with the hash values K(a), K(b), K(c), and K(d) as indices. Then, the access controller 3 reads one or some of the pieces of first data stored at the addresses, in the hash memory 11a, indicated by the hash values, and then, writes, at the corresponding address, first data which is based on the block for which the corresponding hash value has been calculated.
Specifically, in the example in
Also, in the example in
Also, in the example in
Moreover, in the example in
On the other hand, in the example in
The access controller 3 writes α(a) at the address, in the hash memory 11b, indicated by the hash value K(a). That is, α(w) which is stored at the address indicated by K(a) is updated to α(a) without being read out.
Furthermore, the access controller 3 writes α(b) at the address, in the hash memory 11b, indicated by the hash value K(b). That is, α(x) which is stored at the address indicated by K(b) is updated to α(b) without being read out.
Also, the access controller 3 reads α(y) stored at the address, in the hash memory 11b, indicated by the hash value K(c), and then, writes α(c) at the address. That is, α(y) which is stored at the address indicated by K(c) is updated to α(c) after α(y) is read out.
Also, the access controller 3 reads α(z) stored at the address, in the hash memory 11b, indicated by the hash value K(d), and then, writes α(d) at the address. That is, α(z) which is stored at the address indicated by K(d) is updated to α(d) after α(z) is read out.
That is, the hash memory 11a is read two times and updated (written) four times.
Likewise, the hash memory 11b is read two times and updated (written) four times. The access controller 3 accesses the dictionary memory 12 by using α(w) and α(x) read out from the hash memory 11a and α(y) and α(z) read out from the hash memory 11b. Then, the access controller 3 reads second data from the dictionary memory 12.
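The access pattern described above may be sketched as follows. This is a sequential model written only to make the read and write counts concrete; it does not model the parallel operation of the two memories. The class `HashMemory`, the function `process_group`, and the use of plain integers for hash values and dictionary addresses are assumptions of the sketch.

```python
class HashMemory:
    """Hypothetical model of one hash memory, indexed by hash value."""

    def __init__(self, initial=None):
        self.table = dict(initial or {})
        self.reads = 0
        self.writes = 0

    def access(self, key, new_value, read):
        """Optionally read the old first data, then always write the new one."""
        old = None
        if read:
            old = self.table.get(key)
            self.reads += 1
        self.table[key] = new_value
        self.writes += 1
        return old


def process_group(mem_a, mem_b, entries):
    """entries: list of (hash value, dictionary address) pairs for one group.

    Assumption: the first half of the group is read from mem_a and the
    second half from mem_b, while both memories are updated for every entry.
    """
    half = len(entries) // 2
    read_out = []
    for i, (key, address) in enumerate(entries):
        old_a = mem_a.access(key, address, read=i < half)
        old_b = mem_b.access(key, address, read=i >= half)
        old = old_a if i < half else old_b
        if old is not None:
            read_out.append(old)
    return read_out


# Both memories start with the same contents: alpha(w), alpha(x), alpha(y), alpha(z).
initial = {1: 100, 2: 101, 3: 102, 4: 103}
mem_a, mem_b = HashMemory(initial), HashMemory(initial)
read_out = process_group(mem_a, mem_b, [(1, 200), (2, 201), (3, 202), (4, 203)])
```

In this model each memory is read two times and written four times, and the four values read out together are used to access the dictionary memory.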
Furthermore, the access controller 3 writes in the dictionary memory 12, as second data, input data which is being processed (a plurality of pieces of block data obtained by the divider 1). Additionally, the address in the dictionary memory 12 where the input data which is being processed is to be stored must correspond to the address that is written as the first data when the hash memory 11 is updated. For example, the dictionary memory 12 may be updated by a method of advancing the address position by k for each write. For example, k is one.
In the case of k=1, the block a which is to be stored as the second data is written at an access position, in the dictionary memory 12, indicated by the address α(a), for example. At this time, the address is α(a)=α(prev)+1. Additionally, α(prev) is the access position of the last writing in the dictionary memory 12. That is, in this case, α(prev) is the access position for the input data whose processing was completed immediately before.
Also, in the case of sequentially writing the block b, the block c, and the block d after the block a, the addresses will be α(b)=α(a)+1, α(c)=α(b)+1, and α(d)=α(c)+1.
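The address sequence above may be sketched as follows. The function name `assign_addresses` and the integer addresses are assumptions made for illustration.

```python
def assign_addresses(prev_address: int, block_count: int, k: int = 1) -> list[int]:
    """Return the dictionary addresses for block_count blocks written in order.

    Each address is the previous write position advanced by k, so with k=1
    the first block is written at prev_address + 1.
    """
    addresses = []
    address = prev_address
    for _ in range(block_count):
        address += k
        addresses.append(address)
    return addresses


# alpha(prev) = 10, so blocks a, b, c, d are written at 11, 12, 13, 14.
addresses = assign_addresses(prev_address=10, block_count=4)
```

The same addresses are the values written into the hash memory 11 as first data, which keeps the two memories in correspondence.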
As described above, the number of times of reading of the hash memory 11a is two, and the number of times of writing in the hash memory 11a is four, and thus, the number of times of access to the hash memory 11a is six in total. That is, the number of times the access controller 3 reads the first data from the hash memory 11a and the number of times the access controller 3 writes the first data in the hash memory 11a are different. The number of times of writing in the hash memory 11a by the access controller 3 is four, and thus, the update frequency is maintained and the search performance in the dictionary memory 12 is not reduced.
Likewise, the number of times of reading of the hash memory 11b is two, and the number of times of writing in the hash memory 11b is four, and thus, the number of times of access to the hash memory 11b is six in total. That is, the number of times the access controller 3 reads the first data from the hash memory 11b and the number of times the access controller 3 writes the first data in the hash memory 11b are different. The number of times of writing in the hash memory 11b by the access controller 3 is four, and thus, the update frequency is maintained and the search performance in the dictionary memory 12 is not reduced.
Furthermore, by causing the hash memories 11a and 11b to operate in parallel, the throughput may be increased compared to a conventional access method of performing reading four times and writing four times with respect to one hash memory, for example.
Next, an example of the dictionary memory 12 according to the first embodiment will be described.
In the example in
Accordingly, compared to the conventional method of storing one block at one address, longer data may be acquired by one access. Therefore, the access controller 3 may read, from the dictionary memory 12, second data of a longer data length than the data length of a block obtained by the divider 1 in fewer accesses compared to the conventional method. The dictionary memory 12 illustrated in
Additionally, the address indicating the access position for second data stored in the dictionary memory 12 may be separated into an address indicating the top portion of the second data and an address indicating the position of data included in the second data.
Referring back to
When second data (for example, a plurality of continuous blocks) is received from the access controller 3, the compressor 4 compresses the input data into compressed data based on the second data and the input data. For example, the compressor 4 compresses the input data into compressed data by comparing the input data and the second data against each other and reducing the amount of data of matching parts.
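A minimal sketch of such a comparison is shown below. The token layout (`"match"` with an address and a length, or `"literal"` with one byte), the minimum match length, and the function names are assumptions of the sketch and are not specified by the embodiment.

```python
def match_length(data: bytes, second_data: bytes) -> int:
    """Count how many leading bytes of data equal those of second_data."""
    length = 0
    for x, y in zip(data, second_data):
        if x != y:
            break
        length += 1
    return length


def encode_step(data: bytes, candidates: dict, min_match: int = 3):
    """Encode the head of data against the second data read from the dictionary.

    candidates maps a dictionary address to the second data stored there.
    Returns (token, remaining data). The token format is hypothetical.
    """
    best_address, best_length = None, 0
    for address, second_data in candidates.items():
        length = match_length(data, second_data)
        if length > best_length:
            best_address, best_length = address, length
    if best_length >= min_match:
        # Matching part is replaced by its dictionary position and match length.
        return ("match", best_address, best_length), data[best_length:]
    # No usable match: emit one byte as it is.
    return ("literal", data[:1]), data[1:]
```

Because the second data is longer than one block, a single candidate can yield a match longer than one block, which is where the reduction in the amount of data comes from.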
A storage device 200 stores the compressed data compressed by the compressor 4. Additionally, a system may be configured by the data processing apparatus 100 and the storage device 200.
As described above, with the data processing apparatus 100 according to the first embodiment, the number of times the access controller 3 reads first data stored in the hash memory 11a and the number of times the access controller 3 updates the first data stored in the hash memory 11a are different. Likewise, the number of times the access controller 3 reads first data stored in the hash memory 11b and the number of times the access controller 3 updates the first data stored in the hash memory 11b are different. The hash memory 11a and the hash memory 11b operate in parallel. Moreover, the access controller 3 reads, from the dictionary memory 12, second data of a longer data length than the data length of a block in one access. Also, the access controller 3 writes, in the dictionary memory 12, second data of a longer data length than the data length of a block in one access.
Therefore, with the data processing apparatus 100 according to the first embodiment, the update frequency of the hash memories 11 is maintained, so that reduction in the search performance in the dictionary memory 12, and hence reduction in the compression efficiency, may be suppressed, while high throughput may be expected due to parallel processing of the hash memories 11. Also, because second data of a long data length may be acquired from the dictionary memory 12 while suppressing an increase in the number of accesses to the dictionary memory 12, the compression efficiency may be increased.
Next, a second embodiment will be described. In the description of the second embodiment, similarities to the first embodiment are omitted, and differences from the first embodiment will be described.
Configuration of Data Processing Apparatus
Description of the divider 1, the hash calculator 2, and the compressor 4 according to the second embodiment is the same as the description in the first embodiment, and is omitted. In the description in the second embodiment, the access controller 3 and the hash memory 11 will be described.
First, an example of a memory structure according to the second embodiment will be described.
Example of Memory Structure
The index for the hash memory 11 is a hash value. Moreover, stored data in the hash memory 11 is the second data described above. The second data according to the second embodiment is the same as that of the first embodiment, and description thereof is omitted. The second data which is stored in the dictionary memory 12 in the first embodiment is stored in the hash memory 11 in the second embodiment.
Additionally, the address indicating the access position for second data stored in the hash memory 11 may be separated into an address indicating the top portion of the second data and an address indicating the position of data included in the second data.
The access controller 3 performs reading and update of second data stored in the hash memory 11. When the hash value of each block is received from the hash calculator 2, the access controller 3 accesses the hash memory 11 with the hash value as the index. Then, the access controller 3 reads one or some of the pieces of second data without reading all the second data accessed.
Specifically, in the case where the hash memory 11 is accessed by hash values K(a), K(b), K(c), and K(d), the access controller 3 reads the pieces of second data which are stored at the addresses indicated by the hash values K(a) and K(b), for example.
Next, the access controller 3 updates the hash memory 11 by writing input data (a plurality of pieces of block data), corresponding to the hash values, which is being processed. Specifically, in the case where the hash memory 11 is accessed by the hash values K(a), K(b), K(c), and K(d), the access controller 3 writes, as the second data, a block a and a block b at an address indicated by K(a), writes, as the second data, the block b and a block c at an address indicated by K(b), writes, as the second data, the block c and a block d at an address indicated by K(c), and writes, as the second data, the block d and a block e at an address indicated by K(d).
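The read and update steps of the second embodiment may be sketched as follows. The function name `access_hash_memory`, the dictionary used to model the hash memory 11, and the choice to read only the first two hash values are assumptions of the sketch.

```python
def access_hash_memory(hash_memory: dict, blocks: list, hash_fn, read_count: int = 2):
    """Read some of the stored second data, then update every accessed address.

    blocks holds the blocks being processed plus the following block
    (a, b, c, d, e), so that each block can be paired with its successor.
    """
    keys = [hash_fn(block) for block in blocks[:-1]]

    # Read step: only the first read_count addresses are read out.
    read_out = [hash_memory[key] for key in keys[:read_count] if key in hash_memory]

    # Update step: every address is written with two continuous blocks.
    for i, key in enumerate(keys):
        hash_memory[key] = blocks[i] + blocks[i + 1]
    return read_out


memory = {ord("a"): b"oldA", ord("c"): b"oldC"}
blocks = [b"aa", b"bb", b"cc", b"dd", b"ee"]
read_out = access_hash_memory(memory, blocks, hash_fn=lambda block: block[0])
```

Here the data stored at the address for block c is overwritten without being read, in the same way that some entries are updated without being read in the first embodiment.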
Lastly, the access controller 3 inputs the one or some of the pieces of second data read from the hash memory 11 to the compressor 4.
As described above, according to the data processing apparatus 100 of the second embodiment, the same effect as that of the data processing apparatus 100 according to the first embodiment is achieved.
Next, a third embodiment will be described. In the description of the third embodiment, similarities to the first embodiment are omitted, and differences from the first embodiment will be described.
Configuration of Data Processing Apparatus
Description of the divider 1, the hash calculator 2, the access controller 3, the compressor 4, the hash memory 11a, the hash memory 11b, and the dictionary memory 12a according to the third embodiment is the same as the description in the first embodiment, and is omitted. In the description in the third embodiment, the analyzer 5, the decompressor 6, and the dictionary memory 12b will be described.
The analyzer 5 acquires analysis information indicating an analysis result by analyzing compressed data. The analysis information includes match information of compressed data and second data (dictionary data), an address in the dictionary memory 12b, and the like, for example. The match information includes information indicating whether data included in compressed data and dictionary data stored in the dictionary memory 12b match each other or not, and information indicating the matching (or non-matching) data length, for example. Also, an address in the dictionary memory 12b indicates an access position for the second data matching the data included in the compressed data. In the case where input data is compressed by variable length coding or coding that uses some kind of prediction method, such as coding that uses a difference value to immediately preceding data, the analyzer 5 also acquires, as the analysis information, information that is necessary to decompress (decode) the compressed data. The analyzer 5 inputs the analysis information to the decompressor 6.
When the analysis information is received from the analyzer 5, the decompressor 6 generates decompressed data from the compressed data based on the analysis information. Additionally, the decompressed data is the same as the input data which has been input to the divider 1.
Here, the second data which is stored at one address in the dictionary memory 12b is data of a longer data length than the block described above. For example, the second data has a data length two times the data length of the block. Accordingly, the number of accesses to the dictionary memory 12b for decompressing the compressed data may be reduced compared to a case where one block is stored at one address, and thus, the throughput is increased. Additionally, the second data stored in the dictionary memory 12b may be a block and a following block, or may be a block and some kind of data which is estimated from the data. However, the data has to be the same as the second data which has been used in the compression process.
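The effect on the number of accesses may be illustrated with the following sketch. It models the dictionary as a list of blocks in which the entry at an address holds that block and the blocks that follow it; this layout, the function name `read_match`, and the four-byte block length are assumptions of the sketch.

```python
def read_match(blocks: list, address: int, length: int, blocks_per_entry: int):
    """Read length bytes of matched data starting at address.

    Each access returns blocks_per_entry continuous blocks. Returns the
    data and the number of accesses that were needed.
    """
    data = b""
    accesses = 0
    while len(data) < length:
        entry = b"".join(blocks[address:address + blocks_per_entry])
        if not entry:
            raise ValueError("match extends past the end of the dictionary")
        data += entry
        address += blocks_per_entry
        accesses += 1
    return data[:length], accesses


dictionary_blocks = [b"aaaa", b"bbbb", b"cccc"]
one_block = read_match(dictionary_blocks, 0, 8, blocks_per_entry=1)
two_blocks = read_match(dictionary_blocks, 0, 8, blocks_per_entry=2)
```

For a match that is two blocks long, storing one block per address needs two accesses, whereas storing two continuous blocks per address needs one.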
As described above, with the data processing apparatus 100 according to the third embodiment, the decompressor 6 acquires in one access, from the dictionary memory 12b, the second data of a data length longer than the data length of block data. Therefore, with the data processing apparatus 100 according to the third embodiment, the throughput of the decompressing process for decompressing compressed data generated by the compressor 4 may be increased.
Additionally, some kind of data according to input data may be held in advance in the hash memory 11 and the dictionary memory 12 according to the first to the third embodiments described above.
For example, with the data processing apparatus 100 according to the first embodiment, second data whose appearance frequency is statistically high may be held in advance in the dictionary memory 12, and the address in the dictionary memory 12 may be held in advance in the hash memory 11. For example, in the case where the second data includes two blocks, the address in the dictionary memory 12 indicating the access position for the second data is stored at the address, in the hash memory 11, indicated by the hash value of the block at the beginning of the second data. In this case, the hash memory 11 and the dictionary memory 12 may or may not be updated.
For example, in the case where the hash memory 11 and the dictionary memory 12 are updated, a match between data included in the input data and the second data (dictionary data) may be expected even shortly after the start of the compression process, when the hash memory 11 and the dictionary memory 12 have not yet been sufficiently updated, thereby allowing compression of the input data.
Also, in the case where the hash memory 11 and the dictionary memory 12 are not updated, the number of accesses to the hash memory 11 and the dictionary memory 12 may be reduced, and thus, the throughput of the compression process may be increased.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
| Number | Date | Country | Kind |
|---|---|---|---|
| 2016-182090 | Sep 2016 | JP | national |