This application is based upon and claims the benefit of priority of the prior Japanese Patent application No. 2022-031938, filed on Mar. 2, 2022, the entire contents of which are incorporated herein by reference.
The embodiment discussed herein relates to an arithmetic processing apparatus and a method for memory access.
In scientific computation such as physical simulation, simultaneous linear equations involving large-scale sparse matrices are solved. Scientific computation is used in cyber-physical systems (CPS), in the critical data infrastructure for big data processing that supports Society 5.0, which is a Japanese scientific and technological agenda, and in large-scale graph data architectures.
The graph analysis processing used in scientific computation is, in many cases, essentially a set of sparse matrix arithmetic operations. Although architectures capable of efficiently handling these sparse matrix arithmetic operations have been studied, the memory bandwidth may become a bottleneck in a sparse matrix arithmetic operation as compared with the arithmetic processing in a core.
As an aspect, an arithmetic processing apparatus includes a controlling unit that refers to an attribute of a compressing scheme of data on a main memory when the data is transferred between the main memory and a memory controller, and that transfers the data by switching the compressing scheme based on the referred attribute.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
A multi-core processor 60 includes a main memory 6, a memory controller 7, an LLC (Last Level Cache) 8, and multiple cores 81. In addition, the multi-core processor 60 retains a TLB (Translation Lookaside Buffer) 701.
The main memory 6 includes a transfer IF (Interface) 61 and transfers data to the memory controller 7 without compressing the data.
The memory controller 7 includes a transfer IF 71 and transfers the data received from the main memory 6 to the LLC 8 without compressing the data. The memory controller 7 uses information of the TLB 701 that the multi-core processor 60 retains.
The multiple cores 81 process data received from the memory controller 7 via the LLC 8. On the side of the multiple cores 81, address translation may be performed by obtaining attribute information with reference to the TLB 701, using a physical address.
In the multi-core processor 60a illustrated in
The transfer IF 61 of the main memory 6 transfers the compressed data 601 to the memory controller 7 in a lossless compressing scheme.
The transfer IF 71 of the memory controller 7 transfers the compressed data 601 received from the main memory 6 to the LLC 8 in an uncompressing scheme.
In the multi-core processor 60b illustrated in
The transfer IF 61 of the main memory 6 transfers the lossy compressed data 601 to the memory controller 7 in a lossy compressing scheme and transfers the uncompressed data 602 to the memory controller 7 in an uncompressing scheme.
The transfer IF 71 of the memory controller 7 transfers the lossy compressed data 601 to the LLC 8 in a lossy compressing scheme, and transfers the uncompressed data 602 received from the main memory 6 to the LLC 8 in an uncompressing scheme.
The multi-core processor 60b illustrated in
Hereinafter, an embodiment will now be described with reference to the accompanying drawings. However, the following embodiment is merely illustrative and is not intended to exclude the application of various modifications and techniques not explicitly described in the embodiment. Namely, the present embodiment can be variously modified and implemented without departing from the scope thereof. Further, each of the drawings may include additional functions, not illustrated therein, in addition to the elements illustrated in the drawing.
Hereinafter, like reference numbers designate the same or similar elements, so repetitious description is omitted here.
The multi-core processor 10 includes a main memory 1 (in other words, a main storing device), a memory controller 2, an LLC 3, and multiple cores 31. The multi-core processor 10 further retains a TLB 201 and compression information 202.
The main memory 1 includes a transfer IF 11 serving as an example of a controlling unit, and transfers uncompressed data 101 to the memory controller 2 in a lossy compressing scheme, a lossless compressing scheme, or an uncompressing scheme.
The memory controller 2 includes a transfer IF 21 serving as an example of the controlling unit, and transfers the data received from the main memory 1 to the LLC 3 without compressing the data. In addition, the memory controller 2 uses the TLB 201 and the compression information 202 held by the multi-core processor 10. The memory controller 2 compresses and decompresses the data.
The multiple cores 31 process data received from the memory controller 2 via the LLC 3. On the side of the multiple cores 31, address translation may be performed by obtaining attribute information with reference to the TLB 201, using a physical address.
The multi-core processor 10 transfers data between the main memory 1 and the LLC 3 with or without compressing the data. The design of the LLC 3 and the cores 31 need not be changed from that of the related examples illustrated in
In addition, lossy compression can greatly increase the compression ratio. Since the data is retained in the form of the uncompressed data 101 on the main memory 1, it can easily be shared with another device such as an accelerator 4. In the related examples, fragmentation may occur when the written data is compressed. In contrast, in the embodiment, since fragmentation does not occur, memory control can be performed efficiently.
Furthermore, in the embodiment, since the compressing scheme is switched at the time of transfer, only a region for uncompressed data needs to be reserved in the main memory 1, which facilitates the switching.
In the TLB 201 and the compression information 202, a logical address, a physical address, and attributes are associated with one another. The attributes may include “Valid”, “Read Only”, “Dirty”, and a compressing scheme. In the attributes of “Valid”, “Read Only” and “Dirty”, the value “1” indicates valid, and the value “0” indicates invalid. In the attribute of “compressing scheme”, the value “00” may indicate lossy compression, and the value “01” may indicate lossless compression.
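For illustration only, the following C sketch models one entry of such a table; the field names and widths are assumptions chosen to mirror the attributes listed above and do not represent the actual hardware layout.

    #include <stdint.h>

    /* One TLB / compression-information entry, assuming the attribute
     * encoding described above ("00" = lossy, "01" = lossless). */
    enum compress_scheme {
        SCHEME_LOSSY    = 0x0,   /* "00": lossy compression    */
        SCHEME_LOSSLESS = 0x1,   /* "01": lossless compression */
    };

    struct tlb_entry {
        uint64_t logical_addr;    /* logical page address         */
        uint64_t physical_addr;   /* physical page address        */
        unsigned valid     : 1;   /* 1 = valid, 0 = invalid       */
        unsigned read_only : 1;   /* 1 = read only                */
        unsigned dirty     : 1;   /* 1 = dirty                    */
        unsigned scheme    : 2;   /* compressing-scheme attribute */
    };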
That is, the transfer IF 21 refers to the attribute of the compressing scheme of the data on the main memory 1 when the data is transferred between the main memory 1 and the memory controller 2. The transfer IF 21 switches the compressing scheme based on the referred attribute and performs the data transfer in the switched compressing scheme.
The SpMV arithmetic operation illustrated in
In the pseudo code illustrated in
Here, the B/F ratio of an SpMV will now be examined. The B/F ratio is the memory bandwidth required for each individual arithmetic operation.
The data read from the matrix per element is 12 B + α (8 B (data) + 4 B (col_index) + α (row_ptr)), and the arithmetic operation is 2 FLOP (+, ×). The B/F ratio required by the SpMV is therefore (12 B + α)/2 FLOP ≈ 6 B/F. Assuming that the memory bandwidth (upper limit of the hardware) of the A64FX is 0.4 Byte/FLOP, the execution efficiency of the SpMV is 0.4/6 ≈ 6.7%. That is, 93.3% of the execution time is time spent waiting for data supply from the memory.
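For reference, the following is a minimal C sketch of a CSR-format SpMV loop consistent with the byte counts above; the function name and signature are illustrative assumptions and not part of the embodiment.

    #include <stddef.h>

    /* y = A * x for a sparse matrix A in CSR form. Per non-zero element,
     * 8 B (data) and 4 B (col_index) are read and 2 FLOP (one multiply,
     * one add) are executed, which gives the roughly 6 B/F noted above. */
    void spmv_csr(size_t n_rows, const double *data, const int *col_index,
                  const int *row_ptr, const double *x, double *y)
    {
        for (size_t i = 0; i < n_rows; i++) {
            double sum = 0.0;
            for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
                sum += data[j] * x[col_index[j]];
            y[i] = sum;
        }
    }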
By compressing the sparse matrix data as described above, the B/F ratio of the SpMV can be reduced and, accordingly, the execution performance can be improved.
In the compressing process according to the embodiment, conversion from a double-precision floating-point number to a single-precision floating-point number is performed, the col_index data is compressed, and conversion from double precision or single precision to half precision is performed.
In the conversion to a single-precision floating-point number, DP (double-precision floating-point) data is lossily converted to SP (single-precision floating-point) data and the SP data is transferred.
In the compression of col_index, which is integer data, compression is performed by transferring the upper bits common within a cache line together with the lower bits of each word. Since neighboring col_indices have relatively close values, the upper bits are often common. Also, in the compression of col_index, whether each line is a compressed line is determined simply by referring to the first bit of the line. If the line is incompressible, the line transfer is performed with the normal bit number.
In the conversion from the double-precision or the single-precision to the half-precision, if the upper bits of the exponent are common, the compression is performed in a unit of a cache line.
In the example illustrated in
As illustrated by the reference sign B2, the compressed data is composed of a common six-bit exponent part and eight 16-bit words each consisting of a one-bit sign, a five-bit exponent, and a ten-bit mantissa. The data represented by the reference sign B2 has a size of (1+6) bits+16 bits×8 words=135 bits≈17 Bytes.
Here, description will now be made in relation to a procedure of solving a simultaneous linear equation b=A x by an iterative method using lossy compression.
First, a matrix A and a vector b are given.
As the initialization process, the following (eq. 1) and (eq. 2) are calculated.
p(0)=r(0)=b−Ax(0) (eq. 1)
norm=∥r(0)∥ (eq. 2)
k=0
The calculations of the following (eq. 3) to (eq. 9) are repeated until norm<ε.
Next, description will now be made in relation to changing a compressing scheme in the middle of the repeating.
First, a matrix A and a vector b are given.
As the initialization process, the following (eq. 10) and (eq. 11) are calculated.
p(0)=r(0)=b−Ax(0) (eq. 10)
norm=∥r(0)∥ (eq. 11)
Compressing scheme=lossy compression (SP) is specified, and then the calculations of the following (eq. 12) to (eq. 18) are repeated until norm<ε.
When norm<α (threshold) is satisfied, the compressing scheme is changed to compressing scheme=uncompressing.
k+=1 (eq. 18)
The calculation is finally executed in the DP, and since the arithmetic operation is repeated until the error falls within the specified error ε, the precision falls within the requested range when the arithmetic operation converges. By using lossy compression (SP), the performance of the SpMV can be improved, and switching on the main memory 1, which holds the data in the DP, is facilitated.
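The control flow of this switching can be sketched as follows in C; the inner update is reduced to a placeholder for (eq. 12) to (eq. 18), and set_compress_scheme() is a hypothetical stand-in for the page-table attribute change or threshold-register update described below, not an actual interface.

    #include <stdio.h>

    enum scheme { LOSSY_SP, UNCOMPRESSED_DP };

    /* Hypothetical stub standing in for the attribute change (system call or
     * update of the threshold register 22); it only makes the flow concrete. */
    static void set_compress_scheme(enum scheme s)
    {
        printf("compressing scheme -> %s\n",
               s == LOSSY_SP ? "lossy (SP)" : "uncompressing (DP)");
    }

    int main(void)
    {
        const double eps = 1e-12, alpha = 1e-6;
        double norm = 1.0;                   /* stands in for norm = ||r(k)||      */
        enum scheme s = LOSSY_SP;

        set_compress_scheme(s);              /* start the iterations with lossy SP */
        while (norm >= eps) {
            norm *= 0.1;                     /* placeholder for (eq. 12)-(eq. 18)  */
            if (s == LOSSY_SP && norm < alpha) {
                s = UNCOMPRESSED_DP;         /* norm < α: switch to uncompressing  */
                set_compress_scheme(s);
            }
        }
        printf("converged: norm = %g\n", norm);
        return 0;
    }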
As illustrated in
When norm<α (threshold) is satisfied, the compressing scheme is changed to compressing scheme=uncompressing.
The change of a compressing scheme may be achieved by means of software through changing the attribute of the page table in the tables of the TLB and the compression information illustrated by the reference sign B3. When norm<α (threshold) is satisfied, the compression flag of the attribute may be changed to uncompressing by a system call.
In the tables of the TLB and the compression information represented by the reference sign B3, a logical address, a physical address, and attributes are associated with one another. The attributes may include a compression flag #0, a compression flag #1, a threshold, and a threshold register number.
Further, the switching of the compressing scheme may be achieved by changing a value of the dedicated threshold register 22 in the memory controller 2. The value (norm) of the threshold register 22 is updated from the software.
Additionally, as illustrated by the reference sign B3, multiple compression flags #0 and #1 and a threshold are added to the attributes of the page table. If the value of the threshold register 22 is less than the specified threshold, compression or decompression is performed on the basis of the value of the compression flag #0. If the value of the threshold register 22 is equal to or larger than the specified threshold, compression or decompression is performed on the basis of the value of the compression flag #1.
This means that the transfer IF 21 switches the compressing scheme on the basis of the result of comparison of the predetermined threshold with the value of the threshold register 22 provided in the memory controller 2.
In the tables of the TLB and the compression information, a logical address, a physical address, and attributes are associated with one another. The attributes may include “Valid”, “Read Only”, “Dirty”, “compression flag #0”, “compression flag #1”, a threshold, and a threshold register number. In the attributes of “Valid”, “Read Only” and “Dirty”, the value “1” indicates valid, and the value “0” indicates invalid. In the compression flags, the value “00” may indicate lossy compression, and the value “01” may indicate lossless compression. The threshold register number is a number for identifying the corresponding threshold register 22 among the multiple threshold registers 22 provided to the memory controller 2.
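For illustration only, a minimal C sketch of this selection is given below, assuming the table fields described above; only the fields relevant to the selection are shown, and select_scheme() is an illustrative name rather than an actual hardware interface.

    #include <stdint.h>

    struct page_attr {
        uint8_t  compression_flag0;   /* scheme used while the register < threshold */
        uint8_t  compression_flag1;   /* scheme used once the register >= threshold */
        double   threshold;           /* per-page threshold                         */
        unsigned threshold_reg_no;    /* which threshold register 22 to compare     */
    };

    /* Returns the compression flag to apply to this page, based on the value of
     * the selected threshold register (updated with norm by the software). */
    uint8_t select_scheme(const struct page_attr *a, const double *threshold_regs)
    {
        double x = threshold_regs[a->threshold_reg_no];
        return (x < a->threshold) ? a->compression_flag0 : a->compression_flag1;
    }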
Here, description will now be made in relation to a method of solving a simultaneous linear equation b=A x by an iterative method.
First, a matrix A and a vector b are given, and an initialization process is performed. Then, the following processes (1) to (3) are repeated.
Next, description will now be made in relation to a method of changing the compressing scheme in the middle of the repeating.
First, a matrix A and a vector b are given, x0 is initialized, and k=0 is given. The compressing scheme=compressing scheme #3 is specified and an initialization process is performed. Then, the following processes (1) to (4) are repeated.
(B-1) First Modification:
In the first modification, the colidx compression is carried out by a lossless compression scheme. Compression is performed by transferring the upper bits common within a cache line together with the lower bits of each word. Since neighboring colidx values are relatively close, the upper bits are often common. Whether each line is a compressed line is determined simply by referring to the first bit of the line. If the line is incompressible, the line transfer is performed with the normal bit number.
In the example represented by the reference sign C1, the line size of the normal data prior to being compressed, colidx[ ]=[18, 19, 22, 325, 343, . . . ], is 64 B. That is, the size of the normal data is 32 bits×16 words=512 bits.
In the example represented by the reference sign C2, the compressed data is formed of a one-bit flag (“1” indicating compressed data), a 20-bit base, and 12-bit words. That is, the size of the compressed data is 1 bit+20 bits+12 bits×16 words=213 bits, and the compression ratio is about 2.4.
That is, the transfer IF 21 divides each word in a cache line of data into an upper part having a predetermined bit number and a lower part. If the upper parts are common within the cache line, the transfer IF 21 performs data transfer of the common upper part and the lower parts of the respective data words.
In the example represented by the reference sign D1, the upper 20 bits of each word of the data of 32 bits×16 words=64 Bytes are referred to. If the upper 20 bits of any word do not match those of the other words, the data is determined to be incompressible.
In the example illustrated by the reference sign D2, a one-bit flag (“0” indicating uncompressed data) is attached to the leading position of the original data indicated by the reference sign D1 to indicate incompressible data. That is, the data size of incompressible data is 1 bit+64 Bytes.
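A C sketch of this check and split for one 64 B line is shown below, assuming 16 words of 32 bits and a common upper part of n=20 bits; for readability, the result is expressed as separate fields rather than the packed 213-bit line.

    #include <stdbool.h>
    #include <stdint.h>

    #define WORDS_PER_LINE 16
    #define COMMON_BITS    20                 /* n: upper bits expected to match */

    struct colidx_line {
        bool     compressed;                  /* flag bit: 1 = compressed        */
        uint32_t base;                        /* common upper 20 bits            */
        uint16_t lower[WORDS_PER_LINE];       /* 12-bit lower part of each word  */
    };

    /* Tries to compress one cache line of col_index values. The packed size on
     * success is 1 + 20 + 12 * 16 = 213 bits, versus 512 bits uncompressed. */
    bool compress_colidx(const uint32_t line[WORDS_PER_LINE], struct colidx_line *out)
    {
        uint32_t base = line[0] >> (32 - COMMON_BITS);
        for (int i = 1; i < WORDS_PER_LINE; i++)
            if ((line[i] >> (32 - COMMON_BITS)) != base) {
                out->compressed = false;      /* incompressible: send line as is */
                return false;
            }
        out->compressed = true;
        out->base = base;
        for (int i = 0; i < WORDS_PER_LINE; i++)
            out->lower[i] = (uint16_t)(line[i] & ((1u << (32 - COMMON_BITS)) - 1u));
        return true;
    }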
In the multi-core processor 10a according to the first modification illustrated in
At the time of transfer, an access count per page and success/failure of the compression are counted. For example, the page size may be 4 KB and the cache line size may be 64 B.
An adjusting process of the common bit number may be carried out periodically (e.g., when the access count reaches a predetermined value). If the success ratio lowers, the bit number may be increased, and if the success ratio is high, the bit number may be decreased.
If the compressed data is read-only data, adjustment may be made, before the start of the adjusting process of the bit width, such that the compression success ratio becomes equal to or higher than a predetermined value. In this case, the common bit number is initially set to be small, data reading is repeated, and the common bit number is increased when the success ratio is equal to or less than the predetermined value.
That is, the transfer IF 21 changes the bit number of the upper part according to the success ratio of data compressing.
In the TLB 201 represented by the reference sign E1, a logical address, a physical address, and attributes are associated with one another. The attributes may include “Valid”, “Read Only”, “Dirty”, compressing scheme #0, compressing scheme #1, a threshold, a threshold register number, and a statistic information table number.
For the data of the logical address 0x1_107B represented by the reference sign E1, the statistic information table number is set to “0”, which means that the data corresponds to data of the table number “0” of the statistic information 23 represented by the reference sign E2. In addition, for the data of the logical address 0x3_0202 represented by the reference sign E1, the statistic information table number is set to “1”, which means that the data corresponds to data of the table number “1” of the statistic information 23 represented by the reference sign E2.
The example of
The graph represented by the reference sign F1 indicates the compressible ratio (see oblique line pattern area) and incompressible ratio (see the lattice pattern area) for each type of sparse matrix. The graph represented by the reference sign F2 indicates a compressing ratio for each type of sparse matrix. The reference sign F3 indicates that compressed data consists of a one-bit flag bit, a 20-bit base, and multiple 12-bit words.
The example illustrated in
The graph represented by the reference sign G1 indicates the compressible ratio (see oblique line pattern area) and incompressible ratio (see the lattice pattern area) for each type of sparse matrix. The graph represented by the reference sign G2 indicates a compressing ratio for each type of sparse matrix. The reference sign G3 indicates that compressed data consists of a one-bit flag bit, an 18-bit base, and multiple 14-bit words.
(B-2) Second Modification:
In the second modification, the exponent is expanded, and when the upper bits of the exponent are common, compression is performed in a unit of a cache line.
The reference sign H1 indicates data having a size of 64 bits×8 words=512 bits, each word consisting of a one-bit sign, an 11-bit exponent, and a 52-bit mantissa. The reference sign H2 indicates data having a size of (1+6) bits+16 bits×8 words=135 bits≈17 B, which is composed of the expanded exponent common to the words and eight 16-bit words.
That is, the transfer IF 21 divides the exponent of the floating-point data into an upper part having a predetermined bit number and a lower part. If the upper parts are common in a unit of a data block, the transfer IF 21 performs data transfer of the common upper part and the lower parts of the respective data words.
Also in the second modification, a compressing scheme may be switched on the basis of the threshold like the above-described embodiment. In addition, like the first modification described above, the second modification may perform an adjusting process of a common bit width.
If the upper bits are not common, only the mantissas may be compressed. The original data represented by the reference sign H4 consists of a one-bit sign, an 11-bit exponent, and a 52-bit mantissa. The compressed data represented by the reference sign H5 is composed of a one-bit sign, an 11-bit exponent, and a 10-bit mantissa.
If the upper bits are not common, the mantissa may also be kept in the double precision. The transfer IF 21 may truncate given lower bits of the mantissa of the floating-point data and transfer the remaining data.
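A rough C sketch of this scheme follows, assuming that the common part is the upper six bits of the 11-bit exponent so as to match the (1+6) bits+16 bits×8 words layout above; the structure and names are illustrative only.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define WORDS 8                    /* eight doubles per 64 B cache line           */

    struct fp_line {
        bool     compressed;           /* flag bit                                    */
        uint8_t  exp_hi;               /* common upper 6 bits of the 11-bit exponent  */
        uint16_t word[WORDS];          /* per word: 1 sign + 5 exponent + 10 mantissa */
    };

    /* Lossy compression of one cache line of doubles: keeps the sign, the split
     * exponent (shared upper part, per-word lower part) and the upper 10 bits of
     * the mantissa, matching the (1+6)+16*8 = 135-bit layout described above. */
    bool compress_fp_line(const double in[WORDS], struct fp_line *out)
    {
        uint64_t bits[WORDS];
        memcpy(bits, in, sizeof bits);

        uint8_t hi = (uint8_t)((bits[0] >> 57) & 0x3F);      /* exponent bits 62..57 */
        for (int i = 1; i < WORDS; i++)
            if (((bits[i] >> 57) & 0x3F) != hi) {
                out->compressed = false;                     /* send uncompressed    */
                return false;
            }

        out->compressed = true;
        out->exp_hi = hi;
        for (int i = 0; i < WORDS; i++) {
            uint16_t sign   = (uint16_t)(bits[i] >> 63);            /* 1 bit   */
            uint16_t exp_lo = (uint16_t)((bits[i] >> 52) & 0x1F);   /* 5 bits  */
            uint16_t man_hi = (uint16_t)((bits[i] >> 42) & 0x3FF);  /* 10 bits */
            out->word[i] = (uint16_t)((sign << 15) | (exp_lo << 10) | man_hi);
        }
        return true;
    }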
In relation to the data composed of a six-bit expanded exponent in addition to a five-bit exponent and a 10-bit mantissa, the reference sign I1 indicates the compressible ratio (see oblique line pattern area) and an incompressible ratio (see the lattice pattern area) for each type of sparse matrix.
In relation to the data composed of a five-bit expanded exponent in addition to a six-bit exponent and a 10-bit mantissa, the reference sign I2 indicates the compressible ratio (see oblique line pattern area) and an incompressible ratio (see the lattice pattern area) for each type of sparse matrix.
In relation to the data composed of a seven-bit expanded exponent in addition to a four-bit exponent and a 10-bit mantissa, the reference sign I3 indicates the compressible ratio (see oblique line pattern area) and an incompressible ratio (see the lattice pattern area) for each type of sparse matrix.
In relation to the data composed of a four-bit expanded exponent in addition to a seven-bit exponent and a 10-bit mantissa, the reference sign I4 indicates the compressible ratio (see oblique line pattern area) and an incompressible ratio (see the lattice pattern area) for each type of sparse matrix.
In relation to the data (see reference sign J1) composed of a six-bit expanded exponent in addition to a five-bit exponent and a 10-bit mantissa,
In relation to the data (see reference sign K1) composed of a five-bit expanded exponent in addition to a six-bit exponent and a 10-bit mantissa,
In relation to the data (see reference sign L1) composed of a seven-bit expanded exponent in addition to a four-bit exponent and a 10-bit mantissa,
In relation to the data (see reference sign M1) composed of a four-bit expanded exponent in addition to a seven-bit exponent and a 10-bit mantissa,
(B-3) Combination of the Embodiment, the First Modification, and the Second Modification:
In the TLB 201 illustrated in
(B-4) Example of Hardware Configuration:
The multi-core processor 10 includes the main memory 1 (DRAM), the memory controller 2, the LLC 3, and a core 31 (a processor core unit). Note that DRAM is an abbreviation for Dynamic Random Access Memory.
The main memory 1 includes a transfer IF 11 and the memory controller 2 includes a transfer IF 21. Between the transfer IF 11 of the main memory 1 and the transfer IF 21 of the memory controller 2, compressed data is transmitted and received, and an address and control data for compression control are transmitted and received.
Uncompressed data is transmitted and received between the transfer IF 21 of the memory controller 2 and the LLC 3.
(B-5) Example of Operation:
Description will now be made in relation to data transferring process in the transfer IF 21 of the memory controller 2 according to the embodiment with reference to a flow diagram (Steps S1 to S10) of
The transfer IF 21 of the memory controller 2 selects a compressing scheme x (Step S1). The details of the selecting process of a compressing scheme will be described below with reference to
The transfer IF 21 determines whether the compressing scheme x is uncompressing (Step S2).
If the compressing scheme x is uncompressing (YES route of Step S2), the transfer IF 21 performs a normal reading/writing process (Step S3). Then, data transferring process in the transfer IF 21 of the memory controller 2 according to the embodiment ends.
If the compressing scheme x is not uncompressing (see NO route of Step S2), the transfer IF 21 determines whether the process is to read (Step S4).
If the process is not to read (see NO route of Step S4), the transfer IF 21 performs a data compressing process in the compressing scheme x (Step S5). The details of the data compressing process will be described below with reference to
The transfer IF 21 makes a writing request with compression and sends the compressed data (Step S6).
The transfer IF 21 updates the statistic information (Step S7). Then, data transferring process in the transfer IF 21 of the memory controller 2 according to the embodiment ends.
If the process is to read in Step S4 (see YES route of Step S4), the transfer IF 21 sends a reading request with compression (Step S8).
The transfer IF 21 receives data (Step S9).
The transfer IF 21 decompresses data in the compressing scheme x (Step S10). The details of the data decompressing process will be described below with reference to
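For illustration only, the controller-side flow of Steps S1 to S10 can be summarized by the following C sketch, in which the named helper functions are no-op stubs standing in for the requests exchanged between the transfer IF 21 and the transfer IF 11.

    #include <stdbool.h>

    enum scheme { UNCOMPRESSING, LOSSY, LOSSLESS };

    /* No-op stubs so that the control flow compiles; the real operations are the
     * requests exchanged between the transfer IF 21 and the transfer IF 11. */
    static enum scheme select_scheme_stub(void)                     { return LOSSY; }
    static void normal_read_write(bool is_read)                     { (void)is_read; }
    static void compress_and_send_write_request(enum scheme s)      { (void)s; }
    static void update_statistics(void)                             { }
    static void send_read_request_receive_decompress(enum scheme s) { (void)s; }

    /* One transfer handled on the memory controller side (Steps S1 to S10). */
    void controller_transfer(bool is_read)
    {
        enum scheme x = select_scheme_stub();          /* S1           */
        if (x == UNCOMPRESSING) {                      /* S2           */
            normal_read_write(is_read);                /* S3           */
            return;
        }
        if (!is_read) {                                /* S4: write    */
            compress_and_send_write_request(x);        /* S5, S6       */
            update_statistics();                       /* S7           */
        } else {                                       /* S4: read     */
            send_read_request_receive_decompress(x);   /* S8, S9, S10  */
        }
    }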
Next, description will now be made in relation to data transferring process in the transfer IF 11 of the main memory 1 according to the embodiment with reference to a flow diagram (Steps S21 to S30) of
The transfer IF 11 of the main memory 1 determines whether a request for data process is received (Step S21).
If the request for data process is not received (see NO route of Step S21), the process of Step S21 is repeated.
If the request for data process is received (see YES route of Step S21), the transfer IF 11 determines whether data to be processed is data with compression (Step S22).
If the data to be processed is not data with compression (see NO route of Step S22), the transfer IF 11 performs a normal reading/writing process (Step S23). Then, the data transferring process in the transfer IF 11 of the main memory 1 according to the embodiment ends.
On the other hand, if the data to be processed is data with compression (see YES route of Step S22), the transfer IF 11 determines whether the process is to read (Step S24).
If the process is not to read (see NO route of Step S24), the transfer IF 11 receives data (Step S25).
The transfer IF 11 decompresses the data in the compressing scheme x (Step S26). The details of the data decompressing process will be described below with reference to
The transfer IF 11 writes data into the main memory 1 (Step S27), and then the data transferring process in the transfer IF 11 of the main memory 1 according to the embodiment ends.
If the process is to read in Step S24 (see YES route of Step S24), the transfer IF 11 reads the data from the main memory 1 (Step S28).
The transfer IF 11 performs a data compressing process in the compressing scheme x (Step S29). The details of the data compressing process will be described below with reference to
The transfer IF 11 transmits the data (Step S30), and then the data transferring process in the transfer IF 11 of the main memory 1 according to the embodiment ends.
Next, the selecting process of the compressing scheme according to the embodiment will now be described with reference to a flow diagram (Steps S11 to S15) illustrated in
The transfer IF 21 of the memory controller 2 reads attributes from the TLB 201 (Step S11). The attributes to be read may be, for example, t=threshold, i=threshold register number, f0=compression flag #0, and f1=compression flag #1.
The transfer IF 21 sets the value of the threshold register 22 identified by “i” in the variable x (Step S12).
The transfer IF 21 determines whether a relationship x<t is satisfied (Step S13).
If the relationship x<t is satisfied (see YES route of Step S13), the transfer IF 21 returns f0 (Step S14). Then, the selecting process of the compressing scheme according to the embodiment ends.
On the other hand, if a relationship x≥t is satisfied (see NO route of Step S13), the transfer IF 21 returns f1 (Step S15). Then, the selecting process of the compressing scheme according to the embodiment ends.
Next, description will now be made in relation to a selecting process of the compressing scheme according to the first modification and the second modification with reference to the flow diagram (Steps S16 to S20) of
The transfer IF 21 of the memory controller 2 reads an attribute from the TLB 201 (Step S16). The attributes to be read may be, for example, f=compression flag and s=statistic register number.
The transfer IF 21 determines whether f is a valid value (Step S17).
If f is a valid value (see YES route of Step S17), the transfer IF 21 reads the common bit from the statistic information and sets n=bit number (Step S18). Then, the process proceeds to Step S20.
On the other hand, if f is not a valid value (see NO route in Step S17), the transfer IF 21 sets n=0 (Step S19).
The transfer IF 21 returns f and n (Step S20). Then, the selecting process of the compressing scheme according to the first modification and the second modification ends.
Next, description will now be made in relation to a selecting process of a compressing scheme according to the embodiment, the first modification, and the second modification with reference to a flow diagram (Steps S101 to S109) of
The transfer IF 21 of the memory controller 2 reads the attributes from the TLB 201 (Step S101). The attributes to be read may be, for example, t=threshold, i=threshold register number, f0=compression flag #0, and f1=compression flag #1.
The transfer IF 21 sets the value of the threshold register 22 identified by “i” in the variable x (Step S102).
The transfer IF 21 determines whether a relationship x<t is satisfied (Step S103).
If the relationship x<t is satisfied (see YES route of Step S103), the transfer IF 21 sets f=f0 (Step S104). Then, the process proceeds to Step S106.
On the other hand, if the relationship x≥t is satisfied (see NO route of Step S103), the transfer IF 21 sets f=f1 (Step S105).
The transfer IF 21 determines whether f==“10” or f==“11” (Step S106).
If f==“10” or f==“11” (see YES route in Step S106), the transfer IF 21 reads the common bits from the statistic information and sets n=bit number (Step S107). Then, the process proceeds to Step S109.
On the other hand, if neither f==“10” nor f==“11” is satisfied (see NO route in Step S106), the transfer IF 21 sets n=0 (Step S108).
The transfer IF 21 returns f and n (Step S109). Then, the selecting process of the compressing scheme according to the embodiment, the first modification, and the second modification ends.
Next, description will now be made in relation to an updating process of the statistic information at the time of data transmission and reception according to the embodiment with reference to a flow diagram (Steps S71 to S73) of
The transfer IF 21 of the memory controller 2 determines whether the compression was successful (Step S71).
If the compression was not successful (see NO route of Step S71), the process proceeds to Step S73.
On the other hand, if the compression was successful (see YES route in Step S71), the transfer IF 21 increments the success count of the entry in the statistic information table by one (+1) (Step S72).
The transfer IF 21 increments the access count in the statistic information table by one (+1) (Step S73). Then, the updating process of the statistic information at the time of data transmission and reception according to the embodiment ends.
Next, description will now be made in relation to an updating process of the statistic information at the time of bit width adjustment in the embodiment with reference to a flow diagram (Steps S76 to S79) of
The transfer IF 21 of the memory controller 2 determines whether or not a relationship (compression success count)/(access count)>predetermined value #0 is satisfied (Step S76).
If a relationship (compression success count)/(access count)≤predetermined value #0 is satisfied (see NO route in Step S76), the process proceeds to Step S78.
On the other hand, if a relationship (compression success count)/(access count)>predetermined value #0 is satisfied (see YES route of Step S76), the transfer IF 21 decrements the common bit number by one (−1) (Step S77).
The transfer IF 21 of the memory controller 2 determines whether or not a relationship (compression success count)/(access count)<predetermined value #1 is satisfied (Step S78).
If the relationship (compression success count)/(access count)≥predetermined value #1 is satisfied (see NO route of Step S78), the updating process of the statistic information at the time of bit width adjustment in the embodiment ends.
If the relationship (compression success count)/(access count)<predetermined value #1 is satisfied (see YES route in Step S78), the transfer IF 21 increments the common bit number by one (Step S79). Then, the updating process of the statistic information at the time of bit width adjustment in the embodiment ends.
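A compact C rendering of Steps S76 to S79 is given below; the parameter names are illustrative, and the direction of each adjustment follows the flow above.

    /* Periodic adjustment of the common bit number n from the per-page statistics
     * (Steps S76 to S79); value0 and value1 are the predetermined values #0 and #1. */
    unsigned adjust_common_bits(unsigned n, unsigned long success_count,
                                unsigned long access_count,
                                double value0, double value1)
    {
        double ratio = (access_count != 0)
                     ? (double)success_count / (double)access_count : 0.0;
        if (ratio > value0 && n > 0)
            n -= 1;            /* S77: high success ratio, shrink the common part */
        if (ratio < value1)
            n += 1;            /* S79: low success ratio, grow the common part    */
        return n;
    }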
Next, description will now be made in relation to a data compressing process according to the embodiment with reference to a flow diagram (Step S51) illustrated in FIG. 33.
The transfer IF 11 converts a double-precision floating-point number to a single-precision floating-point number (Step S51), and the data compressing process of the embodiment ends.
Next, description will now be made in relation to a data decompressing process according to the embodiment with reference to a flow diagram (Step S41) of
The transfer IF 11 converts a single-precision floating-point number to a double-precision floating-point number (Step S41), and the data decompressing process of the embodiment ends.
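The conversion pair of Steps S51 and S41 amounts to a cast in each direction; the small C example below merely illustrates that the DP-to-SP direction is lossy, and it is not the actual hardware path.

    #include <stdio.h>

    int main(void)
    {
        double dp = 0.1;               /* original DP value                       */
        float  sp = (float)dp;         /* Step S51: lossy DP-to-SP conversion     */
        double back = (double)sp;      /* Step S41: SP-to-DP conversion on return */
        printf("dp   = %.17g\nback = %.17g\n", dp, back);  /* values differ slightly */
        return 0;
    }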
Next, description will now be made in relation to a data compressing process of the first modification with reference to a flow diagram (Steps S56 to S58) of
The transfer IF 11 determines whether the upper n bits of all data in the cache line are common (Step S56).
If the upper n bits of all data in the cache line are common (see YES route of Step S56), the transfer IF 11 combines the compression flag ‘1’, the n bits of the common part, and the lower bits of the respective data, and returns the combined data (Step S57). Then, the data compressing process of the first modification ends.
On the other hand, if the upper n bits of all data in the cache line are not common (see NO route of Step S56), the transfer IF 11 combines the compression flag ‘0’ and all data of the cache line and returns the combined data (Step S58). Then, the data compressing process of the first modification ends.
Next, description will now be made in relation to a data decompressing process of the first modification with reference to a flow diagram (Steps S46 to S49) of
The transfer IF 21 determines whether the leading bit is “1” (Step S46).
If the leading bit is not ‘1’ (see NO route of Step S46), the transfer IF 21 deletes the leading bit and returns the remaining data (Step S47). Then, the data decompressing process of the first modification ends.
On the other hand, if the leading bit is ‘1’ (see YES route of Step S46), the transfer IF 21 extracts the n bits of the common part (Step S48).
The transfer IF 21 extracts the respective lower bit parts and combines each of them with the common upper bit part (Step S49). Then, the data decompressing process of the first modification ends.
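For completeness, a C sketch of the compressed branch (Steps S48 and S49) is shown below, reusing the same illustrative line layout as the compression sketch given earlier; the uncompressed branch (Steps S46 and S47) is omitted.

    #include <stdbool.h>
    #include <stdint.h>

    #define WORDS_PER_LINE 16
    #define COMMON_BITS    20

    struct colidx_line {                     /* same illustrative layout as before */
        bool     compressed;
        uint32_t base;                       /* common upper bits                  */
        uint16_t lower[WORDS_PER_LINE];      /* 12-bit lower part of each word     */
    };

    /* Rebuilds the original 32-bit col_index words (Steps S48 and S49): the common
     * upper part is prepended to each transferred lower part. */
    void decompress_colidx(const struct colidx_line *in, uint32_t out[WORDS_PER_LINE])
    {
        for (int i = 0; i < WORDS_PER_LINE; i++)
            out[i] = (in->base << (32 - COMMON_BITS)) | in->lower[i];
    }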
Next, description will now be made in relation to a first example of a data compressing process of the second modification with reference to a flow diagram (Steps S61 to S64) of
The transfer IF 21 determines whether the upper n bits of the exponents of all data in the cache line are common (Step S61).
If the upper n bits of the exponents of all data in the cache line are not common (see NO route of Step S61), the transfer IF 21 extracts the 11 bits of the exponent and the upper 10 bits of the mantissa for all data of the cache line and combines the compression flag ‘0’ and the extracted data (Step S62). Then, the first example of the data compressing process of the second modification ends.
On the other hand, if the upper n bits of the exponents of all data in the cache line are common (see YES route of Step S61), the transfer IF 21 combines the compression flag ‘1’ and the n bits of the common part (Step S63).
The transfer IF 21 extracts the lower bits of the exponent of each data, extracts upper 10 bits of the mantissa, and combines the extracted data (Step S64). Then, the first example of the data compressing process of the second modification ends.
Next, description will now be made in relation to a first example of a data decompressing process of the second modification with reference to a flow diagram (Steps S31 to S35) of
The transfer IF 21 determines whether the leading bit is “1” (Step S31).
If the leading bit is not “1” (see NO route of Step S31), the transfer IF 21 deletes the leading bit (Step S32). Then, the process proceeds to Step S35.
On the other hand, if the leading bit is “1” (see YES route of Step S31), the transfer IF 21 extracts the n bits of the common part (Step S33).
The transfer IF 21 extracts the lower bit part of each exponent and combines each with the common upper bit part (Step S34).
The transfer IF 21 appends zeros to serve as the lower 42 bits of the mantissa of each data (Step S35). Then, the first example of the data decompressing process of the second modification ends.
Next, description will now be made in relation to a second example of a data compressing process of the second modification with reference to a flow diagram (Steps S66 to S69) of
The transfer IF 21 determines whether the upper n bits of the exponents of all data in the cache line are common (Step S66).
If the upper n bits of the exponents of all data in the cache line are not common (see NO route of Step S66), the transfer IF 21 attaches the compression flag ‘0’ at the leading position (Step S67). Then, the second example of the compressing process of the second modification ends.
On the other hand, if the upper n bits of the exponents of all data in the cache line are common (see YES route of Step S66), the transfer IF 21 combines the compression flag ‘1’ and the n bits of the common part (Step S68).
The transfer IF 21 extracts the lower bits of the exponent of each data, extracts upper 10 bits of the mantissa, and combines the extracted data (Step S69). Then, the second example of the compressing process of the second modification ends.
Next, description will now be made in relation to a second example of a data decompressing process of the second modification with reference to a flow diagram (Steps S36 to S39) of
The transfer IFs 11 and 21 determine whether the leading bit is “1” (Step S36).
If the leading bit is not “1” (see NO route of Step S36), the transfer IF 21 deletes the leading bit (Step S37). Then, the second example of the data decompressing process of the second modification ends.
On the other hand, if the leading bit is ‘1’ (see YES route of Step S36), the transfer IF 21 extracts the n bits of the common part (Step S38).
The transfer IF 21 extracts the lower bit part of each exponent and combines each with the common upper bit part (Step S39). Then, the second example of the data decompressing process of the second modification ends.
The data of the related example, the embodiment, the first modification, and the second modification have data sizes of 8 B, 4 B, 4 B, and 2 B, respectively. The col_index of the related example, the embodiment, the first modification, and the second modification has data sizes of 4 B, 4 B, 2 B, and 2 B, respectively. The SpMVs in the related example, the embodiment, the first modification, and the second modification have B/F ratios of 6, 4, 3, and 2, respectively. The execution efficiencies in the related example, the embodiment, the first modification, and the second modification are 6.7%, 10.0%, 13.3%, and 20.0%, respectively. The performance improvement rates of the embodiment, the first modification, and the second modification relative to the related example are 1.5 times, up to 2 times, and up to 3 times, respectively.
The arithmetic processing apparatus and the method for memory access according to the above embodiment can bring the following effects.
In the data transfer between the main memory 1 and the memory controller 2, the transfer IF 21 refers to the attribute of the compressing scheme of the data on the main memory 1, and performs the data transfer by switching the compressing scheme based on the referred attribute. This makes it possible to accelerate memory access.
The transfer IF 21 switches the compressing scheme on the basis of the result of comparison of the predetermined threshold with the value of the threshold register 22 provided in the memory controller 2. This makes it possible to efficiently switch the compressing scheme.
The transfer IF 21 divides each word in a cache line of data into an upper part having a predetermined bit number and a lower part. When the upper parts are common within the cache line, the transfer IF 21 combines the common upper part and the lower parts of the respective data words, and executes the data transfer on the combined data. This makes it possible to efficiently compress the data to be transferred.
The transfer IF 21 divides the exponent of the floating-point data into an upper part having a predetermined bit number and a lower part. When the upper parts are common within a data block, the transfer IF 21 combines the common upper part and the lower parts in a unit of the data block, and executes the data transfer on the combined data. This can efficiently compress floating-point data to be transferred.
The transfer IF 21 changes the bit number of the upper part according to the success ratio of data compressing. This makes it possible to efficiently compress the data to be transferred.
The transfer IF 21 truncates given lower bits of the mantissa of the floating-point data, and transfers the remaining data. This makes it possible to accelerate memory access.
The disclosed techniques are not limited to the embodiment described above, and may be variously modified without departing from the scope of the present embodiment. The respective configurations and processes of the present embodiment can be selected, omitted, and combined according to the requirement.
In one aspect, memory access can be accelerated.
In the claims, the indefinite article “a” or “an” does not exclude a plurality.
All examples and conditional language recited herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.