COMPUTATION APPARATUS, METHOD, SYSTEM, CIRCUIT, AND DEVICE, AND CHIP

TECHNICAL FIELD

This application relates to the computer field, and in particular, to a computation apparatus, method, system, circuit, and device, and a chip.

BACKGROUND

Vector computation is an important computation type in different application scenarios such as artificial intelligence, scientific computing, and graph computing. An element value in a vector may include two types of values, namely, a zero element value and a non-zero element value. When there are a large quantity of zero element values in the vector, to save storage space, only the non-zero element value in the vector may be stored. In other words, the vector is compressed, and a vector in a compressed format is stored.
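The compressed storage described above can be illustrated with a minimal sketch. The layout used here, a vector length plus a list of (position coordinate, element value) pairs, and the function names `compress` and `decompress` are assumptions made for illustration; the application does not prescribe a particular encoding.

```python
from typing import List, Tuple

# Hypothetical compressed layout: the original length plus
# (position coordinate, element value) pairs for non-zero elements only.
CompressedVector = Tuple[int, List[Tuple[int, float]]]


def compress(dense: List[float]) -> CompressedVector:
    """Keep only the non-zero element values together with their positions."""
    pairs = [(i, v) for i, v in enumerate(dense) if v != 0]
    return len(dense), pairs


def decompress(vec: CompressedVector) -> List[float]:
    """Rebuild the uncompressed vector by restoring the zero element values."""
    length, pairs = vec
    dense = [0.0] * length
    for i, v in pairs:
        dense[i] = v
    return dense
```

For a vector with five elements of which two are non-zero, only two pairs are stored instead of five values.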

In the conventional technology, a vector in the compressed format is typically decompressed first, that is, converted into a vector in an uncompressed format, and vector computation is then performed on the vector in the uncompressed format. Because the decompression operation must be performed during vector computation, and the decompressed data occupies a large amount of memory space, the computation speed of the vector is limited by the access bandwidth of the memory. When the access bandwidth of the memory is fixed, the computation speed of the vector cannot be increased, resulting in low computation efficiency.

SUMMARY

Embodiments of this application provide a computation apparatus, method, system, circuit, and device, and a chip, to directly compute a vector in a compressed format without decompressing the vector in the compressed format, so that efficiency of computing a vector in the compressed format can be improved.

According to a first aspect, an embodiment of this application provides a computation apparatus, including a position coordinate comparison circuit and a logical operation circuit. The position coordinate comparison circuit is configured to compare position coordinates of an element value in a first vector with position coordinates of an element value in a second vector, to obtain a first coordinate comparison result. Both the first vector and the second vector are vectors in a compressed format. The first vector includes a first element value and first position coordinates of the first element value. The second vector includes a second element value and second position coordinates of the second element value. The first coordinate comparison result includes a first comparison result. The first comparison result indicates that the first position coordinates are the same as the second position coordinates. The logical operation circuit is configured to compute the first element value and the second element value based on the first comparison result, to obtain a computation value; and output a computation result to a cache. The cache is configured to cache the computation result. The computation result is related to the computation value. In this embodiment, the computation apparatus can compute a vector in the compressed format. When computing two vectors in the compressed format, the computation apparatus compares position coordinates of element values in the two vectors, and adds two element values corresponding to same position coordinates in the two vectors, to obtain a computation result of computing the two vectors in the compressed format. 
In comparison with a conventional method in which the vector in the compressed format needs to be decompressed first, and then vector computation is performed on a decompressed vector, the computation apparatus provided in this embodiment of this application can effectively improve efficiency of computing the vector in the compressed format.
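As a software analogy of the first aspect, the following sketch separates the coordinate comparison from the computation on matched element values. It assumes that each compressed vector is a list of (position coordinate, element value) pairs sorted by coordinate; the sorting assumption and the names `match_coordinates` and `compute_matched` are hypothetical and are not taken from the application.

```python
from typing import Callable, List, Tuple

Pair = Tuple[int, float]  # (position coordinate, element value), hypothetical layout


def match_coordinates(first: List[Pair], second: List[Pair]) -> List[Tuple[int, float, float]]:
    """Return (coordinate, first value, second value) for coordinates found in both vectors.

    Assumes both inputs are sorted by position coordinate.
    """
    i = j = 0
    matches = []
    while i < len(first) and j < len(second):
        ci, vi = first[i]
        cj, vj = second[j]
        if ci == cj:          # corresponds to the "first comparison result"
            matches.append((ci, vi, vj))
            i += 1
            j += 1
        elif ci < cj:
            i += 1
        else:
            j += 1
    return matches


def compute_matched(first: List[Pair], second: List[Pair],
                    op: Callable[[float, float], float]) -> List[Pair]:
    """Apply op to each pair of element values that share position coordinates."""
    return [(c, op(a, b)) for c, a, b in match_coordinates(first, second)]
```

Neither input is expanded to its uncompressed length; the work is proportional to the number of stored non-zero elements.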

In an optional implementation, the logical operation circuit includes an accumulator. The position coordinate comparison circuit is further configured to receive an addition instruction, and transmit the first comparison result to the accumulator based on the addition instruction. The accumulator is configured to add the first element value and the second element value based on the first comparison result, to obtain a sum of the first element value and the second element value; and output the sum to the cache. The computation result includes a third element value and third position coordinates of the third element value. The third element value is the sum. The third position coordinates are the same as the first position coordinates. The computation result is a result vector obtained by adding the first vector and the second vector. In this embodiment, the computation apparatus may perform addition operation on the vector in the compressed format, and the computation apparatus may be used in an application scenario of vector addition computation.

In an optional implementation, the accumulator is further configured to: when the third element value is a zero element value, output an invalid signal for the zero element value. The invalid signal indicates that an element value in a computation result does not include the zero element value and position coordinates corresponding to the zero element value. In this embodiment, each third element value is obtained by adding two element values. In this case, the third element value may be the zero element value. The accumulator is further configured to: when the third element value is the zero element value, delete the zero element value and the position coordinates corresponding to the zero element value, so that the computation result output by the computation apparatus does not include the zero element value, to output a vector in the compressed format. Therefore, a transmission resource is saved, or a next computation operation is facilitated.

In an optional implementation, the accumulator skips, based on the invalid signal, outputting the zero element value and the position coordinates corresponding to the zero element value to the cache, so that the cache does not cache the zero element value and the position coordinates corresponding to the zero element value, to achieve an objective that the computation result does not include the zero element value.

In an optional implementation, the first coordinate comparison result further includes a second comparison result. The first vector includes a fourth element value and fourth position coordinates of the fourth element value. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second vector. The accumulator is further configured to output the fourth element value and the fourth position coordinates to the cache based on the second comparison result. The computation result includes the fourth element value and the fourth position coordinates. In this embodiment, the accumulator retains position coordinates that fail to be matched between the two vectors, together with the element value corresponding to those position coordinates, and directly writes the position coordinates and the element value into the cache, so that the fourth element value is used as an element value in the computation result. Even if the position coordinates in the first vector do not match the position coordinates in the second vector one by one, the computation apparatus can still perform the addition operation, which broadens the vector computation scenarios that the computation apparatus supports.
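The addition behaviour described in the preceding implementations (summing matched element values, suppressing zero sums, and passing unmatched element values through) might be sketched as follows. The sketch assumes sorted (position coordinate, element value) pairs and treats unmatched elements of both vectors symmetrically; `sparse_add` is a hypothetical name, and the skipped append only stands in for the invalid signal.

```python
from typing import List, Tuple

Pair = Tuple[int, float]  # (position coordinate, element value), hypothetical layout


def sparse_add(first: List[Pair], second: List[Pair]) -> List[Pair]:
    """Add two compressed vectors without decompressing them.

    Assumes both inputs are sorted by position coordinate.
    """
    i = j = 0
    result: List[Pair] = []
    while i < len(first) and j < len(second):
        ci, vi = first[i]
        cj, vj = second[j]
        if ci == cj:
            total = vi + vj
            if total != 0:            # a zero sum is not written (invalid signal)
                result.append((ci, total))
            i += 1
            j += 1
        elif ci < cj:
            result.append((ci, vi))   # no match in the second vector: pass through
            i += 1
        else:
            result.append((cj, vj))   # no match in the first vector: pass through
            j += 1
    result.extend(first[i:])          # remaining unmatched elements
    result.extend(second[j:])
    return result
```

Because zero sums are dropped, the output is itself a vector in the compressed format and can feed a subsequent operation directly.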

In an optional implementation, the logical operation circuit includes a multiplier. The position coordinate comparison circuit is further configured to receive a multiplication instruction, and transmit the first comparison result to the multiplier based on the multiplication instruction. The multiplier is configured to multiply the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result includes a fifth element value and fifth position coordinates of the fifth element value. The fifth element value is the product. The fifth position coordinates are the same as the first position coordinates. In this embodiment, the computation apparatus may perform multiplication operation on the vector in the compressed format, and the computation apparatus may be used in an application scenario of vector multiplication computation.
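A minimal sketch of the multiplication case follows. Here a dictionary lookup stands in for the position coordinate comparison circuit, which is a software substitute and not the circuit described in the application; the name `sparse_multiply` and the pair layout are assumptions.

```python
from typing import List, Tuple

Pair = Tuple[int, float]  # (position coordinate, element value), hypothetical layout


def sparse_multiply(first: List[Pair], second: List[Pair]) -> List[Pair]:
    """Element-wise product of two compressed vectors.

    Only coordinates present in both vectors produce an output element,
    because a product with an absent (zero) element is zero.
    """
    second_lookup = dict(second)  # coordinate -> element value
    return [(c, v * second_lookup[c]) for c, v in first if c in second_lookup]
```

Unlike addition, unmatched element values are simply discarded, so the result never has more stored elements than the sparser input.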

In an optional implementation, the logical operation circuit includes an inner product operation circuit. The position coordinate comparison circuit is further configured to receive an inner product instruction, and transmit the first comparison result to the inner product operation circuit based on the inner product instruction. The inner product operation circuit is configured to multiply the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result is an accumulated value of a plurality of products. Each product is a product of one first element value and one second element value. A computation result of an inner product of the first vector and the second vector is a scalar (a value). In this embodiment, the computation apparatus may perform inner product operation on the vector in the compressed format, and the computation apparatus may be used in an application scenario of vector inner product computation.
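The inner product case can be sketched in the same way: products are formed only for matched position coordinates and then accumulated into a scalar. The function name `sparse_inner_product` and the pair layout are illustrative assumptions.

```python
from typing import List, Tuple

Pair = Tuple[int, float]  # (position coordinate, element value), hypothetical layout


def sparse_inner_product(first: List[Pair], second: List[Pair]) -> float:
    """Accumulate the products of element values that share position coordinates."""
    second_lookup = dict(second)  # coordinate -> element value
    accumulated = 0.0
    for coord, value in first:
        if coord in second_lookup:        # matched position coordinates
            accumulated += value * second_lookup[coord]
    return accumulated
```

With `first = [(0, 1.0), (3, 2.0), (7, 4.0)]` and `second = [(3, 5.0), (5, 1.0), (7, 0.5)]`, the matched coordinates are 3 and 7, so the result is 2.0 × 5.0 + 4.0 × 0.5 = 12.0.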

In an optional implementation, the logical operation circuit includes a multiplier and an accumulator, and the first coordinate comparison result further includes a third comparison result. The position coordinate comparison circuit is further configured to receive a multiplication-addition computation instruction, and transmit the first comparison result to the multiplier based on the multiplication-addition computation instruction. The multiplier is further configured to multiply the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value, so as to obtain a fifth element value and fifth position coordinates of the fifth element value. The fifth position coordinates are the same as the first position coordinates. The position coordinate comparison circuit is further configured to compare sixth position coordinates with the fifth position coordinates, to obtain the third comparison result; and transmit the third comparison result to the accumulator. The third comparison result indicates that the sixth position coordinates are the same as the fifth position coordinates. The sixth position coordinates are position coordinates in a third vector. The third vector includes a sixth element value and the sixth position coordinates corresponding to the sixth element value. The accumulator is configured to add the sixth element value and the fifth element value based on the third comparison result, to obtain a sum of the sixth element value and the fifth element value. The computation value includes the product of the first element value and the second element value and the sum of the sixth element value and the fifth element value. The computation result includes a seventh element value and seventh position coordinates corresponding to the seventh element value. The seventh element value is the sum of the sixth element value and the fifth element value. 
The seventh position coordinates are the same as the sixth position coordinates. In this embodiment, the computation apparatus may perform multiplication-addition operation on the vector in the compressed format, and the computation apparatus may be used in an application scenario of vector multiplication-addition computation.
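The multiplication-addition flow might be sketched as two stages: a product vector is formed from the first and second vectors, and the third vector is then added to it by position coordinates. Passing through product elements that have no match in the third vector is an assumption of this sketch, as are the name `sparse_multiply_add` and the pair layout.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, float]  # (position coordinate, element value), hypothetical layout


def sparse_multiply_add(first: List[Pair], second: List[Pair],
                        third: List[Pair]) -> List[Pair]:
    """Compute first * second + third on compressed vectors."""
    second_lookup = dict(second)
    # Stage 1: products for matched coordinates (the "fifth" element values).
    result: Dict[int, float] = {
        c: v * second_lookup[c] for c, v in first if c in second_lookup
    }
    # Stage 2: add the third vector; a matched coordinate yields a sum,
    # an unmatched coordinate is carried into the result unchanged.
    for c, v in third:
        result[c] = result.get(c, 0.0) + v
    return sorted(result.items())
```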

In an optional implementation, the first coordinate comparison result further includes a fourth comparison result. The third vector includes an eighth element value and eighth position coordinates of the eighth element value. The fourth comparison result indicates that no position coordinates that are the same as the eighth position coordinates are found in the result vector obtained after the first vector is multiplied by the second vector. The accumulator is further configured to output the eighth element value and the eighth position coordinates to the cache based on the fourth comparison result. The computation result includes the eighth element value and the eighth position coordinates. In this embodiment, the accumulator retains position coordinates that fail to be matched between the third vector and the result vector, together with the element value corresponding to those position coordinates, and directly writes the position coordinates and the element value into the cache, so that the eighth element value is used as an element value in the computation result. Even if the position coordinates in the third vector do not match the position coordinates in the result vector (to be specific, the result vector obtained after the first vector is multiplied by the second vector) one by one, the computation apparatus can still perform the addition operation, which broadens the vector computation scenarios that the computation apparatus supports.

In an optional implementation, the position coordinate comparison circuit is further configured to compare position coordinates of an element value in a first matrix with position coordinates of an element value in a second matrix, to obtain a second coordinate comparison result. The first matrix includes the first vector. The second matrix includes the second vector. Both the first matrix and the second matrix are matrices in a compressed format. The second coordinate comparison result includes the first coordinate comparison result. In this embodiment, a matrix in the compressed format may be split into a plurality of vectors in the compressed format. In this case, one matrix in the compressed format may be considered as a plurality of vectors in the compressed format. Therefore, the foregoing plurality of types of operations for the vector in the compressed format may be extended to computation of the matrix in the compressed format. The computation apparatus may compute two matrices in the compressed format, so that an application scenario of the computation apparatus is added, and the efficiency of computing the vector in the compressed format is effectively improved.
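The idea that a matrix in the compressed format can be treated as several vectors in the compressed format can be sketched as follows. The input layout, a list of (row coordinate, column coordinate, element value) triples, and the name `split_into_row_vectors` are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[int, int, float]   # (row, column, element value), hypothetical layout
RowVector = List[Tuple[int, float]]  # (column coordinate, element value)


def split_into_row_vectors(triples: List[Triple]) -> Dict[int, RowVector]:
    """Group a compressed matrix into one compressed vector per non-empty row."""
    rows: Dict[int, RowVector] = defaultdict(list)
    for row, col, value in triples:
        rows[row].append((col, value))
    # Sort each row by column coordinate so vector operations can scan in order.
    return {row: sorted(cols) for row, cols in rows.items()}
```

Each resulting row vector can then be passed to any vector operation that works on (position coordinate, element value) pairs.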

In an optional implementation, the position coordinate comparison circuit includes a row coordinate comparison circuit and a column coordinate comparison circuit. A dimension of the first matrix is M×N, and a dimension of the second matrix is K×L. The row coordinate comparison circuit is configured to compare a row coordinate of an m-th row in the first matrix with a row coordinate of an f-th row in the second matrix, to obtain a row comparison result. The row comparison result indicates that the row coordinate of the m-th row is the same as the row coordinate of the f-th row. m is less than or equal to M, and f is less than or equal to K. The column coordinate comparison circuit is configured to compare a column coordinate of each element value in the m-th row with a column coordinate of each element value in the f-th row based on the row comparison result, to obtain a column comparison result. The column comparison result indicates that an n-th column coordinate of the m-th row is the same as an l-th column coordinate of the f-th row. The first comparison result includes the row comparison result and the column comparison result. The accumulator is further configured to add, based on the first comparison result, an element value corresponding to the n-th column coordinate of the m-th row and an element value corresponding to the l-th column coordinate of the f-th row, to obtain the third element value. The element value corresponding to the n-th column coordinate of the m-th row is the first element value. The element value corresponding to the l-th column coordinate of the f-th row is the second element value. n is less than or equal to N, and l is less than or equal to L. In this embodiment, the position coordinate comparison circuit does not need to traverse and match all position coordinates in the first matrix with all position coordinates in the second matrix. A row coordinate in the first matrix and a row coordinate in the second matrix are first compared by using the row coordinate comparison circuit, and then column coordinates are compared only for position coordinates that have a same row coordinate, so that the quantity of position coordinate comparisons is reduced, and a computing resource is saved.

In an optional implementation, the row comparison result includes a first signal and a first value. The column comparison result includes a second signal and a second value. The first signal indicates validity of the first value. The first value is equal to a value of the row coordinate of the m-th row. The second signal indicates validity of the second value. The second value is equal to a value of the n-th column coordinate.
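The (signal, value) form of the row and column comparison results might be modelled as below. The placeholder value 0 that accompanies an invalid signal, and the names `compare_coordinate` and `compare_position`, are assumptions of this sketch; the application only states that the signal indicates whether the value is valid.

```python
from typing import Tuple

Position = Tuple[int, int]     # (row coordinate, column coordinate)
Comparison = Tuple[bool, int]  # (signal, value); value is meaningful only if signal is True


def compare_coordinate(a: int, b: int) -> Comparison:
    """Compare two coordinates and return a validity signal plus the matched value."""
    if a == b:
        return True, a
    return False, 0  # placeholder value (assumption); the signal marks it invalid


def compare_position(first: Position, second: Position) -> Tuple[Comparison, Comparison]:
    """Compare rows first; compare columns only when the rows match."""
    row_result = compare_coordinate(first[0], second[0])
    if not row_result[0]:
        return row_result, (False, 0)  # column comparison is skipped
    return row_result, compare_coordinate(first[1], second[1])
```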

According to a second aspect, an embodiment of this application provides a computation method, where the method is applied to a computation apparatus. The method includes:

- obtaining a computation instruction, where the computation instruction includes a first vector and a second vector that are in a compressed format;
- comparing position coordinates of an element value in the first vector with position coordinates of an element value in the second vector, to obtain a first coordinate comparison result, where the first vector includes a first element value and first position coordinates of the first element value, the second vector includes a second element value and second position coordinates of the second element value, the first coordinate comparison result includes a first comparison result, and the first comparison result indicates that the first position coordinates are the same as the second position coordinates;
- computing the first element value and the second element value based on the first comparison result, to obtain a computation value; and
- outputting a computation result of the first vector and the second vector to a cache, where the computation result is related to the computation value.

In an optional implementation, the computation instruction is an addition instruction. The computing the first element value and the second element value based on the first comparison result, to obtain a computation value may include: adding the first element value and the second element value based on the first comparison result, to obtain a sum of the first element value and the second element value. The computation value is the sum. The computation result includes a third element value and third position coordinates of the third element value. The third element value is the sum. The third position coordinates are the same as the first position coordinates.

In an optional implementation, the method further includes: when the third element value is a zero element value, outputting an invalid signal for the zero element value. The invalid signal indicates that an element value in the computation result does not include the zero element value and position coordinates corresponding to the zero element value.

In an optional implementation, the method further includes: skipping, based on the invalid signal, outputting the zero element value and the position coordinates corresponding to the zero element value to the cache.

In an optional implementation, the computation instruction is a multiplication instruction. The computing the first element value and the second element value based on the first comparison result, to obtain a computation value may include: multiplying the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result includes a fifth element value and fifth position coordinates of the fifth element value. The fifth element value is the product. The fifth position coordinates are the same as the first position coordinates.

In an optional implementation, the computation instruction is an inner product instruction. The computing the first element value and the second element value based on the first comparison result, to obtain a computation value may include: multiplying the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result is an accumulated value of a plurality of products.

In an optional implementation, the computation instruction is a multiplication-addition instruction. The first coordinate comparison result further includes a third comparison result. The computing the first element value and the second element value based on the first comparison result, to obtain a computation value includes:

- multiplying the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value, where the computation value is the product, the product is used as a fifth element value, and fifth position coordinates corresponding to the fifth element value are the same as the first position coordinates;
- comparing sixth position coordinates with the fifth position coordinates, to obtain the third comparison result, and transmitting the third comparison result to an accumulator, where the third comparison result indicates that the sixth position coordinates are the same as the fifth position coordinates, the sixth position coordinates are position coordinates in a third vector, and the third vector includes a sixth element value and the sixth position coordinates corresponding to the sixth element value; and
- adding the sixth element value and the fifth element value based on the third comparison result, to obtain a sum of the sixth element value and the fifth element value, where the computation value includes the sum of the sixth element value and the fifth element value, the computation result includes a seventh element value and seventh position coordinates corresponding to the seventh element value, the seventh element value is the sum of the sixth element value and the fifth element value, and the seventh position coordinates are the same as the sixth position coordinates.

In an optional implementation, the computation instruction includes a first matrix and a second matrix that are in the compressed format. The first matrix includes the first vector. The second matrix includes the second vector. The comparing position coordinates of an element value in the first vector with position coordinates of an element value in the second vector, to obtain a first coordinate comparison result may include: comparing position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a second coordinate comparison result. The second coordinate comparison result includes the first coordinate comparison result.

In an optional implementation, a dimension of the first matrix is M×N, and a dimension of the second matrix is K×L. The comparing position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a second coordinate comparison result may include: comparing a row coordinate of an m-th row in the first matrix with a row coordinate of an f-th row in the second matrix, to obtain a row comparison result, where the row comparison result indicates that the row coordinate of the m-th row is the same as the row coordinate of the f-th row, m is less than or equal to M, and f is less than or equal to K; comparing a column coordinate of each element value in the m-th row with a column coordinate of each element value in the f-th row based on the row comparison result, to obtain a column comparison result, where the column comparison result indicates that an n-th column coordinate of the m-th row is the same as an l-th column coordinate of the f-th row, and the first comparison result includes the row comparison result and the column comparison result; and adding, based on the first comparison result, an element value corresponding to the n-th column coordinate of the m-th row and an element value corresponding to the l-th column coordinate of the f-th row, to obtain the third element value, where the element value corresponding to the n-th column coordinate of the m-th row is the first element value, the element value corresponding to the l-th column coordinate of the f-th row is the second element value, n is less than or equal to N, and l is less than or equal to L.

According to a third aspect, an embodiment of this application provides a computation apparatus. The computation apparatus includes a position coordinate comparison circuit and an accumulator. The position coordinate comparison circuit is configured to compare position coordinates of an element value in a first matrix with position coordinates of an element value in a second matrix, to obtain a coordinate comparison result. Both the first matrix and the second matrix are matrices in a compressed format. The first matrix includes a first element value and first position coordinates of the first element value. The first element value is any element value in the first matrix. The second matrix includes a second element value and second position coordinates of the second element value. The second element value is any element value in the second matrix. The coordinate comparison result includes a first comparison result. The first comparison result indicates that the first position coordinates are the same as the second position coordinates. The accumulator is configured to add the first element value and the second element value based on the first comparison result, to obtain a sum of the first element value and the second element value; and output the sum to a cache. The cache is configured to cache a result matrix. The result matrix includes a third element value and third position coordinates of the third element value. The third element value is the sum of the first element value and the second element value. The third position coordinates are the same as the first position coordinates. In this embodiment, the computation apparatus can compute a matrix in the compressed format. 
When performing addition operation on two matrices in the compressed format, the computation apparatus compares position coordinates of element values in the two matrices, and adds two element values corresponding to same position coordinates in the two matrices, to obtain a result matrix of computing the two matrices in the compressed format. In comparison with a conventional method in which the matrix in the compressed format needs to be decompressed first, and then matrix computation is performed on a decompressed matrix, the computation apparatus provided in this embodiment of this application can effectively improve efficiency of computing the matrix in the compressed format.

In an optional implementation, the accumulator is further configured to: when the third element value is a zero element value, output an invalid signal for the zero element value. The invalid signal indicates that an element value in the result matrix does not include the zero element value and position coordinates corresponding to the zero element value. In this embodiment, each third element value is obtained by adding two element values. In this case, the third element value may be the zero element value. The accumulator is further configured to: when the third element value is the zero element value, delete the zero element value and the position coordinates corresponding to the zero element value, so that the result matrix output by the computation apparatus does not include the zero element value, to output a matrix in the compressed format. Therefore, a transmission resource is saved, or a next computation operation is facilitated.

In an optional implementation, the accumulator skips, based on the invalid signal, outputting the zero element value and the position coordinates corresponding to the zero element value to the cache.

In an optional implementation, the coordinate comparison result further includes a second comparison result. The first matrix includes a fourth element value and fourth position coordinates of the fourth element value. The fourth element value is any element value in the first matrix. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second matrix. The accumulator is further configured to output the fourth element value and the fourth position coordinates to the cache based on the second comparison result. The cache is configured to cache the result matrix. The result matrix includes the fourth element value and the fourth position coordinates. In this embodiment, the accumulator retains position coordinates that fail to be matched between the two matrices, together with the element value corresponding to those position coordinates, and directly writes the position coordinates and the element value into the cache, so that the fourth element value is used as an element value in the result matrix. Even if the position coordinates in the first matrix do not match the position coordinates in the second matrix one by one, the computation apparatus can still perform the addition operation, which broadens the matrix computation scenarios that the computation apparatus supports.

In an optional implementation, the position coordinate comparison circuit is further configured to learn, through reading, that both the first matrix and the second matrix include position coordinates, and perform, based on triggering of the position coordinates, an operation of comparing the position coordinates in the first matrix with the position coordinates in the second matrix. In this embodiment, the position coordinate comparison circuit receives the first matrix and the second matrix. Because both the first matrix and the second matrix include position coordinates, when the position coordinate comparison circuit learns, through reading, that both the first matrix and the second matrix include position coordinates, the operation of comparing, by the position coordinate comparison circuit, the position coordinates in the first matrix with the position coordinates in the second matrix is triggered.

In an optional implementation, the position coordinate comparison circuit includes a row coordinate comparison circuit and a column coordinate comparison circuit. A dimension of the first matrix is M×N, and a dimension of the second matrix is K×L. M×N may be the same as or different from K×L. The row coordinate comparison circuit is configured to compare a row coordinate of an m-th row in the first matrix with a row coordinate of an f-th row in the second matrix, to obtain a row comparison result. The row comparison result indicates that the row coordinate of the m-th row is the same as the row coordinate of the f-th row. m is less than or equal to M, and f is less than or equal to K. The column coordinate comparison circuit is configured to compare a column coordinate of each element value in the m-th row with a column coordinate of each element value in the f-th row based on the row comparison result, to obtain a column comparison result. The column comparison result indicates that an n-th column coordinate of the m-th row is the same as an l-th column coordinate of the f-th row. The first comparison result includes the row comparison result and the column comparison result. The accumulator is further configured to add, based on the column comparison result, an element value corresponding to the n-th column coordinate of the m-th row and an element value corresponding to the l-th column coordinate of the f-th row, to obtain the third element value. The element value corresponding to the n-th column coordinate of the m-th row is the first element value. The element value corresponding to the l-th column coordinate of the f-th row is the second element value. n is less than or equal to N, and l is less than or equal to L. In this embodiment, the position coordinate comparison circuit does not need to traverse and match all position coordinates in the first matrix with all position coordinates in the second matrix. 
A row coordinate in the first matrix and a row coordinate in the second matrix are first compared by using the row coordinate comparison circuit, and then column coordinates in position coordinates with a same row coordinate are compared, so that a quantity of times for which the position coordinates are compared is reduced, and a computing resource is saved.

According to a fourth aspect, an embodiment of this application provides a matrix computation method. The method is applied to a computation apparatus. The method includes: A computation apparatus first obtains a computation instruction. The computation instruction includes a first matrix and a second matrix that are in a compressed format. The computation apparatus then compares position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a coordinate comparison result. The first matrix includes a first element value and first position coordinates of the first element value. The second matrix includes a second element value and second position coordinates of the second element value. The coordinate comparison result includes a first comparison result. The first comparison result indicates that the first position coordinates are the same as the second position coordinates. Finally, the computation apparatus adds the first element value and the second element value based on the first comparison result, and outputs an obtained sum to a cache. The cache is configured to cache a result matrix. The result matrix includes a third element value and third position coordinates of the third element value. The third element value is the sum of the first element value and the second element value. The third position coordinates are the same as the first position coordinates.

In an optional implementation, the method further includes: When the third element value is a zero element value, the computation apparatus may output an invalid signal for the zero element value. The invalid signal indicates that an element value in the result matrix does not include the zero element value.

In an optional implementation, the method further includes: skipping, based on the invalid signal, outputting the zero element value and position coordinates corresponding to the zero element value to the cache, so that the result matrix does not include the zero element value. The result matrix is a matrix in the compressed format. Therefore, a transmission resource is saved, or a next computation operation is facilitated.

In an optional implementation, the coordinate comparison result further includes a second comparison result. The first matrix includes a fourth element value and fourth position coordinates of the fourth element value. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second matrix. The method may further include: The computation apparatus outputs the fourth element value and the fourth position coordinates to the cache based on the second comparison result. The result matrix includes the fourth element value and the fourth position coordinates. In this embodiment, the accumulator retains position coordinates that fail to be matched in the two matrices and an element value corresponding to the position coordinates, and directly writes the position coordinates and the element value into the cache, to use the fourth element value as an element value in the result matrix. Even if a plurality of position coordinates in the first matrix do not completely match a plurality of position coordinates in the second matrix one by one, the computation apparatus can still perform the addition operation, so that the computation apparatus is applicable to more matrix computation scenarios.
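The addition behavior described in the fourth aspect can be sketched in software as follows. This is a minimal illustration and not the circuit itself: the function name `add_compressed` and the dictionary layout `{(row, column): value}` are assumptions made here, and passing unmatched elements of the second matrix through to the result is likewise assumed by symmetry with the fourth element value.

```python
def add_compressed(first, second):
    """Add two compressed matrices given as {(row, col): value} mappings.

    The mapping layout is a hypothetical stand-in for the compressed format.
    """
    result = {}
    for coords, value in first.items():
        if coords in second:
            # First comparison result: same position coordinates, so add.
            total = value + second[coords]
            if total != 0:
                result[coords] = total
            # A zero sum plays the role of the invalid signal: nothing is
            # written to the result for these coordinates.
        else:
            # Second comparison result: no match, pass the value through.
            result[coords] = value
    for coords, value in second.items():
        if coords not in first:
            # Assumed symmetric handling of unmatched second-matrix elements.
            result[coords] = value
    return result


first = {(0, 0): 1, (0, 1): 2, (1, 1): -3}
second = {(0, 1): 5, (1, 1): 3, (2, 0): 4}
result = add_compressed(first, second)
print(result)  # {(0, 0): 1, (0, 1): 7, (2, 0): 4}
```

In this example the sum at (1, 1) is zero, so that position is omitted and the result stays in the compressed format.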

In an optional implementation, a dimension of the first matrix is M×N, and a dimension of the second matrix is K×L. That the computation apparatus compares position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a coordinate comparison result may specifically include: The computation apparatus first compares a row coordinate of an m-th row in the first matrix with a row coordinate of an f-th row in the second matrix, to obtain a row comparison result. The row comparison result indicates that the row coordinate of the m-th row is the same as the row coordinate of the f-th row. m is less than or equal to M, and f is less than or equal to K. The computation apparatus then compares a column coordinate of each element value in the m-th row with a column coordinate of each element value in the f-th row based on the row comparison result, to obtain a column comparison result. The column comparison result indicates that an n-th column coordinate of the m-th row is the same as an l-th column coordinate of the f-th row. The computation apparatus adds, based on the first comparison result, an element value corresponding to the n-th column coordinate of the m-th row and an element value corresponding to the l-th column coordinate of the f-th row, to obtain the third element value. The element value corresponding to the n-th column coordinate of the m-th row is the first element value. The element value corresponding to the l-th column coordinate of the f-th row is the second element value. n is less than or equal to N, and l is less than or equal to L.

In an optional implementation, the row comparison result includes a first signal and a first value. The column comparison result includes a second signal and a second value. The first signal indicates that the row coordinate of the m-th row is the same as the row coordinate of the f-th row in the second matrix. The first value is equal to the row coordinate of the m-th row. The second signal indicates that the n-th column coordinate of the m-th row is the same as the l-th column coordinate of the f-th row. The second value is equal to the n-th column coordinate.
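The two-stage comparison (rows first, then columns only within rows that matched) can be sketched as follows. The function name `compare_rows_then_columns` and the per-row dictionary layout are hypothetical and are used here only to illustrate why fewer coordinate comparisons are needed than in an all-pairs match.

```python
def compare_rows_then_columns(first_rows, second_rows):
    """Return (row, column) pairs that are present in both matrices.

    Each argument maps a row coordinate to the list of column coordinates
    stored for that row (a hypothetical layout chosen for illustration).
    """
    matches = []
    for row, first_cols in first_rows.items():
        # Row comparison: a hit corresponds to the first signal, and the
        # matching row coordinate corresponds to the first value.
        if row not in second_rows:
            continue
        second_cols = set(second_rows[row])
        for col in first_cols:
            # Column comparison runs only for rows that already matched.
            # A hit corresponds to the second signal and second value.
            if col in second_cols:
                matches.append((row, col))
    return matches


first_rows = {0: [0, 1], 1: [1, 2], 3: [3]}
second_rows = {0: [1, 3], 2: [0], 3: [3]}
matches = compare_rows_then_columns(first_rows, second_rows)
print(matches)  # [(0, 1), (3, 3)]
```

Row 1 of the first matrix and row 2 of the second matrix have no counterpart, so none of their column coordinates are compared.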

According to a fifth aspect, a computation circuit is provided. The computation circuit is configured to perform an operation step of the computation method provided in any one of the second aspect or the possible implementations of the second aspect. Alternatively, the computation circuit is configured to perform an operation step of the computation method provided in any one of the fourth aspect or the possible implementations of the fourth aspect.

According to a sixth aspect, a computation system is provided. The system includes a processor and a computation apparatus. The processor is configured to send a computation instruction to the computation apparatus. The computation apparatus is configured to perform an operation step of the computation method provided in any one of the second aspect or the possible implementations of the second aspect. Alternatively, the computation apparatus is configured to perform an operation step of the computation method provided in any one of the fourth aspect or the possible implementations of the fourth aspect.

According to a seventh aspect, a chip is provided. The chip includes a processor. A computation apparatus is integrated into the processor. The computation apparatus is configured to perform an operation step of the computation method provided in any one of the second aspect or the possible implementations of the second aspect. Alternatively, the computation apparatus is configured to perform an operation step of the computation method provided in any one of the fourth aspect or the possible implementations of the fourth aspect.

According to an eighth aspect, a computation device is provided. The computation device includes the computation system in the sixth aspect or the chip in the seventh aspect.

According to a ninth aspect, a readable storage medium is provided. The readable storage medium stores instructions. When the instructions are run on a device, the device is enabled to perform an operation step of the computation method provided in any one of the second aspect or the possible implementations of the second aspect, or the device is enabled to perform an operation step of the computation method provided in any one of the fourth aspect or the possible implementations of the fourth aspect.

According to a tenth aspect, a computer program product is provided. When the computer program product runs on a computer, the computer is enabled to perform an operation step of the computation method provided in any one of the second aspect or the possible implementations of the second aspect, or the computer is enabled to perform an operation step of the computation method provided in any one of the fourth aspect or the possible implementations of the fourth aspect.

It may be understood that any computation apparatus, computer storage medium, or computer program product provided above is configured to perform the corresponding method provided above. Therefore, for beneficial effects that can be achieved by the computation apparatus, computer storage medium, or computer program product, refer to beneficial effects in the corresponding method provided above. Details are not described herein again.

In this application, based on the implementations in the foregoing aspects, the implementations may be combined to provide more implementations.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1a is a schematic diagram of a matrix in a COO compressed format according to an embodiment of this application;

FIG. 1b is a schematic diagram of a matrix in a CSR compressed format according to an embodiment of this application;

FIG. 1c is a schematic diagram of a matrix in a CSC compressed format according to an embodiment of this application;

FIG. 1d is a schematic diagram of splitting a matrix in a compressed format into a vector in the compressed format according to an embodiment of this application;

FIG. 2 is a schematic diagram of a structure of a computation device according to an embodiment of this application;

FIG. 3 is a schematic diagram of a structure of a processor according to an embodiment of this application;

FIG. 4 is a schematic diagram of a structure of an embodiment of a computation apparatus according to an embodiment of this application;

FIG. 5 is a schematic diagram of obtaining a first coordinate comparison result by a position coordinate comparison circuit according to an embodiment of this application;

FIG. 6 is a schematic diagram of a structure of an embodiment of a logical operation circuit according to an embodiment of this application;

FIG. 7 and FIG. 8 are schematic diagrams of two examples of addition operation performed on a first vector and a second vector according to an embodiment of this application;

FIG. 9 is a schematic diagram of an example of multiplication operation performed on a first vector and a second vector according to an embodiment of this application;

FIG. 10 is a schematic diagram of an example of inner product operation performed on a first vector and a second vector according to an embodiment of this application;

FIG. 11 is a schematic diagram of an example of multiplication-addition operation performed on a first vector, a second vector, and a third vector according to an embodiment of this application;

FIG. 12 is a schematic diagram of a structure of another embodiment of a computation apparatus according to an embodiment of this application;

FIG. 13 is a schematic diagram of a matrix F in a compressed format according to an embodiment of this application;

FIG. 14a is a schematic diagram of a first matrix and a second matrix according to an embodiment of this application;

FIG. 14b, FIG. 14c, and FIG. 14d are schematic diagrams of three examples of obtaining a result matrix by adding a first matrix and a second matrix according to an embodiment of this application;

FIG. 15 is a schematic diagram of a structure of a position coordinate comparison circuit according to an embodiment of this application;

FIG. 16a is a schematic diagram of comparing a row coordinate in a first matrix with a row coordinate in a second matrix according to an embodiment of this application;

FIG. 16b is a schematic diagram of comparing row coordinates by a row coordinate comparison circuit according to an embodiment of this application;

FIG. 17a is a schematic diagram of comparing column coordinates in two rows that have a same row coordinate and that are in a first matrix and a second matrix according to an embodiment of this application;

FIG. 17b is a schematic diagram of comparing column coordinates by a column coordinate comparison circuit according to an embodiment of this application;

FIG. 18 is a schematic diagram of a structure of an accumulator according to an embodiment of this application;

FIG. 19 is a schematic diagram of outputting, by an accumulator, an element value to a second cache based on a result of comparing column coordinates according to an embodiment of this application;

FIG. 20 is a schematic diagram of a structure of another embodiment of a computation apparatus according to an embodiment of this application;

FIG. 21 is a schematic flowchart of steps of an embodiment of a computation method according to an embodiment of this application; and

FIG. 22 is a schematic flowchart of steps of another embodiment of a computation method according to an embodiment of this application.

DESCRIPTION OF EMBODIMENTS

The following describes the technical solutions in embodiments of this application with reference to the accompanying drawings in embodiments of this application. In the specification, claims, and accompanying drawings of this application, the terms “first”, “second”, and the like are intended to distinguish between similar objects but do not necessarily indicate a specific order or sequence.

To better understand this application, related words in this application are first described by using examples.

Matrix (matrix): A matrix whose dimension is M×N is a rectangular array formed by arranging elements of M rows (rows) and N columns (columns). For example, a matrix A is shown in Equation (1), and a matrix B is shown in Equation (2).

$\begin{matrix} A = [\begin{matrix} a_{1 1} & a_{1 2} & \dots & a_{1 N} \\ a_{2 1} & a_{2 2} & \dots & a_{2 N} \\ \dots & \dots & \dots & \dots \\ a_{M 1} & a_{M 2} & \dots & a_{M N} \end{matrix}] & Equation (1) \end{matrix}$

$\begin{matrix} B = [\begin{matrix} b_{1 1} & b_{1 2} & \dots & b_{1 N} \\ b_{2 1} & b_{2 2} & \dots & b_{2 N} \\ \dots & \dots & \dots & \dots \\ b_{M 1} & b_{M 2} & \dots & b_{MN} \end{matrix}] & Equation (2) \end{matrix}$

Matrix addition and subtraction: Mutual addition and subtraction may be performed on matrices with a same dimension. Specifically, addition and subtraction are performed on elements at each position. For example, the matrix A and the matrix B each are a matrix whose dimension is M×N, and the matrix A and the matrix B are added to obtain a matrix C. The matrix C is shown in Equation (3).

$\begin{matrix} C = A + B = [\begin{matrix} a_{1 1} + b_{1 1} & a_{1 2} + b_{1 2} & \dots & a_{1 N} + b_{1 N} \\ a_{2 1} + b_{2 1} & a_{2 2} + b_{2 2} & \dots & a_{2 N} + b_{2 N} \\ \dots & \dots & \dots & \dots \\ a_{M 1} + b_{M 1} & a_{M 2} + b_{M 2} & \dots & a_{M N} + b_{M N} \end{matrix}] & Equation (3) \end{matrix}$
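For illustration only, with two 2×2 matrices whose values are chosen here and do not come from the figures, element-wise addition gives:

$\begin{matrix} [\begin{matrix} 1 & 0 \\ 2 & 3 \end{matrix}] + [\begin{matrix} 4 & 5 \\ 0 & 6 \end{matrix}] = [\begin{matrix} 5 & 5 \\ 2 & 9 \end{matrix}] \end{matrix}$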

A row vector (row vector) is a matrix whose dimension is 1×M, where M is a positive integer. For example, a row vector is shown in Equation (5).

$\begin{matrix} X = [\begin{matrix} x_{1} & x_{2} & \dots & x_{M} \end{matrix}] & Equation (5) \end{matrix}$

A column vector (column vector) is a matrix whose dimension is M×1, where M is a positive integer. For example, a column vector is shown in Equation (6).

$\begin{matrix} X = [\begin{matrix} x_{1} \\ x_{2} \\ \vdots \\ x_{M} \end{matrix}] & Equation (6) \end{matrix}$

Matrix in a compressed format: When a matrix includes a zero element value and a non-zero element value, usually, to save storage space, the non-zero element value in the matrix may be stored in a format, and the zero element value is not stored. In this process, the matrix is compressed, and a matrix obtained after compression and storage is referred to as a matrix in the compressed format. A method for compressing the matrix includes but is not limited to coordinate representation (coordinate, COO), row compression (compressed sparse row, CSR), column compression (compressed sparse column, CSC), or the like.

The following separately describes the three compression methods, namely, the COO, the CSR, and the CSC by using examples.

COO: A matrix is indicated by using a triplet. The triplet includes three values, which are separately a row number, a column number, and an element value. The row number and the column number identify a position of the element value. For example, the triplet is (a row number, a column number, and an element value), or the triplet is (an element value, a row number, and a column number). Specifically, an arrangement sequence of the three values in the triplet is not limited. For example, refer to FIG. 1a. FIG. 1a shows a matrix Y whose dimension is 4×4. The matrix includes zero element values and non-zero element values. The non-zero element values are 1, 2, 3, 4, 5, 6, 7, 8, and 9. If a position of the non-zero element value “1” is a row 0 and a column 0, a triplet is represented as (0, 0, 1). If a position of the non-zero element value “2” is the row 0 and a column 1, the triplet is represented as (0, 1, 2). If a position of the non-zero element value “3” is a row 1 and the column 1, the triplet is represented as (1, 1, 3). The element values are not all described one by one in detail herein. For example, a triplet form of the matrix in the compressed format is (0, 0, 1), (0, 1, 2), (1, 1, 3), (1, 2, 4), (2, 0, 5), (2, 2, 6), (2, 3, 7), (3, 1, 8), and (3, 3, 9).

Optionally, the matrix Y in the compressed format may be represented by using Equation (8).

Row coordinate=([0, 0, 1, 1, 2, 2, 2, 3, 3])

Column coordinate=([0, 1, 1, 2, 0, 2, 3, 1, 3])

Element value=([1, 2, 3, 4, 5, 6, 7, 8, 9]) Equation (8)
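As a minimal sketch, the COO arrays above can be derived from the uncompressed matrix as follows. The dense matrix `Y` is reconstructed here from the triplets listed in the text, and the helper name `to_coo` is hypothetical.

```python
# Matrix Y reconstructed from the triplets listed in the text.
Y = [
    [1, 2, 0, 0],
    [0, 3, 4, 0],
    [5, 0, 6, 7],
    [0, 8, 0, 9],
]


def to_coo(dense):
    """Return (row coordinates, column coordinates, element values)."""
    rows, cols, values = [], [], []
    for i, row in enumerate(dense):
        for j, value in enumerate(row):
            if value != 0:  # zero element values are not stored
                rows.append(i)
                cols.append(j)
                values.append(value)
    return rows, cols, values


rows, cols, values = to_coo(Y)
print(rows)    # [0, 0, 1, 1, 2, 2, 2, 3, 3]
print(cols)    # [0, 1, 1, 2, 0, 2, 3, 1, 3]
print(values)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```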

CSR: A matrix is represented by using three types of data, which are separately an element value, a column number, and a row offset. The element value and the column number in the CSR are represented in a manner similar to that of the element value and the column number in the foregoing COO manner. A difference between the CSR and the COO manner lies in that the row offset indicates a start offset position of the 1st element of a row in all element values. Refer to FIG. 1b. Non-zero element values in a matrix Y shown in FIG. 1b are first arranged by rows, to obtain all element values: 1, 2, 3, 4, 5, 6, 7, 8, and 9. The 1st non-zero element value in the 1st row is “1”, and an offset of the element value “1” in all the element values is “0”. Similarly, the 1st non-zero element value in the 2nd row is “3”, and an offset of the element value “3” in all the element values is “2”. The 1st non-zero element value in the 3rd row is “5”, and an offset of the element value “5” in all the element values is “4”. The 1st non-zero element value in the 4th row is “8”, and an offset of the element value “8” in all the element values is “7”. Finally, a total quantity (for example, “9”) of the non-zero element values in the matrix is padded at a last position of a row in which the row offsets are located.

The matrix Y in the compressed format may be represented by using Equation (9).

Row offset=([0, 2, 4, 7, 9])

Column coordinate=([0, 1, 1, 2, 0, 2, 3, 1, 3])

Element value=([1, 2, 3, 4, 5, 6, 7, 8, 9]) Equation (9)
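The row offsets can be produced by scanning the matrix row by row and recording how many non-zero element values have been stored so far. The following is a sketch under the same assumption that `Y` is the matrix reconstructed from the triplets in the text; the helper name `to_csr` is hypothetical.

```python
# Matrix Y reconstructed from the triplets listed in the text.
Y = [
    [1, 2, 0, 0],
    [0, 3, 4, 0],
    [5, 0, 6, 7],
    [0, 8, 0, 9],
]


def to_csr(dense):
    """Return (row offsets, column coordinates, element values)."""
    row_offsets, cols, values = [0], [], []
    for row in dense:
        for j, value in enumerate(row):
            if value != 0:
                cols.append(j)
                values.append(value)
        # After each row, record where the next row starts. The final
        # entry is therefore the total quantity of non-zero values.
        row_offsets.append(len(values))
    return row_offsets, cols, values


row_offsets, cols, values = to_csr(Y)
print(row_offsets)  # [0, 2, 4, 7, 9]
print(cols)         # [0, 1, 1, 2, 0, 2, 3, 1, 3]
print(values)       # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```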

CSC: A matrix is represented by using three types of data, which are separately an element value, a row number, and a column offset. The element value and the row number in the CSC are represented in a manner similar to that of the element value and the row number in the foregoing COO manner. A difference between the CSC and the COO manner lies in that the column offset indicates a start offset position of the 1st element of a column in all element values. Refer to FIG. 1c. Non-zero element values in a matrix Y shown in FIG. 1c are first arranged by columns, to obtain all element values: 1, 5, 2, 3, 8, 4, 6, 7, and 9. The 1st non-zero element value in the 1st column is “1”, and an offset of the element value “1” in all the element values is “0”. Similarly, the 1st non-zero element value in the 2nd column is “2”, and an offset of the element value “2” in all the element values is “2”. The 1st non-zero element value in the 3rd column is “4”, and an offset of the element value “4” in all the element values is “5”. The 1st non-zero element value in the 4th column is “7”, and an offset of the element value “7” in all the element values is “7”. Finally, a total quantity (for example, “9”) of the non-zero element values in the matrix is padded at a last position of a row in which the column offsets are located.

The matrix Y in the compressed format may be represented by using Equation (10).

Column offset=([0, 2, 5, 7, 9])

Row coordinate=([0, 2, 0, 1, 3, 1, 2, 2, 3])

Element value=([1, 5, 2, 3, 8, 4, 6, 7, 9]) Equation (10)
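The CSC arrays can be derived in the same way by scanning column by column. This sketch again assumes that `Y` is the matrix reconstructed from the triplets in the text; the helper name `to_csc` is hypothetical. It reproduces the column offsets and the column-ordered element values, and it lists the row coordinate of each stored element value.

```python
# Matrix Y reconstructed from the triplets listed in the text.
Y = [
    [1, 2, 0, 0],
    [0, 3, 4, 0],
    [5, 0, 6, 7],
    [0, 8, 0, 9],
]


def to_csc(dense):
    """Return (column offsets, row coordinates, element values)."""
    col_offsets, rows, values = [0], [], []
    for j in range(len(dense[0])):
        for i, row in enumerate(dense):
            if row[j] != 0:
                rows.append(i)
                values.append(row[j])
        # After each column, record where the next column starts.
        col_offsets.append(len(values))
    return col_offsets, rows, values


col_offsets, rows, values = to_csc(Y)
print(col_offsets)  # [0, 2, 5, 7, 9]
print(rows)         # [0, 2, 0, 1, 3, 1, 2, 2, 3]
print(values)       # [1, 5, 2, 3, 8, 4, 6, 7, 9]
```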

It can be learned from descriptions of the foregoing three matrix compression methods that, each element value in a matrix in a COO compression format has a corresponding row coordinate (row number) and column coordinate (column number). Each element value in a matrix in a CSR compression format has a corresponding column coordinate. Each element value in a matrix in a CSC compression format has a corresponding row coordinate.

Matrix in an uncompressed format: Refer to the matrix Y shown in FIG. 1a. A matrix in the uncompressed format includes a zero element value and a non-zero element value. It should be noted that, usually, the matrix in the compressed format is also referred to as a sparse matrix, and the matrix in the uncompressed format may also be referred to as a dense matrix.

Vector in a compressed format: The vector in the compressed format in embodiments may be a vector obtained by compressing a vector in an uncompressed format. For example, a vector $\vec{g}_1$ in the uncompressed format is [1, 2, 0, 3, 0, 5]. The vector $\vec{g}_1$ in the uncompressed format includes six element values. If $\vec{g}_1$ is compressed, only non-zero element values in the vector are retained, and a position of each non-zero element value needs to be marked by using position coordinates. In an example, the vector $\vec{g}_2$ in the compressed format is represented by using Equation (11).

Element value=([1, 2, 3, 5])

Position coordinates=([0, 1, 3, 5]) Equation (11)

In Equation (11), the element value “1” is at a position “0”. Therefore, the position coordinates corresponding to the element value “1” are “0”. Similarly, the element value “2” is at a position “1”, and the position coordinates corresponding to the element value “2” are “1”. Details are not described one by one by using examples.
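Compressing a vector in this way amounts to keeping each non-zero element value together with its index. The following sketch uses a separate, hypothetical example vector and hypothetical helper names (`compress_vector`, `decompress_vector`); it also shows that the uncompressed vector can be recovered when its length is known.

```python
def compress_vector(dense):
    """Return (element values, position coordinates) of the non-zero entries."""
    values = [v for v in dense if v != 0]
    coords = [i for i, v in enumerate(dense) if v != 0]
    return values, coords


def decompress_vector(values, coords, length):
    """Rebuild the uncompressed vector; the length must be supplied."""
    dense = [0] * length
    for value, coord in zip(values, coords):
        dense[coord] = value
    return dense


dense = [0, 4, 0, 0, 7, 1]  # hypothetical example vector
values, coords = compress_vector(dense)
print(values)  # [4, 7, 1]
print(coords)  # [1, 4, 5]
restored = decompress_vector(values, coords, len(dense))
print(restored == dense)  # True
```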

For another example, refer to FIG. 1d for understanding. FIG. 1d is a schematic diagram of an example of a matrix in a compressed format. A vector in the compressed format is obtained by splitting the matrix in the compressed format. The matrix in the compressed format may be split into a plurality of row vectors in the compressed format, or may be split into a plurality of column vectors in the compressed format. For example, the vector in the compressed format is a column vector. Four element values a₀₀, a₁₀, a₂₀, and a₃₀ in the 1st column are respectively 1, 2, 5, and 7. Position coordinates corresponding to the element values a₀₀, a₁₀, a₂₀, and a₃₀ are respectively (i₀, j₀), (i₁, j₀), (i₂, j₀), and (i₃, j₀). For example, (i₀, j₀)=(1, 1), (i₁, j₀)=(3, 1), (i₂, j₀)=(4, 1), and (i₃, j₀)=(5, 1). Because the column coordinates of all the elements in the column vector in the compressed format are the same, each element value in the vector in the compressed format may carry only a row coordinate, and the position coordinates are the row coordinate in the matrix in the compressed format. If the vector in the compressed format is a row vector, the vector in the compressed format may carry only a column coordinate, and the position coordinates are the column coordinate in the matrix in the compressed format. In an example, the vector in the compressed format is shown in Equation (12).

Element value=([1, 2, 5, 7])

Position coordinates=([1, 3, 4, 5]) Equation (12)

The vector in the compressed format is also referred to as a sparse vector, and the vector in the uncompressed format is also referred to as a dense vector.
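The splitting of a matrix in the compressed format into a column vector in the compressed format can be sketched as follows. The COO arrays are hypothetical data whose column 1 is chosen to match the example values above, and the helper name `column_vector` is an assumption made for illustration.

```python
def column_vector(rows, cols, values, column):
    """Extract one column of a COO matrix as a compressed column vector."""
    vec_values, vec_coords = [], []
    for r, c, v in zip(rows, cols, values):
        if c == column:
            vec_values.append(v)
            # All elements share the same column coordinate, so only the
            # row coordinate is kept as the position coordinate.
            vec_coords.append(r)
    return vec_values, vec_coords


# Hypothetical COO data; column 1 holds the values 1, 2, 5, and 7.
rows = [1, 2, 3, 4, 5, 5]
cols = [1, 3, 1, 1, 1, 2]
values = [1, 9, 2, 5, 7, 4]

vec_values, vec_coords = column_vector(rows, cols, values, column=1)
print(vec_values)  # [1, 2, 5, 7]
print(vec_coords)  # [1, 3, 4, 5]
```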

FIG. 2 is a schematic diagram of a structure of a computation device according to an embodiment. The computation device may be a device having a computation capability, such as a terminal, a network device, or a server. Refer to FIG. 2. The computation device may include a memory 201, a processor 202, a communication interface 203, and a bus 204. The memory 201, the processor 202, and the communication interface 203 are connected to each other by using the bus 204.

The memory 201 may be configured to store data, a software program, and a module, and mainly includes a program storage area and a data storage area. The program storage area may store an operating system, a software application required for at least one function, intermediate software, and the like. The data storage area may store data created during use of the device, and the like. For example, the operating system may include a Linux operating system, a Unix operating system, or a Windows operating system. The software application required for the at least one function may include an application related to artificial intelligence (artificial intelligence), an application related to high-performance computing (high-performance computing, HPC), an application related to deep learning (deep learning), an application related to scientific computing, or the like. The intermediate software may include basic linear algebra subprograms (BLAS), or the like. In a possible example, the memory 201 includes but is not limited to a static random access memory (static RAM, SRAM), a dynamic random access memory (dynamic RAM, DRAM), a synchronous dynamic random access memory (synchronous DRAM, SDRAM), or a high-speed random access memory. Further, the memory 201 may include a non-volatile memory, for example, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.

In addition, the processor 202 is configured to control and manage an operation of the computation device, for example, perform various functions of the computation device and process data by running or executing the software program and/or the module stored in the memory 201 and by invoking the data stored in the memory 201. In a possible example, the processor 202 includes but is not limited to a central processing unit (central processing unit, CPU), a network processing unit (network processing unit, NPU), a graphics processing unit (graphics processing unit, GPU), an application-specific integrated circuit (application-specific integrated circuit, ASIC), a field programmable gate array (field programmable gate array, FPGA) or another programmable logic device, a transistor logic device, a logic circuit, or any combination thereof. The processor may implement or execute various example logical blocks, modules, and circuits described with reference to content disclosed in this application. Alternatively, the processor 202 may be a combination of processors implementing a computing function, for example, a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor.

The communication interface 203 is configured to implement communication between the computation device and an external device. The communication interface 203 may include an input interface and an output interface. The input interface may be configured to obtain a first vector (or a first matrix) and a second vector (or a second matrix) that are in a compressed format and that are in the following embodiments. In some feasible embodiments, there may be only one input interface, or there may be a plurality of input interfaces. The output interface may be configured to output a computation result in the following embodiments. In some feasible embodiments, the computation result may be directly output by the processor, or may be first stored in the memory, and then output by the memory. In some other feasible embodiments, there may be only one output interface, or there may be a plurality of output interfaces.

The bus 204 may be a peripheral component interconnect express (Peripheral Component Interconnect Express, PCIe) bus, an extended industry standard architecture (extended industry standard architecture, EISA) bus, or the like. The bus 204 may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used to represent the bus in FIG. 2, but this does not mean that there is only one bus or only one type of bus.

In this embodiment, the processor 202 may include a computation apparatus. The computation apparatus may be an ASIC, an FPGA, a logic circuit, or the like. Certainly, the apparatus may alternatively be implemented by using software. This is not specifically limited in this embodiment of this application. The computation apparatus may be used for vector and matrix computation related to artificial intelligence, scientific computing, graph computing, or the like.

Further, the processor 202 may include one or more of other processing units such as a CPU, a GPU, or an NPU. As shown in FIG. 3, for example, the processor 202 includes a CPU 1 and a computation apparatus 2. The computation apparatus 2 may be integrated with the CPU 1 (where for example, the computation apparatus 2 is integrated in a SoC in which the CPU 1 is located), or may be disposed in parallel with the CPU 1 separately (where for example, the computation apparatus 2 is disposed in a form of a PCIe card). Specifically, as shown in (a) in FIG. 3 and (b) in FIG. 3, the CPU 1 may further include a controller (controller) 11, one or more arithmetic logic units (arithmetic logic units, ALUs) 12, a cache (cache) 13, a memory management unit (memory management unit, MMU) 14, and the like. In FIG. 3, an example in which the memory 201 is a dynamic random access memory DRAM is used for description.

In an embodiment of this application, a computation apparatus can compute a vector in a compressed format. When computing two vectors in the compressed format, the computation apparatus compares position coordinates of element values in the two vectors, and computes two element values corresponding to same position coordinates in the two vectors, to obtain a computation result of computing the two vectors in the compressed format. In comparison with a conventional method in which the vector in the compressed format needs to be decompressed first, and then vector computation is performed on a decompressed vector, the computation apparatus provided in this embodiment of this application can effectively improve efficiency of computing the vector in the compressed format.

An embodiment of this application provides a computation apparatus. Refer to FIG. 4. The computation apparatus includes a position coordinate comparison circuit 401 and a logical operation circuit 402. Optionally, the computation apparatus further includes a first cache 403 and a second cache 404. The first cache 403, the position coordinate comparison circuit 401, an accumulator 4021, and the second cache 404 are sequentially connected. The first cache 403 and the second cache 404 may be caches (for example, registers) in the computation apparatus. Alternatively, the first cache 403 may be the cache 13 in the central processing unit 1 shown in FIG. 3. The first cache 403 is configured to cache a first vector and a second vector, and the second cache 404 is configured to cache a computation result obtained by computing the first vector and the second vector.

The position coordinate comparison circuit 401 is configured to receive the first vector and the second vector from the first cache 403, and compare position coordinates of an element value in the first vector with position coordinates of an element value in the second vector, to obtain a first coordinate comparison result. Both the first vector and the second vector are vectors in a compressed format. The first vector includes a first element value and first position coordinates of the first element value. The second vector includes a second element value and second position coordinates of the second element value. The first coordinate comparison result includes a first comparison result. The first comparison result indicates that the first position coordinates are the same as the second position coordinates.

For example, the first vector is denoted as “{right arrow over (a)}”, the second vector is denoted as “{right arrow over (b)}”, and a length of the first vector and a length of the second vector are both T. It should be noted that the length of {right arrow over (a)} and the length of {right arrow over (b)} are not limited in this embodiment of this application. The length of {right arrow over (a)} and the length of {right arrow over (b)} may be the same, or may be different. In this embodiment, that the length of {right arrow over (a)} and the length of {right arrow over (b)} are both T is merely an example for ease of description. In addition, both {right arrow over (a)} and {right arrow over (b)} may be row vectors, or may be column vectors. This is not specifically limited. {right arrow over (a)} includes the first element value and the first position coordinates corresponding to the first element value. For example, a plurality of first element values in {right arrow over (a)} and first position coordinates corresponding to each first element value are shown in Table 1 below. A plurality of second element values in {right arrow over (b)} and second position coordinates corresponding to each second element value are shown in Table 2 below.

TABLE 1

First element value | a₀ | a₁ | a₂ | . . . | a_(T−1)
First position coordinates | h₀ | h₁ | h₂ | . . . | h_(T−1)

TABLE 2

Second element value | b₀ | b₁ | b₂ | . . . | b_(T−1)
Second position coordinates | p₀ | p₁ | p₂ | . . . | p_(T−1)

The position coordinate comparison circuit 401 compares the first position coordinates in Table 1 with the second position coordinates in Table 2. If y pairs of position coordinates in the T first position coordinates and the T second position coordinates match, y is an integer greater than or equal to 0 and less than or equal to T. FIG. 5 is a schematic diagram of comparing coordinates by the position coordinate comparison circuit 401. An example in which T is 4 is used for description. Four position coordinates in {right arrow over (a)} are sequentially h₀, h₁, h₂, and h₃, and four position coordinates in {right arrow over (b)} are sequentially p₀, p₁, p₂, and p₃. The position coordinate comparison circuit 401 outputs the first comparison result to the logical operation circuit 402. The first comparison result includes a position coordinate comparison result, a third signal (which is denoted as, for example, “valid”), and a third value (which is denoted as, for example, “t”). The third signal indicates validity of the value t. The third value indicates a specific value of position coordinates. The position coordinate comparison result may be denoted as (isequal″₀, index_01, index_02), (isequal″₁, index_11, index_12), (isequal″₂, index_21, index_22), and (isequal″₃, index_31, index_32). For example, (isequal″₀, index_01, index_02)=(1, h₀, p₀) indicates that the position coordinates h₀ in the first vector are equal to the position coordinates p₀ in the second vector. In this case, the position coordinate comparison circuit 401 further outputs valid″₀=1 and t₀, where t₀ is equal to h₀ (that is, t₀ is also equal to p₀). Similarly, (isequal″₁, index_11, index_12)=(1, h₁, p₁) indicates that the position coordinates h₁ in the first vector are equal to the position coordinates p₁ in the second vector. In this case, the position coordinate comparison circuit 401 further outputs valid″₁=1 and t₁, where t₁ is equal to h₁ (that is, t₁ is also equal to p₁). By analogy, the remaining position coordinate comparison results are not described one by one. valid″=1 indicates that the correspondingly output value t is valid, and valid″=0 indicates that the correspondingly output value t is invalid.

It may be understood that, in FIG. 5, the position coordinate comparison circuit 401 outputs eight groups of values valid″ and values t in total, and outputs four groups of isequal″ comparison results. A reason for which the position coordinate comparison circuit 401 outputs eight groups of valid″ is that a quantity of groups of signals output by the position coordinate comparison circuit 401 and bit widths of the signals are fixed. The position coordinate comparison circuit 401 reads the T position coordinates in the first vector and the T position coordinates in the second vector. If none of the T position coordinates in the first vector matches the T position coordinates in the second vector, a maximum quantity of valid″ is 2T (namely, eight). Therefore, 2T output positions of valid″ may be reserved. A reason for which the position coordinate comparison circuit 401 outputs four groups of isequal″ comparison results is that if the T position coordinates in the first vector and the T position coordinates in the second vector all match one by one, a maximum of T groups (namely, four groups) of position coordinates all match one by one. Therefore, output positions of four isequal″ comparison results may be reserved.

Example 1: The position coordinate comparison circuit receives the following two input arrays:

- {h₀, h₁, h₂, h₃}={1, 2, 3, 4}; and
- {p₀, p₁, p₂, p₃}={1, 2, 5, 7}.

The position coordinate comparison circuit outputs the following result:

- (isequal″₀, index_01, index_02)=(1, h₀, p₀);
- (isequal″₁, index_11, index_12)=(1, h₁, p₁);
- (isequal″₂, index_21, index_22)=(0, h₂, p₂);
- (isequal″₃, index_31, index_32)=(0, h₃, p₃);
- (valid″₀, t₀)=(1, 1);
- (valid″₁, t₁)=(1, 2);
- (valid″₂, t₂)=(1, 3);
- (valid″₃, t₃)=(1, 4);
- (valid″₄, t₄)=(1, 5);
- (valid″₅, t₅)=(1, 7);
- (valid″₆, t₆)=(0, −1); and
- (valid″₇, t₇)=(0, −1).

The position coordinate comparison result may indicate a pair of position coordinates that are equal. For example, h₀ and p₀ are equal, and h₁ and p₁ are equal. The isequal″ comparison result is a final comparison result, and the comparison process itself is not described here. For example, valid″₂=1 and t₂=h₂ indicate that h₂ is a valid value. In addition, the comparison result (isequal″₂, index_21, index_22)=(0, h₂, p₂) indicates that no position coordinates equal to h₂ are found by the position coordinate comparison circuit 401 (and no position coordinates equal to p₂ are found). It can be learned from Example 1 that two pairs of equal position coordinates are found by the position coordinate comparison circuit 401, and no equal values are found for the other four position coordinates. In this case, the quantity of valid values is 6 (valid″₀ to valid″₅). Values of valid″₆ and valid″₇ are both 0, to indicate that t₆ and t₇ are both invalid values (where for example, the invalid value may be represented by “−1”).
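The comparison in Example 1 can be modelled in software as follows. This is only a behavioural sketch, not a description of the circuit: the function name compare_coordinates, the use of index pairs for matches, and the assumption that each input coordinate list is strictly ascending are illustrative choices that are not stated in this application. The sketch returns the matched pairs and a fixed-size list of 2T (valid, t) outputs padded with (0, −1).

```python
def compare_coordinates(h, p):
    """Behavioural model (not the circuit) of comparing two coordinate lists.

    Assumption: h and p have the same length T and are strictly ascending.
    Returns (matches, valid_t):
      matches: index pairs (i, j) such that h[i] == p[j]
      valid_t: exactly 2*T (valid, t) pairs, padded with (0, -1)
    """
    if len(h) != len(p):
        raise ValueError("this sketch assumes both inputs have length T")
    T = len(h)
    matches, merged = [], []
    i = j = 0
    # Two-pointer scan over both ascending lists.
    while i < T and j < T:
        if h[i] == p[j]:
            matches.append((i, j))
            merged.append(h[i])
            i += 1
            j += 1
        elif h[i] < p[j]:
            merged.append(h[i])
            i += 1
        else:
            merged.append(p[j])
            j += 1
    merged.extend(h[i:])
    merged.extend(p[j:])
    # The quantity of output slots is fixed at 2*T; unused slots are invalid.
    valid_t = [(1, t) for t in merged] + [(0, -1)] * (2 * T - len(merged))
    return matches, valid_t


matches, valid_t = compare_coordinates([1, 2, 3, 4], [1, 2, 5, 7])
print(matches)
print(valid_t)
```

With the inputs of Example 1, the sketch reports two matched pairs and six valid t values followed by two invalid slots, which agrees with the outputs listed above.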

The logical operation circuit 402 is configured to compute the first element value and the second element value based on the first comparison result, to obtain a computation value; and output a computation result to the second cache 404. The computation result is related to the computation value.

In this embodiment of this application, the foregoing operation includes but is not limited to addition operation, multiplication operation, inner product operation, multiplication-addition operation, and the like. Optionally, refer to FIG. 6. The logical operation circuit 402 includes at least one of the accumulator 4021, a multiplier 4022, and an inner product operation circuit 4023. For the foregoing plurality of different types of operation, the following provides examples for description.

Example one: The addition operation ({right arrow over (a)}+{right arrow over (b)}) is described by using an example.

In an optional embodiment, the position coordinate comparison circuit 401 is further configured to receive an addition instruction, where the addition instruction includes the first vector and the second vector; and transmit the first comparison result to the accumulator 4021 based on the addition instruction.

The accumulator 4021 is configured to receive the first vector, the second vector, and the first comparison result. The first comparison result indicates that the first position coordinates are the same as the second position coordinates. For example, the first comparison result includes (isequal″₀, index_01, index_02)=(1, h₀, p₀) and (valid″₀, t₀)=(1, 1) in Example 1. The accumulator 4021 adds the first element value and the second element value based on the first comparison result, to obtain a sum of the first element value and the second element value, and outputs the sum to the second cache 404. The second cache 404 is configured to cache the computation result (a result vector obtained by adding the first vector and the second vector). The computation result includes a third element value and third position coordinates of the third element value. The third element value is the sum. The third position coordinates are the same as the first position coordinates.

For example, refer to Table 1, Table 2, and FIG. 7 for understanding. If the first comparison result indicates that a plurality of first position coordinates in the first vector can all match a plurality of second position coordinates in the second vector one by one, for example, h₀=p₀, h₁=p₁, h₂=p₂, . . . , and h_(T−1)=p_(T−1), the computation result of {right arrow over (a)}+{right arrow over (b)} may be shown in Table 3.

TABLE 3

Third element value | c₀ | c₁ | c₂ | . . . | c_(T−1)
Third position coordinates | q₀ | q₁ | q₂ | . . . | q_(T−1)

In Table 3, c₀=a₀+b₀, c₁=a₁+b₁, c₂=a₂+b₂, . . . , and c_(T−1)=a_(T−1)+b_(T−1). In addition, q₀=h₀ (q₀=p₀), q₁=h₁ (q₁=p₁), q₂=h₂ (q₂=p₂), . . . , and q_(T−1)=h_(T−1) (q_(T−1)=p_(T−1)).

In FIG. 7, using an example in which T is 4, if a₀ is 10 and b₀ is 2, c₀=10+2=12, and q₀=1. The other examples in FIG. 7 are not described one by one herein.

Optionally, the plurality of first position coordinates in the first vector may not all match the plurality of second position coordinates in the second vector one by one. In other words, one part of the position coordinates in {right arrow over (a)} can match equal position coordinates in {right arrow over (b)}, and the other part of the position coordinates in {right arrow over (a)} cannot match equal position coordinates in {right arrow over (b)}. In this case, the first vector includes a fourth element value and fourth position coordinates of the fourth element value. The first coordinate comparison result further includes a second comparison result. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second vector. For example, the second comparison result includes (isequal″₂, index_21, index_22)=(0, h₂, p₂), (valid″₂, t₂)=(1, 3), and the like in Example 1.

The accumulator 4021 is further configured to output the fourth element value and the fourth position coordinates to the second cache 404 based on the second comparison result. The computation result includes the fourth element value and the fourth position coordinates. In other words, the second cache stores the third element value and the corresponding position coordinates, and the fourth element value and the corresponding position coordinates. The third element value is the sum of the first element value and the second element value, and the fourth element value is equal to the first element value or the second element value. In this embodiment, the accumulator 4021 reserves position coordinates that fail to be matched in the two vectors and an element value corresponding to the position coordinates, and directly writes the position coordinates and the element value into the second cache 404, so that each fourth element value is also used as an element value in the computation result. Even if the plurality of position coordinates in the first vector do not completely match the plurality of position coordinates in the second vector one by one, the computation apparatus can still perform the addition operation, so that more vector computation scenarios are supported.

Refer to FIG. 8. The computation result is a result obtained by sorting the position coordinates in ascending order. For example, position coordinates “1” of a first element value “10” are equal to position coordinates “1” of a second element value “2”. In this case, the accumulator 4021 outputs a third element value “12” (the first element value “10”+the second element value “2”) and corresponding third position coordinates “1” to the second cache 404. First position coordinates “2” of a first element value “7” are equal to second position coordinates “2” of a second element value “3”. In this case, the accumulator 4021 outputs a third element value “10” (the first element value “7”+the second element value “3”) and third position coordinates “2” to the second cache 404. For another example, no second position coordinates equal to position coordinates “3” of a first element value “8” are found. In this case, the accumulator 4021 reserves the first element value “8” and the corresponding position coordinates “3”, and directly transmits the first element value “8” and the corresponding position coordinates “3” to the second cache 404. Examples in FIG. 8 are not described one by one. A length of the result vector (namely, the computation result) that is of {right arrow over (a)}+{right arrow over (b)} and that is cached in the second cache is greater than or equal to T, and less than or equal to 2T.

Optionally, because each third element value is obtained by adding two element values, the third element value may be a zero element value. To save a transmission resource or facilitate a next computation operation, the computation apparatus may compress the foregoing result vector, to output a vector in the compressed format. For example, the accumulator 4021 is further configured to: when the third element value is the zero element value, output an invalid signal. The invalid signal indicates that the accumulator 4021 does not output the zero element value and position coordinates corresponding to the zero element value to the second cache 404, so that the computation result output by the computation apparatus does not include the zero element value.

For example, (isequal″₀, index_01, index_02)=(1, h₀, p₀) and (valid″₀, t₀)=(1, 1), that is, h₀=p₀. In addition, the first element value corresponding to h₀ is “1”, and the second element value corresponding to p₀ is “−1”. In this case, the third element value is “0” (1+(−1)), and the accumulator 4021 sets the value of valid″₀ from “1” to “0” (the invalid signal). The accumulator 4021 skips, based on the invalid signal (valid″₀=0), outputting the zero element value and the position coordinates corresponding to the zero element value to the second cache 404, so that the second cache 404 does not cache the zero element value.
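The addition flow of Example one, including the optional suppression of zero sums, may be sketched in software as follows. The sketch is an illustrative model under stated assumptions and is not the accumulator 4021 itself: the function name sparse_add, the drop_zeros flag, and the representation of a compressed vector as two parallel lists with ascending position coordinates are all hypothetical. In the usage lines, the element values at position coordinates 1, 2, and 3 are those quoted in the description of FIG. 8, and the remaining entries are assumed for illustration.

```python
def sparse_add(a_vals, a_pos, b_vals, b_pos, drop_zeros=True):
    """Add two compressed vectors given as (values, ascending coordinates).

    Matched coordinates produce a sum; unmatched coordinates are passed
    through unchanged. When drop_zeros is True, a sum equal to zero is
    not written to the result (modelling the invalid signal).
    """
    out_vals, out_pos = [], []
    i = j = 0
    while i < len(a_pos) and j < len(b_pos):
        if a_pos[i] == b_pos[j]:
            total = a_vals[i] + b_vals[j]
            if total != 0 or not drop_zeros:
                out_vals.append(total)
                out_pos.append(a_pos[i])
            i += 1
            j += 1
        elif a_pos[i] < b_pos[j]:
            out_vals.append(a_vals[i])
            out_pos.append(a_pos[i])
            i += 1
        else:
            out_vals.append(b_vals[j])
            out_pos.append(b_pos[j])
            j += 1
    # Pass through whatever remains in either vector.
    out_vals.extend(a_vals[i:])
    out_pos.extend(a_pos[i:])
    out_vals.extend(b_vals[j:])
    out_pos.extend(b_pos[j:])
    return out_vals, out_pos


# Coordinates 1, 2, 3 follow the text; entries at 5 and 7 are assumed.
vals, pos = sparse_add([10, 7, 8, 6], [1, 2, 3, 5], [2, 3, 10, 11], [1, 2, 5, 7])
print(vals, pos)
print(sparse_add([1], [4], [-1], [4]))  # the zero sum is suppressed
```

For two input vectors of length T without cancelling sums, the result length in this model lies between T and 2T, consistent with the range stated for the result vector.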

Example two: The multiplication operation ({right arrow over (a)}×{right arrow over (b)}) is described by using an example.

In an optional embodiment, the position coordinate comparison circuit 401 is further configured to receive a multiplication instruction, where the multiplication instruction includes the first vector and the second vector; and transmit the first comparison result to the multiplier 4022 based on the multiplication instruction.

The multiplier 4022 is configured to receive the first vector, the second vector, and the first comparison result; multiply the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value; and output the product (the computation value) to the second cache 404. The computation result includes a fifth element value (for example, denoted as “d”) and fifth position coordinates (for example, denoted as “s”) of the fifth element value. The fifth element value is the product. The fifth position coordinates are the same as the first position coordinates.

With reference to Table 1 and Table 2 for understanding, if h₀=p₀, h₁=p₁, and h₃=p₂, a fifth element value d₀=a₀×b₀, and the fifth position coordinates corresponding to d₀ are s₀ (s₀=h₀); a fifth element value d₁=a₁×b₁, and the fifth position coordinates corresponding to d₁ are s₁ (s₁=h₁); and a fifth element value d₂=a₃×b₂, and the fifth position coordinates corresponding to d₂ are s₂ (s₂=h₃). The multiplier 4022 does not compute element values in the first vector and the second vector whose position coordinates fail to be matched.

Refer to FIG. 9. In a specific example, the first comparison result includes three pairs of equal position coordinates. The multiplier 4022 performs, based on the first comparison result, the multiplication operation on two element values whose position coordinates are equal. For example, position coordinates “1” of a first element value “10” are equal to position coordinates “1” of a second element value “2”. In this case, the multiplier 4022 multiplies the first element value “10” by the second element value “2”, to obtain a product “20”, and outputs the product (the fifth element value) and corresponding position coordinates “1” (the fifth position coordinates) to the second cache 404. However, no position coordinates equal to first position coordinates “3” of a first element value “8” are found. In this case, the multiplier 4022 does not compute the first element value “8”. Similarly, no position coordinates equal to second position coordinates “7” of a second element value “11” are found. In this case, the multiplier 4022 does not compute the second element value “11”. The other examples shown in FIG. 9 are not described one by one herein. In this embodiment, if the length of the first vector and the length of the second vector are both T, a length of the computation result of {right arrow over (a)}×{right arrow over (b)} is greater than or equal to 0, and less than or equal to T.
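The multiplication of Example two keeps only matched position coordinates, which may be sketched as follows. This is a hedged software model rather than the multiplier 4022: the function name sparse_multiply and the parallel-list representation with ascending position coordinates are assumptions. The input values in the usage line are assembled from the values quoted in the descriptions of FIG. 9 and FIG. 11, and the pairing of “6” and “10” at position coordinates “5” is inferred from the product 60 rather than stated directly.

```python
def sparse_multiply(a_vals, a_pos, b_vals, b_pos):
    """Element-wise product of two compressed vectors.

    Only coordinates present in both vectors contribute; unmatched
    element values are skipped and never multiplied.
    """
    out_vals, out_pos = [], []
    i = j = 0
    while i < len(a_pos) and j < len(b_pos):
        if a_pos[i] == b_pos[j]:
            out_vals.append(a_vals[i] * b_vals[j])
            out_pos.append(a_pos[i])
            i += 1
            j += 1
        elif a_pos[i] < b_pos[j]:
            i += 1  # unmatched element of the first vector is skipped
        else:
            j += 1  # unmatched element of the second vector is skipped
    return out_vals, out_pos


vals, pos = sparse_multiply([10, 7, 8, 6], [1, 2, 3, 5], [2, 3, 10, 11], [1, 2, 5, 7])
print(vals, pos)
```

In this model the result length equals the quantity of matched pairs, so it lies between 0 and T for inputs of length T.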

Example three: The inner product operation ({right arrow over (a)}·{right arrow over (b)}) is described by using an example.

In an optional embodiment, the position coordinate comparison circuit 401 is further configured to receive an inner product instruction, where the inner product instruction includes the first vector and the second vector; and transmit the first comparison result to the inner product operation circuit 4023 based on the inner product instruction.

The inner product operation circuit 4023 is configured to receive the first vector, the second vector, and the first comparison result; multiply the first element value by the second element value based on the first comparison result, to obtain a product (the computation value); and output the computation result to the second cache 404. The computation result is an accumulated value of a plurality of products. Each product is a product of a pair of a first element value and a second element value that have same position coordinates. With reference to Table 1 and Table 2 for understanding, if there are x pairs of matching position coordinates in total between the vector {right arrow over (a)} and the vector {right arrow over (b)}, for example, h₀=p₀, h₁=p₁, and h₃=p₂, the computation result of {right arrow over (a)}·{right arrow over (b)} is shown in Equation (13).

{right arrow over (a)}·{right arrow over (b)}=a₀×b₀+a₁×b₁+a₃×b₂ Equation (13)

The inner product operation result of the vector {right arrow over (a)} and the vector {right arrow over (b)} is a scalar. For example, refer to FIG. 10. {right arrow over (a)}·{right arrow over (b)}=10×2+7×3+6×10=101.
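The inner product of Example three may be sketched as an accumulation of products over matched position coordinates. The sketch below is illustrative only: the function name sparse_dot and the dictionary lookup are software conveniences assumed here and do not describe how the inner product operation circuit 4023 is built. The input values are those implied by the products 10×2, 7×3, and 6×10, together with the unmatched values “8” and “11” quoted for FIG. 9.

```python
def sparse_dot(a_vals, a_pos, b_vals, b_pos):
    """Inner product of two compressed vectors.

    Each product pairs a first element value and a second element value
    that share the same position coordinates; the products are summed.
    """
    b_lookup = dict(zip(b_pos, b_vals))
    return sum(v * b_lookup[pos] for v, pos in zip(a_vals, a_pos) if pos in b_lookup)


result = sparse_dot([10, 7, 8, 6], [1, 2, 3, 5], [2, 3, 10, 11], [1, 2, 5, 7])
print(result)  # 10*2 + 7*3 + 6*10
```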

Example four: The multiplication-addition operation ({right arrow over (a)}×{right arrow over (b)}+{right arrow over (z)}) is described by using an example.

In this embodiment, three vectors are involved, and the three vectors are all vectors in the compressed format, for example, the first vector {right arrow over (a)}, the second vector {right arrow over (b)}, and a third vector {right arrow over (z)}. The multiplier 4022 is configured to perform the multiplication operation on {right arrow over (a)} and {right arrow over (b)}, and the accumulator 4021 is configured to perform the addition operation on a result of {right arrow over (a)}×{right arrow over (b)} and {right arrow over (z)}.

The position coordinate comparison circuit 401 is further configured to receive a multiplication-addition instruction, where the multiplication-addition instruction includes the first vector and the second vector; and transmit the first comparison result to the multiplier 4022 based on the multiplication-addition instruction.

The multiplier 4022 is further configured to receive the first vector, the second vector, and the first comparison result; and multiply the first element value by the second element value based on the first comparison result, to obtain a product (the computation value) of the first element value and the second element value. The product is a fifth element value, and fifth position coordinates corresponding to the fifth element value are the same as the first position coordinates. The multiplier 4022 transmits the fifth element value and the fifth position coordinates to the second cache 404.

For the multiplication operation on the vector {right arrow over (a)} and the vector {right arrow over (b)}, refer to descriptions of the multiplication operation ({right arrow over (a)}×{right arrow over (b)}) in Example two. Details are not described herein again.

The second cache 404 outputs the result (for example, denoted as a vector “{right arrow over (y)}”) of {right arrow over (a)}×{right arrow over (b)} to the first cache 403.

The position coordinate comparison circuit 401 is further configured to receive the vector {right arrow over (y)} and the vector {right arrow over (z)} from the first cache 403; compare sixth position coordinates in the vector {right arrow over (z)} with the fifth position coordinates in the vector {right arrow over (y)}, to obtain a third comparison result; and transmit the third comparison result to the accumulator 4021. The third comparison result indicates that the sixth position coordinates are the same as the fifth position coordinates.

The accumulator 4021 is configured to add a sixth element value and the fifth element value based on the third comparison result, to obtain a sum of the sixth element value and the fifth element value. The computation result includes a seventh element value and seventh position coordinates corresponding to the seventh element value. The seventh element value is the sum of the sixth element value and the fifth element value. The seventh position coordinates are the same as the sixth position coordinates.

For the addition operation on the vector {right arrow over (y)} and the vector {right arrow over (z)}, refer to descriptions of the addition operation ({right arrow over (a)}+{right arrow over (b)}) in Example one. Details are not described herein again.

Refer to FIG. 11. The multiplication-addition operation is described as an example. In the vector {right arrow over (a)} and the vector {right arrow over (b)}, there are three pairs of position coordinates that are the same. For example, position coordinates “1” of a first element value “10” are equal to position coordinates “1” of a second element value “2”. In this case, the multiplier 4022 multiplies the first element value “10” by the second element value “2”, to obtain a product “20”; and outputs the product (the fifth element value) and corresponding position coordinates “1” (the fifth position coordinates) to the second cache 404. Similarly, position coordinates “2” of a first element value “7” are equal to position coordinates “2” of a second element value “3”. In this case, the multiplier 4022 multiplies the first element value “7” by the second element value “3”, to obtain a product “21”; and outputs the product “21” and corresponding position coordinates “2” to the second cache 404. Finally, the multiplier 4022 outputs, to the second cache 404, three fifth element values 20, 21, and 60 (6×10) and fifth position coordinates (“1”, “2”, or “5”) corresponding to each fifth element value. The second cache 404 transmits the three fifth element values and the fifth position coordinates corresponding to each element value to the first cache 403.

The position coordinate comparison circuit 401 compares position coordinates in the vector {right arrow over (y)} with position coordinates in the vector {right arrow over (z)}, and there are two pairs of position coordinates that are the same in the vector {right arrow over (y)} and the vector {right arrow over (z)} in total. For example, the fifth position coordinates “1” corresponding to the fifth element value “20” are the same as sixth position coordinates “1” corresponding to a sixth element value “7”, and the fifth position coordinates “5” corresponding to the fifth element value “60” are the same as sixth position coordinates “5” corresponding to a sixth element value “2”. In this case, the accumulator 4021 outputs, to the second cache 404, a sum “27” of the fifth element value “20” and the sixth element value “7” and position coordinates “1” corresponding to the sum; outputs, to the second cache 404, a sum “62” of the fifth element value “60” and the sixth element value “2” and position coordinates “5” corresponding to the sum; and outputs, to the second cache 404, the fifth element value “21” and the corresponding position coordinates “2”, and sixth element values “11” and “14” and corresponding sixth position coordinates “3” and “4”, where the position coordinates of the fifth element value “21” and the sixth element values “11” and “14” fail to be matched. The computation result {right arrow over (a)}×{right arrow over (b)}+{right arrow over (z)} finally cached in the second cache 404 is shown in Equation (14).

Seventh element value=([27, 21, 11, 14, 62])

Seventh position coordinates=([1, 2, 3, 4, 5]) Equation (14)
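The two-stage flow of Example four (multiplication, then addition of the intermediate vector and the third vector) may be sketched as follows. This is a simplified model under assumptions: each compressed vector is represented as a dictionary that maps position coordinates to element values, and the function names sparse_multiply, sparse_add, and multiply_add are hypothetical. The hand-off of the intermediate vector between the second cache 404 and the first cache 403 is modelled only as passing a value between functions.

```python
def sparse_multiply(a, b):
    """Product over position coordinates present in both vectors."""
    return {pos: a[pos] * b[pos] for pos in sorted(a.keys() & b.keys())}


def sparse_add(y, z):
    """Sum over the union of position coordinates, in ascending order."""
    return {pos: y.get(pos, 0) + z.get(pos, 0) for pos in sorted(y.keys() | z.keys())}


def multiply_add(a, b, z):
    """Compute a*b + z for three compressed vectors."""
    y = sparse_multiply(a, b)  # intermediate vector y
    return sparse_add(y, z)


# Values follow the description of FIG. 11 where quoted; the unmatched
# entries of a and b (8 at 3, 11 at 7) are taken from the FIG. 9 text.
a = {1: 10, 2: 7, 3: 8, 5: 6}
b = {1: 2, 2: 3, 5: 10, 7: 11}
z = {1: 7, 3: 11, 4: 14, 5: 2}
result = multiply_add(a, b, z)
print(list(result.values()), list(result.keys()))
```

The printed element values and position coordinates can be compared against Equation (14).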

In this embodiment, the computation apparatus may perform the plurality of types of operations on the vector in the compressed format. For example, the plurality of types of operations include the addition operation, the multiplication operation, the inner product operation, and the multiplication-addition operation. When computing two vectors in the compressed format, the computation apparatus compares position coordinates of element values in the two vectors, and performs related operation on two element values corresponding to same position coordinates in the two vectors, to obtain a result vector of computing the two vectors in the compressed format. In comparison with a conventional method in which the vector in the compressed format needs to be decompressed first, and then vector computation is performed on a decompressed vector, the computation apparatus provided in this embodiment of this application can effectively improve efficiency of computing the vector in the compressed format.

Optionally, refer to FIG. 12. The computation apparatus includes a data cache and queue unit 406. The data cache and queue unit 406 is separately connected to the position coordinate comparison circuit 401 and the logical operation circuit 402. When the length of the first vector and the length of the second vector are both r×T, where r is a positive integer, and the position coordinate comparison circuit 401 compares position coordinates in the first vector with position coordinates in the second vector, a maximum quantity of position coordinates that can be matched is r×T, and a minimum quantity of position coordinates that can be matched is 0. The position coordinate comparison circuit 401 outputs the first vector, the second vector, and the first coordinate comparison result to the data cache and queue unit 406, and the data cache and queue unit 406 repacks the first vector and the second vector into vectors whose length is T. Then, each vector whose length is T and the first coordinate comparison result are output to the logical operation circuit 402, and the logical operation circuit 402 performs the related vector operation. In this embodiment, the computation apparatus may repack a vector by using the data cache and queue unit 406. Even if a length of an input vector is greater than a length of a vector that can be processed by the logical operation circuit, the input vector can be adapted to the length that can be computed by the logical operation circuit, so that utilization of the computation apparatus is improved.
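One simple reading of the repacking step is that a compressed vector of length r×T is cut into r consecutive pieces of length T before being passed on. The sketch below models only that reading and is an assumption: this application does not specify the queueing policy of the data cache and queue unit 406, and the function name repack and the parallel-list representation are hypothetical.

```python
def repack(values, coords, T):
    """Split a compressed vector of length r*T into r pieces of length T.

    Returns a list of (values, coords) pairs in the original order.
    """
    if len(values) != len(coords):
        raise ValueError("values and coords must have the same length")
    if T <= 0 or len(values) % T != 0:
        raise ValueError("this sketch assumes the length is a multiple of T")
    return [(values[k:k + T], coords[k:k + T]) for k in range(0, len(values), T)]


# Illustrative data only: a vector of length 2*T with T = 4.
chunks = repack([1, 2, 3, 4, 5, 6, 7, 8], [0, 2, 5, 6, 9, 11, 12, 15], 4)
print(chunks)
```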

In the foregoing embodiments, the plurality of types of operations on a vector in the compressed format are described by using examples. A matrix in the compressed format may be split into a plurality of vectors in the compressed format. In this case, one matrix in the compressed format may be considered as a plurality of vectors in the compressed format. Therefore, the foregoing plurality of types of operations for the vector in the compressed format may be extended to computation of the matrix in the compressed format. In a specific embodiment, the computation apparatus in this application may alternatively be used in matrix computation. The following uses addition as an example to describe addition on a compressed matrix.

The position coordinate comparison circuit 401 is configured to compare position coordinates of an element value in a first matrix with position coordinates of an element value in a second matrix, to obtain a coordinate comparison result (which is also referred to as a “second coordinate comparison result” in this embodiment). Both the first matrix and the second matrix are matrices in a compressed format.

As described in the foregoing example of the matrix in the compressed format, each element value (value) in the matrix in the compressed format has corresponding position coordinates. FIG. 13 is a schematic diagram of a matrix in the compressed format. A dimension of a matrix F in the compressed format shown in FIG. 13 is M×N , and the matrix F has M rows and N columns in total. The matrix F includes M×N element values, and each element value has position coordinates. The position coordinates indicate a position of the corresponding element value in a matrix F′ that corresponds to the matrix F and that is in an uncompressed format. The position coordinates include a row coordinate and a column coordinate. It should be understood that the row coordinate indicates a “row” in which the corresponding element value is located in the matrix F′ in the uncompressed format, and the column coordinates indicates a “column” in which the corresponding element value is located in the matrix F′ in the uncompressed format. In the matrix F, the M rows sequentially include M row coordinates i₀, i₁, . . . , and i_(M−1). Each row has N columns in total. The N columns sequentially include N column coordinates j₀, j₁, . . . , and j_(N−1). Using a row 0 as an example, element values included in the row 0 sequentially include N element values a₀₀, a₀₁, . . . , and a_0(N−1). Position coordinates of all the element values in the N element values in the row 0 are sequentially (i₀, j₀), (i₀, j₁), . . . , and (i₀, j_(N−1). Similarly, N element values in a row M are sequentially a_(M−1)0, a_(M−1)1, . . . , and a_{(M−1)(N−1)}. Position coordinates of all the element value in the row M are sequentially M, (i_(M−1), j₁), . . . , and (i_(M−1), j_(N−1). For example, the position coordinates of the element value a₀₁is (i₀, j₁). If i₀=1 and j₁=1, it indicates that a position of the element value a₀₁in the matrix F′ is a row 1 and a column 1. 
It should be understood that the matrix in the compressed format shown in FIG. 13 is essentially the same as the compressed matrix in the COO format described above, and only a representation form is changed. For example, element values with a same row coordinate are disposed in a same row. For example, a matrix is represented in the COO format as (0, 0, 1), (1, 0, 2), (2, 1, 3), (1, 2, 4), (2, 0, 5), (2, 2, 6), (2, 3, 7), (3, 1, 8), (3, 3, 9), and (3, 0, 6). During data compression, element values with a same row coordinate are stored in a same row. For example, row coordinates in the four triplets (2, 1, 3), (2, 0, 5), (2, 2, 6), and (2, 3, 7) are the same. In this case, element values in the four triplets are disposed in a same row, and the element values “3”, “5”, “6”, and “7” are used as element values in the same row. Position coordinates of “3” are (2, 1), position coordinates of “5” are (2, 0), position coordinates of “6” are (2, 2), position coordinates of “7” are (2, 3), and the like.
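For ease of understanding, the row-grouped representation may be sketched in software as follows. This is only an illustrative sketch and not the circuit of this embodiment. It assumes that each triplet is ordered as (row coordinate, column coordinate, element value), and the function name `group_coo_by_row` is hypothetical.

```python
from collections import defaultdict

def group_coo_by_row(triplets):
    """Place element values that share a row coordinate in the same row.

    Assumption: each triplet is (row coordinate, column coordinate, value).
    Returns a dict mapping a row coordinate to a list of (column, value).
    """
    rows = defaultdict(list)
    for row, col, value in triplets:
        rows[row].append((col, value))
    return dict(rows)

# The COO triplets listed in the example above.
coo = [(0, 0, 1), (1, 0, 2), (2, 1, 3), (1, 2, 4), (2, 0, 5),
       (2, 2, 6), (2, 3, 7), (3, 1, 8), (3, 3, 9), (3, 0, 6)]
grouped = group_coo_by_row(coo)
```

In this sketch, the element values “3”, “5”, “6”, and “7” end up in one row because their triplets share a row coordinate, and each value keeps its own column coordinate.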

The following description uses an example in which a dimension of the first matrix is M×N, a dimension of the second matrix is K×L, and both the first matrix and the second matrix are compressed in the COO format. The dimension of the first matrix may be the same as or different from the dimension of the second matrix. This is not specifically limited. For ease of description, in this embodiment, an example in which M, N, K, and L are all 4 is used for description, that is, the dimensions of the first matrix and the second matrix are both 4×4. The first matrix includes a first element value and first position coordinates of the first element value, where the first element value is any element value in the first matrix. The second matrix includes a second element value and second position coordinates of the second element value, where the second element value is any element value in the second matrix. For example, each of the first matrix and the second matrix includes four rows, each row includes four element values, and each matrix includes 16 element values. The first matrix is used as an example. Four element values in a row 0 are sequentially a₀₀, a₀₁, a₀₂, and a₀₃. Position coordinates of a₀₀ are (i₀, j₀), position coordinates of a₀₁ are (i₀, j₁), position coordinates of a₀₂ are (i₀, j₂), and position coordinates of a₀₃ are (i₀, j₃). Other element values in the first matrix and the position coordinates corresponding to each element value are not described one by one. The second matrix is used as an example. Four element values in a row 0 are sequentially b₀₀, b₀₁, b₀₂, and b₀₃. Position coordinates of b₀₀ are (k₀, l₀), position coordinates of b₀₁ are (k₀, l₁), position coordinates of b₀₂ are (k₀, l₂), and position coordinates of b₀₃ are (k₀, l₃). Other element values in the second matrix and the position coordinates corresponding to each element value are not described one by one.

For example, refer to FIG. 14a for understanding. A process in which the position coordinate comparison circuit 401 compares position coordinates in two matrices is described by using an example. The position coordinate comparison circuit 401 receives the first matrix and the second matrix from the first cache 403, and may first compare row coordinates in the position coordinates in the two matrices. If the row coordinates are equal (or the same), the position coordinate comparison circuit 401 further compares column coordinates corresponding to the two equal row coordinates. For example, a row coordinate in the row 0 in the first matrix and a row coordinate in the row 0 in the second matrix are respectively i₀ and k₀. If i₀ and k₀ are equal, the position coordinate comparison circuit 401 compares column coordinates (j₀, j₁, j₂, j₃) in the row 0 in the first matrix with column coordinates (l₀, l₁, l₂, l₃) in the row 0 in the second matrix. If j₀ and l₀ are equal, the position coordinates (i₀, j₀) of the element value a₀₀ are the same as the position coordinates (k₀, l₀) of the element value b₀₀. If j₀ and l₀ are not equal, j₀ is then compared with the next column coordinate l₁ in the row 0 in the second matrix. That is, if the two column coordinates that are currently compared are the same, it is determined that the position coordinates of the two element values corresponding to the column coordinates are the same. If the two column coordinates that are currently compared are different, j₀ continues to be compared with the next column coordinate (for example, l₂) in the row 0 in the second matrix, until each column coordinate in the row 0 in the first matrix has been compared with all the column coordinates in the row 0 in the second matrix.
Herein, only an example in which the position coordinates in the row 0 in the first matrix are compared with the position coordinates in the row 0 in the second matrix is used for description, and comparison of other position coordinates in the two matrices is not described by using examples one by one. The coordinate comparison result includes at least the following two cases.

In a first case, the coordinate comparison result includes a first comparison result, and the first comparison result indicates that first position coordinates are the same as second position coordinates. For example, the position coordinates (i₀, j₁) in the first matrix are (0, 2), and the position coordinates (k₀, l₀) in the second matrix are also (0, 2). In this case, the position coordinates (i₀, j₁) are equal to the position coordinates (k₀, l₀). For another example, position coordinates (i₁, j₀) in the first matrix are (1, 3), and position coordinates (k₁, l₂) in the second matrix are also (1, 3). In this case, the position coordinates (i₁, j₀) are equal to the position coordinates (k₁, l₂).

The accumulator 4021 is configured to receive the first matrix and the second matrix, and add the first element value and the second element value based on the first comparison result, to obtain a result matrix. The result matrix includes a third element value and third position coordinates of the third element value. The third element value is a sum of the first element value and the second element value. The third position coordinates are the same as the first position coordinates (or the second position coordinates). The accumulator 4021 writes the third element value and the third position coordinates into the second cache 404. The second cache 404 is configured to store the result matrix. For example, refer to FIG. 14b. The accumulator 4021 adds the element value a₀₁ (for example, “1”) corresponding to the position coordinates (i₀, j₁) in the first matrix and the element value b₀₀ (for example, “2”) corresponding to the position coordinates (k₀, l₀) in the second matrix, to obtain a “third element value” (for example, denoted as “c₀₀”). Position coordinates of c₀₀ (for example, “3”) are denoted as (u₀, v₀), and (u₀, v₀) are also (0, 2). For another example, the accumulator 4021 adds an element value a₁₀ (for example, “2”) corresponding to position coordinates (i₁, j₀) in the first matrix and an element value b₁₂ (for example, “5”) corresponding to position coordinates (k₁, l₂) in the second matrix, to obtain a “third element value” (for example, denoted as “c₁₀”). Position coordinates of c₁₀ (for example, “7”) are denoted as (u₁, v₀), and (u₁, v₀) are also (1, 3), and the like. If all position coordinates in the first matrix can match same position coordinates in the second matrix, that is, the 16 position coordinates in the first matrix match (are equal to) the 16 position coordinates in the second matrix one by one, the accumulator 4021 adds element values having same position coordinates in the two matrices to obtain a new element value (the third element value).
The third position coordinates do not change relative to the first position coordinates (or the second position coordinates). Therefore, a result matrix C1 is obtained. Each third element value in the result matrix C1 and third position coordinates corresponding to each third element value are not described by using examples one by one.

Optionally, because each third element value is obtained by adding two element values, the third element value may be a zero element value. To save a transmission resource or facilitate a next computation operation, the computation apparatus may compress the result matrix, to output a matrix in the compressed format. For example, the accumulator 4021 is further configured to: when the third element value is the zero element value, output an invalid signal. The invalid signal indicates that the zero element value and position coordinates corresponding to the zero element value are not output to the second cache 404, so that the result matrix output by the computation apparatus does not include the zero element value. Refer to FIG. 14c. For example, c₀₃, c₁₃, c₂₂, and c₃₃ are all zero element values, and a result matrix C2 does not include the four zero element values and position coordinates corresponding to each zero element value.

In a second case, the coordinate comparison result further includes a second comparison result, and the first matrix includes a fourth element value and fourth position coordinates of the fourth element value. The fourth element value is any element value in the first matrix. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second matrix.

The accumulator 4021 is further configured to write the fourth element value and the fourth position coordinates into the second cache 404 based on the second comparison result. The second cache 404 is configured to cache the result matrix. The result matrix includes the fourth element value and the fourth position coordinates corresponding to the fourth element value. It should be understood that, in this case, element values having same position coordinates in the two matrices are added to obtain a sum (the third element value) of the two element values, and the accumulator 4021 writes the position coordinates and the sum into the second cache 404. The accumulator 4021 reserves position coordinates that fail to be matched in the two matrices and an element value corresponding to the position coordinates, and directly writes the element value (the fourth element value) into the second cache 404, so that the fourth element value is used as an element value in a result matrix C3. For example, refer to FIG. 14d for understanding. Element values in the row 0 in the first matrix and the row 0 in the second matrix are used as an example for description. Position coordinates in the first three columns in the row 0 in the first matrix respectively match (are the same as) position coordinates in the first three columns in the row 0 in the second matrix. For example, the position coordinates (i₀, j₀) in the first matrix are the same as the position coordinates (k₀, l₀) in the second matrix. In this case, the accumulator 4021 adds a₀₀ and b₀₀. c₀₀ is a sum of a₀₀ and b₀₀, and (u₀, v₀) are the same as (i₀, j₀). Similarly, (i₀, j₁) are the same as (k₀, l₁). In this case, the accumulator 4021 adds a₀₁ and b₀₁. c₀₁ is a sum of a₀₁ and b₀₁, and (u₀, v₁) are the same as (i₀, j₁). (i₀, j₂) are the same as (k₀, l₂). In this case, the accumulator 4021 adds a₀₂ and b₀₂. c₀₂ is a sum of a₀₂ and b₀₂, and (u₀, v₂) are the same as (i₀, j₂).
However, position coordinates in the 4^th column in the row 0 of the first matrix are different from position coordinates in the 4^th column in the row 0 of the second matrix. In other words, position coordinates (i₀, j₃) in the 4^th column in the row 0 of the first matrix fail to match same position coordinates in the second matrix. In this case, the accumulator 4021 writes the position coordinates (i₀, j₃) and an element value corresponding to the position coordinates into the second cache 404. To be specific, c₀₃ in the result matrix C3 is the same as a₀₃ in the first matrix, and position coordinates (u₀, v₃) of c₀₃ are the same as position coordinates (i₀, j₃) of a₀₃. Similarly, position coordinates (k₀, l₃) in the 4^th column in the row 0 of the second matrix also fail to match same position coordinates in the first matrix. In this case, the accumulator 4021 also writes the position coordinates (k₀, l₃) and an element value corresponding to the position coordinates into the second cache 404. To be specific, c₀₄ in the result matrix C3 is the same as b₀₃ in the second matrix, and position coordinates (u₀, v₄) of c₀₄ are the same as position coordinates (k₀, l₃) of b₀₃. Similarly, other element values in the first matrix and the second matrix, and element values in the result matrix C3 are not described in detail by using examples one by one.
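The two cases described above may be summarized by the following software sketch. It is only an illustrative model under stated assumptions and not the circuit of this embodiment: a Python dictionary that maps position coordinates to an element value stands in for a matrix in the compressed format, and the function name `add_compressed` is hypothetical.

```python
def add_compressed(first, second):
    """Add two matrices in the compressed format without decompressing them.

    Assumption: each matrix is a dict mapping (row, column) -> element value.
    """
    result = {}
    for coords, value in first.items():
        if coords in second:
            # First case: same position coordinates, so the values are added.
            result[coords] = value + second[coords]
        else:
            # Second case: no match in the other matrix, value kept as-is.
            result[coords] = value
    for coords, value in second.items():
        if coords not in first:
            # Second case, seen from the second matrix.
            result[coords] = value
    return result

# Values follow the example above: 1 + 2 at (0, 2) and 2 + 5 at (1, 3).
first = {(0, 2): 1, (1, 3): 2, (2, 0): 4}
second = {(0, 2): 2, (1, 3): 5, (3, 1): 9}
result = add_compressed(first, second)
```

In this sketch, the entries at (2, 0) and (3, 1) have no counterpart in the other matrix and are therefore carried into the result unchanged, together with their position coordinates.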

The following describes a structure and a function of the position coordinate comparison circuit by using an example. Refer to FIG. 15. The position coordinate comparison circuit 401 may include a row coordinate comparison circuit 4011 and a column coordinate comparison circuit 4012. The row coordinate comparison circuit 4011 is configured to determine, through comparison, row coordinates that are the same (or equal) and row coordinates that are different, to obtain a row coordinate comparison result. The column coordinate comparison circuit 4012 is configured to compare two column coordinates in two position coordinates with a same row coordinate, to obtain a column coordinate comparison result. When the row coordinates are the same, if the two column coordinates are the same, the two position coordinates corresponding to the two column coordinates are the same. If the two column coordinates are different, the two position coordinates corresponding to the two column coordinates are different. In this embodiment, the position coordinate comparison circuit does not need to traverse and match all position coordinates in the first matrix with all position coordinates in the second matrix. First coordinates (for example, row coordinates) in the first matrix and the second matrix are first compared, and then second coordinates (for example, column coordinates) in position coordinates with same first coordinates are compared, so that a quantity of times for which the position coordinates are compared is reduced, and a computing resource is saved.

It should be understood that, in this embodiment, whether the position coordinate comparison circuit 401 first compares row coordinates and then compares column coordinates or first compares column coordinates and then compares row coordinates is not limited. For example, in a manner A, the first matrix and the second matrix compress element values corresponding to a same row coordinate into a same row. Each matrix in the compressed format may be considered to include a plurality of row vectors. In this case, the position coordinate comparison circuit 401 first compares row coordinates of two row vectors in the two matrices by using the row coordinate comparison circuit 4011. If the row coordinates of the two row vectors are the same, the column coordinate comparison circuit 4012 continues to compare column coordinates of all the element values in the two row vectors. For example, in a manner B, the first matrix and the second matrix compress element values corresponding to a same column coordinate into a same column. Each matrix in the compressed format may be considered to include a plurality of column vectors. In this case, the position coordinate comparison circuit 401 first compares column coordinates of two column vectors in the two matrices by using the column coordinate comparison circuit 4012. If the column coordinates of the two column vectors are the same, the row coordinate comparison circuit 4011 continues to compare row coordinates of all the element values in the two column vectors. In this embodiment, the foregoing manner A is used as an example for description.

For example, the row coordinate comparison circuit 4011 is configured to compare a row coordinate of an m^th row in the first matrix with a row coordinate of an f^th row in the second matrix, to obtain a first row comparison result. The first row comparison result indicates that the row coordinate of the m^th row is the same as the row coordinate of the f^th row. m is an integer less than or equal to M, and f is an integer less than or equal to K. The m^th row is any row in the first matrix, and the f^th row is any row in the second matrix.

FIG. 16b is a schematic diagram of comparing row coordinates by the row coordinate comparison circuit 4011. The row coordinate comparison circuit 4011 receives the first matrix and the second matrix from the first cache. Four row coordinates in the first matrix are sequentially i₀, i₁, i₂, and i₃. Four row coordinates in the second matrix are sequentially k₀, k₁, k₂, and k₃. The row coordinate comparison circuit 4011 outputs a first row comparison result. The first row comparison result includes a row coordinate comparison result, a first signal (which is denoted as, for example, “valid”), and a first value (which is denoted as, for example, “u”). The first signal indicates validity of the value u. The first value indicates a value of a row coordinate. The first row comparison result indicates that the row coordinate of the m^th row is the same as the row coordinate of the f^th row. For example, refer to FIG. 16a. The row coordinate comparison result is denoted as (isequal₀, index_01, index_02), (isequal₁, index_11, index_12), (isequal₂, index_21, index_22), and (isequal₃, index_31, index_32). For example, when (isequal₀, index_01, index_02)=(1, i₀, k₀), it indicates that a row coordinate i₀ in the row 0 in the first matrix is equal to a row coordinate k₀ in the row 0 in the second matrix, and the row coordinate comparison circuit 4011 further outputs valid₀=1 and u₀. u₀ is equal to i₀ (in other words, u₀ is also equal to k₀). Similarly, when (isequal₁, index_11, index_12)=(1, i₁, k₁), it indicates that a row coordinate i₁ in the row 1 in the first matrix is equal to a row coordinate k₁ in the row 1 in the second matrix, and valid₁=1 and u₁ are output. u₁ is equal to i₁ (in other words, u₁ is also equal to k₁). By analogy, the row coordinate comparison results are not described one by one. valid=1 indicates that the correspondingly output value u is valid. valid=0 indicates that the correspondingly output value u is invalid.

It should be noted that, in FIG. 16b, the row coordinate comparison circuit 4011 outputs eight groups of values valid and values u in total, and outputs four groups of comparison results isequal. A reason for which the row coordinate comparison circuit 4011 outputs eight groups of values valid is that a quantity of groups of signals output by the row coordinate comparison circuit 4011 and bit widths of the signals are fixed. The row coordinate comparison circuit 4011 receives four row coordinates in the first matrix and four row coordinates in the second matrix. If none of the four row coordinates in the first matrix matches any of the four row coordinates in the second matrix, there are eight distinct row coordinates, and a maximum quantity of valid is eight. Therefore, output positions of eight valid may be reserved. A reason for which the row coordinate comparison circuit 4011 outputs four groups of comparison results isequal is that if all the four row coordinates in the first matrix match the four row coordinates in the second matrix one by one, a maximum of four groups of row coordinates match. Therefore, output positions of four comparison results isequal may be reserved. Using the example in FIG. 16a as an example, when i₀ is equal to k₀, valid₀=1, and u₀ is equal to i₀. When i₁ is equal to k₁, valid₁=1, and u₁ is equal to i₁. When i₂ is equal to k₂, valid₂=1, and u₂ is equal to i₂. When i₃ is equal to k₃, valid₃=1, and u₃ is equal to i₃. In other words, all the four row coordinates in the first matrix match the four row coordinates in the second matrix one by one. Therefore, valid₄, valid₅, valid₆, and valid₇ are all equal to 0, and indicate that u₄, u₅, u₆, and u₇ are invalid values. u₄, u₅, u₆, and u₇ each may be a preset value (for example, “−1”). The preset value indicates that u is invalid. A specific value of the preset value is not limited.

For example, the row coordinate comparison circuit receives the following two input arrays:

- {i₀, i₁, i₂, i₃}={1, 2, 3, 4}; and
- {k₀, k₁, k₂, k₃}={1, 2, 3, 4}.

The row coordinate comparison circuit outputs the following result:

- (isequal₀, index_01, index_02)=(1, i₀, k₀);
- (isequal₁, index_11, index_12)=(1, i₁, k₁);
- (isequal₂, index_21, index_22)=(1, i₂, k₂);
- (isequal₃, index_31, index_32)=(1, i₃, k₃);
- (valid₀, u₀)=(1, 1);
- (valid₁, u₁)=(1, 2);
- (valid₂, u₂)=(1, 3);
- (valid₃, u₃)=(1, 4);
- (valid₄, u₄)=(0, −1);
- (valid₅, u₅)=(0, −1);
- (valid₆, u₆)=(0, −1); and
- (valid₇, u₇)=(0, −1).

The column coordinate comparison circuit 4012 is configured to compare a column coordinate of each element value in the m^th row with a column coordinate of each element value in the f^th row based on the first row comparison result, to obtain a first column comparison result.

For example, the column coordinate comparison circuit 4012 may include a plurality of column coordinate comparison units (for example, a column coordinate comparison unit a, a column coordinate comparison unit b, and a column coordinate comparison unit c), and each column coordinate comparison unit is configured to compare column coordinates in two row vectors. The two row vectors may be understood as the vector a and the vector b in the foregoing embodiments. For example, the m^th row in the first matrix may be considered as a row vector, and the f^th row in the second matrix is also considered as a row vector. For example, if the column coordinate comparison unit a receives a first row comparison result a (for example, i₀ is equal to k₀, valid₀=1, and u₀) from the row coordinate comparison circuit 4011, the column coordinate comparison unit a compares a column coordinate of an element value in the row 0 in the first matrix with a column coordinate of an element value in the row 0 in the second matrix. For example, the column coordinate comparison unit a reads the column coordinates j₀, j₁, j₂, and j₃ in the row 0 in the first matrix, and reads the column coordinates l₀, l₁, l₂, and l₃ in the row 0 in the second matrix. The column coordinate comparison unit a compares the column coordinates (j₀, j₁, j₂, and j₃) with the column coordinates (l₀, l₁, l₂, and l₃), to obtain a column coordinate comparison result. A representation form of the column coordinate comparison result is similar to a representation form of the row coordinate comparison result.

For example, the column coordinate comparison circuit 4012 outputs the first column comparison result and/or a second column comparison result. The first column comparison result indicates that the two compared column coordinates are equal, and the second column comparison result indicates that no equal column coordinates are matched. The first column comparison result includes the column coordinate comparison result, a second signal (which is denoted as “valid′”), and a second value (which is denoted as “v”). The second signal indicates validity of the value v, and the second value indicates a value of a column coordinate. Logic of performing column coordinate comparison by the column coordinate comparison circuit 4012 is similar to logic of performing row coordinate comparison by the row coordinate comparison circuit 4011. Refer to the logic of the row coordinate comparison circuit for understanding. For example, refer to FIG. 17a and FIG. 9 for understanding. The column coordinate comparison result is denoted as (isequal′₀, index_01, index_02); (isequal′₁, index_11, index_12); (isequal′₂, index_21, index_22); and (isequal′₃, index_31, index_32). The column coordinate comparison result indicates a pair of column coordinates that are equal. For example, j₀ and l₀ are equal, and j₂ and l₁ are equal. valid′₃=1 and v₃=j₃ indicate that j₃ is a valid value, and no equal column coordinate is found by the column coordinate comparison unit a for j₃. Similarly, valid′₄=1 and v₄=l₂ indicate that l₂ is a valid value, and no equal column coordinate is found by the column coordinate comparison unit a for l₂. In other words, the column coordinate comparison unit a matches two pairs of equal column coordinates, but four column coordinates fail to match equal values. In this case, a quantity of valid values is six (valid′₀ to valid′₅), and values of valid′₆ and valid′₇ are both 0, to indicate that v₆ and v₇ are invalid values (for example, “−1”).

For example, the column coordinate comparison circuit receives the following two input arrays:

- {j₀, j₁, j₂, j₃}={1, 2, 3, 4}; and
- {l₀, l₁, l₂, l₃}={1, 3, 5, 7}.

The column coordinate comparison circuit outputs the following result:

- (isequal′₀, index_01, index_02)=(1, j₀, l₀);
- (isequal′₁, index_11, index_12)=(1, j₂, l₁);
- (isequal′₂, index_21, index_22)=(0, j₁, l₂);
- (isequal′₃, index_31, index_32)=(0, j₃, l₃);
- (valid′₀, v₀)=(1, 1);
- (valid′₁, v₁)=(1, 2);
- (valid′₂, v₂)=(1, 3);
- (valid′₃, v₃)=(1, 4);
- (valid′₄, v₄)=(1, 5);
- (valid′₅, v₅)=(1, 7);
- (valid′₆, v₆)=(0, −1); and
- (valid′₇, v₇)=(0, −1).
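The listed row and column comparison outputs may be reproduced by the following software sketch. It is only a behavioral model and not the comparison circuit itself. The function name `compare_coords` is hypothetical, and the sketch assumes that the two input arrays have the same length, that coordinates within one array are distinct, that equal pairs are reported before unequal pairs, and that the valid values are output in ascending order, which is consistent with the examples above but is not stated as a requirement. Array indices stand in for the symbols (for example, index 2 of the first array stands for j₂).

```python
def compare_coords(a, b, invalid=-1):
    """Behavioral model of one coordinate comparison.

    Returns (isequal, valid_values):
      isequal      -- list of (flag, index_in_a, index_in_b)
      valid_values -- list of (valid, coordinate), padded to len(a) + len(b)
    """
    slots = len(a) + len(b)              # fixed quantity of output positions
    b_index = {coord: idx for idx, coord in enumerate(b)}
    # Pairs of equal coordinates (flag 1).
    matched = [(1, ia, b_index[c]) for ia, c in enumerate(a) if c in b_index]
    used_a = {ia for _, ia, _ in matched}
    used_b = {ib for _, _, ib in matched}
    rest_a = [ia for ia in range(len(a)) if ia not in used_a]
    rest_b = [ib for ib in range(len(b)) if ib not in used_b]
    # Remaining coordinates are paired up and flagged as not equal (flag 0).
    isequal = matched + [(0, ia, ib) for ia, ib in zip(rest_a, rest_b)]
    # Every distinct coordinate is valid; unused positions hold the preset value.
    union = sorted(set(a) | set(b))
    valid_values = [(1, c) for c in union] + [(0, invalid)] * (slots - len(union))
    return isequal, valid_values

row_isequal, row_valid = compare_coords([1, 2, 3, 4], [1, 2, 3, 4])
col_isequal, col_valid = compare_coords([1, 2, 3, 4], [1, 3, 5, 7])
```

For the column example, the sketch reports two equal pairs and six valid values, and the last two output positions hold the preset value “−1”.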

A result of comparing column coordinates in two vectors whose row coordinates are equal (for example, the row 0 in the first matrix and the row 0 in the second matrix) by the column coordinate comparison unit a is used as an example for description. It should be understood that when the column coordinate comparison unit a performs column coordinate comparison, another column coordinate comparison unit (for example, the column coordinate comparison unit b or the column coordinate comparison unit c) also performs column coordinate comparison. For example, if the row coordinate of the row 1 in the first matrix is equal to the row coordinate of the row 1 in the second matrix, the column coordinate comparison unit b may compare column coordinates in the two row vectors: the row 1 in the first matrix and the row 1 in the second matrix. A process in which another column coordinate comparison unit performs column coordinate comparison is similar to the process in which the column coordinate comparison unit a performs column coordinate comparison, and details are not described herein again. Only the comparison, by the column coordinate comparison unit a, of column coordinates in the row 0 in the first matrix and the row 0 in the second matrix is used as an example for description.

The accumulator 4021 is configured to add, based on the first column comparison result, an element value corresponding to the n^th column coordinate of the m^th row and an element value corresponding to the l^th column coordinate of the f^th row, to obtain the third element value. The first column comparison result indicates that the n^th column coordinate of the m^th row is the same as the l^th column coordinate of the f^th row. The element value corresponding to the n^th column coordinate of the m^th row is the first element value, and the element value corresponding to the l^th column coordinate of the f^th row is the second element value. n is less than or equal to N, and l is less than or equal to L.

For example, the following describes a structure of the accumulator by using an example. As shown in FIG. 10, the accumulator 4021 includes a plurality of adders (which may also be referred to as an “adder array”). A size of the adder array shown in FIG. 10 is 1×4. The adder array whose size is 1×4 can support addition of four arrays (for example, (a₀, b₀), (a₁, b₁), (a₂, b₂), and (a₃, b₃)), and add two values in each of the four arrays. The adder array outputs sums (for example, c₀, c₁, c₂, and c₃) of the four arrays. For example, a₀+b₀=c₀. Similarly, the adder array can be expanded to an adder array whose size is M×N. For example, the adder array is expanded to an adder array whose size is 4×4. A size of the adder array is designed based on an actual requirement.

The accumulator 4021 is further configured to output the third element value and third position coordinates of the third element value to the second cache 404. Optionally, the accumulator 4021 is further configured to output, to the second cache 404 based on the second column comparison result, fourth position coordinates that fail to be matched and a fourth element value corresponding to the fourth position coordinates. Refer to FIG. 19 and FIG. 14b for understanding. FIG. 19 is a schematic diagram of outputting, by the accumulator 4021, an element value to the second cache based on the first column comparison result and/or the second column comparison result. A size of the second cache 404 is P×Q, where P is equal to (M+K), and Q is equal to (N+L). In this embodiment, an example in which dimensions of both the first matrix and the second matrix are M×N is used. In this case, P is equal to 2M, and Q is equal to 2N. The accumulator 4021 receives two matrices in the compressed format and a coordinate comparison result from the position coordinate comparison circuit 401. For example, the accumulator 4021 inputs a₀₀ and b₀₁ into an adder a based on a second result a (for example, j₀ and l₁ are equal). The adder a adds a₀₀ and b₀₁, to obtain a sum (for example, c₀₀). The accumulator 4021 outputs, to the second cache 404, the sum c₀₀ and position coordinates (which are the same as position coordinates of a₀₀) corresponding to c₀₀. In other words, the accumulator 4021 inputs two element values with same position coordinates into an adder, the adder obtains a sum, and then the accumulator 4021 outputs the sum and position coordinates corresponding to the sum to the second cache 404. For another example, a second column comparison result a indicates that no value equal to the column coordinate l₀ in the second matrix is found in the first matrix.
The accumulator 4021 directly outputs, based on the second column comparison result a, b₀₀ and position coordinates (k₀, l₀) of b₀₀ to the second cache 404. Refer to FIG. 14a and FIG. 17b for understanding again. The coordinate comparison result of the two rows, namely the row 0 in the first matrix and the row 0 in the second matrix, is used as an example. The accumulator 4021 outputs c₀, c₁, c₂, c₃, c₄, and c₅, and position coordinates corresponding to the six element values to the second cache 404. The foregoing six element values and the corresponding position coordinates include: c₀ and position coordinates (u₀, v₀), where c₀ is equal to b₀₀, and (u₀, v₀) are equal to (k₀, l₀); c₁ and position coordinates (u₀, v₁), where c₁ is a sum of a₀₀ and b₀₁, and (u₀, v₁) are equal to (i₀, j₀); c₂ and position coordinates (u₀, v₂), where c₂ is a sum of a₀₁ and b₀₂, and (u₀, v₂) are equal to (i₀, j₁); and c₃ and position coordinates (u₀, v₃), where c₃ is a sum of a₀₂ and b₀₃, and (u₀, v₃) are equal to (i₀, j₂). Optionally, if the foregoing six values are all non-zero element values, the accumulator 4021 outputs a valid signal (for example, valid″=1) for each non-zero element value. If there is a zero element value in the foregoing sums, the accumulator 4021 outputs an invalid signal (for example, valid″=0) for the zero element value. The second cache 404 stores a plurality of element values, position coordinates corresponding to each element value, and validity (valid″=1 or valid″=0) of each element value. Based on the validity of each element value, a final result matrix does not include a zero element value (where the zero element value is an invalid value).
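The handling of the validity signal valid″ for one pair of rows with equal row coordinates may be sketched as follows. This is only an illustrative model and not the accumulator circuit: each row is assumed to be a Python dictionary that maps a column coordinate to an element value, entries are emitted in ascending column order for readability, and the function names `accumulate_row` and `read_result` are hypothetical.

```python
def accumulate_row(row_coord, first_row, second_row):
    """Produce cache entries (element value, position coordinates, valid).

    Assumption: first_row and second_row map column coordinate -> value and
    belong to rows whose row coordinates are both equal to row_coord.
    """
    entries = []
    for col in sorted(set(first_row) | set(second_row)):
        if col in first_row and col in second_row:
            value = first_row[col] + second_row[col]   # output of an adder
        elif col in first_row:
            value = first_row[col]                     # unmatched, first matrix
        else:
            value = second_row[col]                    # unmatched, second matrix
        valid = 1 if value != 0 else 0                 # zero values are invalid
        entries.append((value, (row_coord, col), valid))
    return entries

def read_result(entries):
    """Keep only valid entries, so the result contains no zero element value."""
    return [(value, coords) for value, coords, valid in entries if valid]

first_row = {1: 2, 2: -3, 3: 4}
second_row = {0: 5, 1: 1, 2: 3, 3: 6}
entries = accumulate_row(0, first_row, second_row)
result_row = read_result(entries)
```

In this sketch, the sum at column 2 is zero, so its entry is stored with valid″=0 and is left out of the final result.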

Optionally, the row coordinate comparison result further includes a second row comparison result, and the second row comparison result indicates that no row coordinate equal to a row coordinate of a row w in the first matrix is found in the second matrix. In this case, the accumulator 4021 is further configured to directly output all element values in the row w in the first matrix and position coordinates of each element value to the second cache 404. The row w is any row in the first matrix. For example, refer to FIG. 14a for understanding. A row coordinate i₀ in the first matrix is not equal to any of the row coordinates k₀, k₁, k₂, and k₃ in the second matrix. In this case, the accumulator 4021 outputs, to the second cache 404, all element values in the row w, namely, an element value a₀₀ and position coordinates (i₀, j₀) of a₀₀, an element value a₀₁ and position coordinates (i₀, j₁) of a₀₁, an element value a₀₂ and position coordinates (i₀, j₂) of a₀₂, and an element value a₀₃ and position coordinates (i₀, j₃) of a₀₃.

It should be noted that precision of the element value and the position coordinates is not limited in this embodiment of this application, and the element value and the position coordinates may be of any precision. For example, the element value is a double-precision floating-point number (FP64), and the position coordinates are 32-bit integers (int32).

Refer to FIG. 20. Optionally, to enable the computation apparatus to support computation of matrices in a plurality of compressed formats, and to broaden the application scenarios of the computation apparatus, the computation apparatus further includes a format conversion unit 405. The format conversion unit 405 is configured to convert a matrix in a non-target compressed format into a matrix in a target compressed format (the COO format). For example, if at least one of the first matrix and the second matrix is in the CSC format or the CSR format, the format conversion unit 405 converts the format of the first matrix and/or the second matrix into the COO format. A function of the format conversion unit 405 may be implemented by the central processing unit 1 in FIG. 3, or may be implemented by a logic circuit in the computation apparatus.
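As an illustration of the kind of conversion that the format conversion unit 405 performs, the following sketch converts a matrix from the CSR format into the COO format. It assumes the conventional three-array CSR layout. The function name csr_to_coo and the parameter names row_offsets, col_indices, and values are hypothetical and are not taken from this application.

```python
def csr_to_coo(row_offsets, col_indices, values):
    """Convert a CSR matrix into a list of (row, column, value) triples.

    row_offsets[r] and row_offsets[r + 1] delimit the slice of col_indices
    and values that belongs to row r (conventional CSR layout, assumed here).
    """
    coo = []
    for row in range(len(row_offsets) - 1):
        for k in range(row_offsets[row], row_offsets[row + 1]):
            # Attach the explicit row coordinate that CSR stores implicitly.
            coo.append((row, col_indices[k], values[k]))
    return coo


# Hypothetical 3-row matrix in which row 1 has no non-zero element value.
coo_entries = csr_to_coo([0, 2, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])
```

After the conversion, every element value carries both a row coordinate and a column coordinate, which is the form that the position coordinate comparison described in this application operates on.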

Based on the computation apparatus provided in this application, great benefits can be obtained in a plurality of matrix computation scenarios. For example, when the computation apparatus is used in an artificial intelligence (AI) training and inference scenario, computation of both a matrix in a compressed format and a matrix in an uncompressed format can be fully supported. The computation apparatus in this application can directly compute the matrix in the compressed format without performing a decompression operation on the matrix in the compressed format. In this way, computation efficiency can be improved by more than four times. In addition, in a scenario such as scientific computing, regardless of whether the computation is computation of a matrix in the uncompressed format that requires high computing power or matrix computation in which a memory bandwidth is limited, when the computation apparatus in this application is used, a matrix in the compressed format can be directly accessed from a memory, so that a computing benefit is improved.

The foregoing describes embodiments of the computation apparatus, and the following describes a method performed by the computation apparatus. Refer to FIG. 21. An embodiment of this application provides a computation method. The method may be performed by the computation device shown in FIG. 2. Optionally, the method is performed by the computation apparatus shown in FIG. 3. Optionally, the method may be performed by the computation apparatus shown in FIG. 4. Optionally, the method may be performed by the computation apparatus shown in FIG. 20. The computation apparatus is configured to perform the following step 2101 to step 2103.

Step 2101: A computation apparatus obtains a computation instruction, where the computation instruction includes a first vector and a second vector that are in a compressed format.

Step 2102: The computation apparatus compares position coordinates of an element value in the first vector with position coordinates of an element value in the second vector, to obtain a first coordinate comparison result, where the first vector includes a first element value and first position coordinates of the first element value, the second vector includes a second element value and second position coordinates of the second element value, the first coordinate comparison result includes a first comparison result, and the first comparison result indicates that the first position coordinates are the same as the second position coordinates.

Step 2103: The computation apparatus computes the first element value and the second element value based on the first comparison result, to obtain a computation value; and outputs a computation result of the first vector and the second vector to a cache, where the computation result is related to the computation value.

In an optional implementation, the computation instruction is an addition instruction. The computation apparatus adds the first element value and the second element value based on the first comparison result, to obtain a sum of the first element value and the second element value. The computation value is the sum. The computation result includes a third element value and third position coordinates of the third element value. The third element value is the sum. The third position coordinates are the same as the first position coordinates.

Optionally, when the third element value is a zero element value, the computation apparatus outputs an invalid signal for the zero element value. The invalid signal indicates that an element value in the computation result does not include the zero element value and position coordinates corresponding to the zero element value. The computation apparatus skips, based on the invalid signal, outputting the zero element value and the position coordinates corresponding to the zero element value to the cache.

In an optional implementation, the first coordinate comparison result further includes a second comparison result, and the first vector includes a fourth element value and fourth position coordinates of the fourth element value. The computation apparatus outputs the fourth element value and the fourth position coordinates to the cache based on the second comparison result. The computation result includes the fourth element value and the fourth position coordinates.
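The addition implementation described above (a sum for matching position coordinates, no output for a zero sum, and direct output of an element value that has no counterpart) may be sketched as follows. The sketch assumes that each vector is a list of (position coordinate, element value) pairs. The function name add_compressed_vectors and the dictionary lookup that stands in for the coordinate comparison are illustrative assumptions. Direct output of unmatched element values of the second vector is also an assumption made here by analogy with the first vector.

```python
def add_compressed_vectors(first, second):
    """Add two compressed vectors given as lists of (position, value) pairs."""
    second_lookup = dict(second)
    matched = set()
    result = []
    for pos, value in first:
        if pos in second_lookup:
            # First comparison result: the position coordinates are the same.
            matched.add(pos)
            total = value + second_lookup[pos]
            if total != 0:
                result.append((pos, total))
            # A zero sum is treated as invalid and is not output.
        else:
            # Second comparison result: no matching position coordinates.
            result.append((pos, value))
    # Assumption: unmatched element values of the second vector pass through.
    result.extend((pos, value) for pos, value in second if pos not in matched)
    return sorted(result)


# Hypothetical sample data: the sum at position 2 is zero and is skipped.
total_vector = add_compressed_vectors(
    [(0, 1.5), (2, -4.0), (6, 3.0)],
    [(2, 4.0), (3, 2.0), (6, 1.0)],
)
```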

In an optional implementation, the computation instruction is a multiplication instruction. The computation apparatus multiplies the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result includes a fifth element value and fifth position coordinates of the fifth element value. The fifth element value is the product. The fifth position coordinates are the same as the first position coordinates.

In an optional implementation, the computation instruction is an inner product instruction. The computation apparatus multiplies the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result is an accumulated value of a plurality of products.
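The multiplication implementation and the inner product implementation may be illustrated with the following sketch, which again assumes that each vector is a list of (position coordinate, element value) pairs. The function names multiply_compressed and inner_product_compressed are hypothetical, and the dictionary lookup stands in for the position coordinate comparison.

```python
def multiply_compressed(first, second):
    """Element-wise product of two compressed vectors.

    Only positions present in both vectors yield a product. A position
    missing from either vector corresponds to a zero element value, so its
    product is zero and nothing is output for it.
    """
    second_lookup = dict(second)
    return [
        (pos, value * second_lookup[pos])
        for pos, value in first
        if pos in second_lookup
    ]


def inner_product_compressed(first, second):
    """Accumulate the products of all matching positions into one value."""
    return sum(product for _, product in multiply_compressed(first, second))


# Hypothetical sample data: positions 3 and 5 appear in both vectors.
vec_a = [(0, 2.0), (3, 4.0), (5, 1.0)]
vec_b = [(3, 0.5), (4, 9.0), (5, 3.0)]
products = multiply_compressed(vec_a, vec_b)
dot = inner_product_compressed(vec_a, vec_b)
```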

In an optional implementation, the computation instruction is a multiplication-addition instruction. The computation apparatus multiplies the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value, where the computation value is the product, the product is used as a fifth element value, and fifth position coordinates corresponding to the fifth element value are the same as the first position coordinates;

- compares sixth position coordinates with the fifth position coordinates, to obtain a third comparison result, where the third comparison result indicates that the sixth position coordinates are the same as the fifth position coordinates, the sixth position coordinates are position coordinates in a third vector, and the third vector includes a sixth element value and the sixth position coordinates corresponding to the sixth element value; and
- adds the sixth element value and the fifth element value based on the third comparison result, to obtain a sum of the sixth element value and the fifth element value, where the computation value includes the sum of the sixth element value and the fifth element value, the computation result includes a seventh element value and seventh position coordinates corresponding to the seventh element value, the seventh element value is the sum of the sixth element value and the fifth element value, and the seventh position coordinates are the same as the sixth position coordinates.
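The multiplication-addition implementation may be sketched as follows for vectors held as lists of (position coordinate, element value) pairs. The function name multiply_add_compressed is hypothetical. The text above describes only the case in which the position coordinates match. Passing through a product or a third-vector element value that has no counterpart is an assumption of this sketch.

```python
def multiply_add_compressed(first, second, third):
    """Compute first * second + third on compressed vectors.

    Each vector is a list of (position, value) pairs. Returns the result as
    a position-sorted list of (position, value) pairs.
    """
    second_lookup = dict(second)
    # Start from the third vector (the sixth element values).
    result = dict(third)
    for pos, value in first:
        if pos in second_lookup:
            # Product of matched element values (the fifth element value).
            product = value * second_lookup[pos]
            # Add it to the element value of the third vector at the same
            # position. Assumption: if the third vector has no such
            # position, the product is output unchanged.
            result[pos] = result.get(pos, 0.0) + product
    return sorted(result.items())


# Hypothetical sample data: products exist at positions 3 and 5, and the
# third vector has element values at positions 5 and 7.
fused = multiply_add_compressed(
    [(0, 2.0), (3, 4.0), (5, 1.0)],
    [(3, 0.5), (4, 9.0), (5, 3.0)],
    [(5, 1.0), (7, 2.0)],
)
```

This sketch does not filter out a zero sum. Such filtering could be added in the same way as in the addition implementation.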

In an optional implementation, the computation instruction includes a first matrix and a second matrix that are in the compressed format, the first matrix includes the first vector, and the second matrix includes the second vector.

The computation apparatus is further configured to compare position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a second coordinate comparison result, where the second coordinate comparison result includes the first coordinate comparison result. Optionally, a dimension of the first matrix is M×N, a dimension of the second matrix is K×L, and the comparison and the addition include:

- comparing a row coordinate of an m-th row in the first matrix with a row coordinate of an f-th row in the second matrix, to obtain a row comparison result, where the row comparison result indicates that the row coordinate of the m-th row is the same as the row coordinate of the f-th row, m is less than or equal to M, and f is less than or equal to K;
- comparing a column coordinate of each element value in the m-th row with a column coordinate of each element value in the f-th row based on the row comparison result, to obtain a column comparison result, where the column comparison result indicates that an n-th column coordinate of the m-th row is the same as an l-th column coordinate of the f-th row, and the first comparison result includes the row comparison result and the column comparison result; and
- adding, based on the first comparison result, an element value corresponding to the n-th column coordinate of the m-th row and an element value corresponding to the l-th column coordinate of the f-th row, to obtain the third element value, where the element value corresponding to the n-th column coordinate of the m-th row is the first element value, the element value corresponding to the l-th column coordinate of the f-th row is the second element value, n is less than or equal to N, and l is less than or equal to L.

Refer to FIG. 22. This application further provides a computation method. The method may be performed by the computation device shown in FIG. 2. Optionally, the method is performed by the computation apparatus shown in FIG. 3. Optionally, the method may be performed by the computation apparatus shown in FIG. 4. Optionally, the method may be performed by the computation apparatus shown in FIG. 20. The method includes the following step 2201 to step 2204.

Step 2201: Obtain a computation instruction, where the computation instruction includes a first matrix and a second matrix that are in a compressed format.

For the first matrix and the second matrix in this step, refer to example descriptions of the first matrix and the second matrix in the foregoing apparatus embodiments. Details are not described herein again.

Step 2202: Compare position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a coordinate comparison result, where the first matrix includes a first element value and first position coordinates of the first element value, the second matrix includes a second element value and second position coordinates of the second element value, the coordinate comparison result includes a first comparison result, and the first comparison result indicates that the first position coordinates are the same as the second position coordinates.

For this step, refer to specific descriptions of the function performed by the position coordinate comparison circuit 401 in the foregoing computation apparatus embodiments. Details are not described herein again.

Optionally, a dimension of the first matrix is M×N, and a dimension of the second matrix is K×L. The comparing position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a coordinate comparison result includes:

- comparing a row coordinate of an m-th row in the first matrix with a row coordinate of an f-th row in the second matrix, to obtain a first row comparison result, where the first row comparison result indicates that the row coordinate of the m-th row is the same as the row coordinate of the f-th row, m is less than or equal to M, and f is less than or equal to K; and
- comparing a column coordinate of each element value in the m-th row with a column coordinate of each element value in the f-th row based on the first row comparison result, to obtain a first column comparison result, where the first column comparison result indicates that an n-th column coordinate of the m-th row is the same as an l-th column coordinate of the f-th row.

Step 2203: Add the first element value and the second element value based on the first comparison result, to obtain a result matrix, where the result matrix includes a third element value and third position coordinates of the third element value, the third element value is a sum of the first element value and the second element value, and the third position coordinates are the same as the first position coordinates.

For this step, refer to specific descriptions of the function performed by the accumulator 4021 in the foregoing computation apparatus embodiments. Details are not described herein again.

Optionally, an element value corresponding to the n-th column coordinate of the m-th row and an element value corresponding to the l-th column coordinate of the f-th row are added based on the first column comparison result, to obtain the third element value. The element value corresponding to the n-th column coordinate of the m-th row is the first element value. The element value corresponding to the l-th column coordinate of the f-th row is the second element value. n is less than or equal to N, and l is less than or equal to L.

In an optional implementation, when the third element value is a zero element value, the computation apparatus outputs an invalid signal for the zero element value. The invalid signal indicates that an element value in the result matrix does not include the zero element value.

In an optional implementation, the coordinate comparison result further includes a second comparison result. The first matrix includes a fourth element value and fourth position coordinates of the fourth element value. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second matrix. The matrix computation method may further include the following step 2204.

Step 2204: Output the fourth element value and the fourth position coordinates to a cache based on the second comparison result, where the cache is configured to cache the result matrix, and the result matrix includes the fourth element value and the fourth position coordinates.

It should be noted that there is no limitation on a time sequence of step 2203 and step 2204, and step 2203 and step 2204 may be performed simultaneously.
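For ease of understanding, step 2201 to step 2204 may be sketched in software as follows for matrices in the COO format. The sketch assumes that each matrix is a list of ((row coordinate, column coordinate), element value) entries. The function names group_by_row and add_coo_matrices are hypothetical. Direct output of element values that exist only in the second matrix is an assumption of this sketch, and the loop order implies nothing about the time sequence of step 2203 and step 2204.

```python
from collections import defaultdict


def group_by_row(entries):
    """Group COO entries ((row, col), value) into {row: {col: value}}."""
    rows = defaultdict(dict)
    for (row, col), value in entries:
        rows[row][col] = value
    return rows


def add_coo_matrices(first, second):
    """Add two COO matrices without decompressing either of them."""
    rows_a = group_by_row(first)
    rows_b = group_by_row(second)
    result = []
    # Row comparison: visit every row coordinate found in either matrix.
    for row in sorted(set(rows_a) | set(rows_b)):
        cols_a = rows_a.get(row, {})
        cols_b = rows_b.get(row, {})
        # Column comparison within the row.
        for col in sorted(set(cols_a) | set(cols_b)):
            if col in cols_a and col in cols_b:
                # Same position coordinates: add (step 2203).
                total = cols_a[col] + cols_b[col]
                if total != 0:  # a zero sum is invalid and is dropped
                    result.append(((row, col), total))
            elif col in cols_a:
                # No match in the second matrix: output directly (step 2204).
                result.append(((row, col), cols_a[col]))
            else:
                # Assumption: unmatched element values of the second matrix
                # are also output directly.
                result.append(((row, col), cols_b[col]))
    return result


# Hypothetical sample data: the sum at position (0, 2) is zero.
matrix_a = [((0, 0), 1.0), ((0, 2), 2.0), ((1, 1), 3.0)]
matrix_b = [((0, 2), -2.0), ((0, 3), 4.0), ((2, 0), 5.0)]
matrix_sum = add_coo_matrices(matrix_a, matrix_b)
```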

In an embodiment of this application, a computation circuit is provided. The computation circuit is configured to perform one or more steps in step 2201 to step 2204 or one or more steps in step 2101 to step 2103 in the foregoing method embodiments. During actual application, the computation circuit may be an ASIC, an FPGA, a logic circuit, or the like.

In another embodiment of this application, a computation system or a chip is further provided. A structure of the system or the chip may be shown in FIG. 3, and includes a processor (for example, a central processing unit) 1 and a computation apparatus 2. The processor 1 is configured to send a computation instruction to the computation apparatus 2, and the computation apparatus 2 is configured to perform one or more steps in step 2201 to step 2204 or one or more steps in step 2101 to step 2103 in the foregoing method embodiments.

In still another embodiment of this application, a computation device is provided. A structure of the device may be shown in FIG. 2. The device may be specifically a PCIe card, a SoC, a processor, a server including the foregoing hardware, or the like. Refer to FIG. 2. The device includes the memory 201, the processor 202, the communication interface 203, and the bus 204. The communication interface 203 may include an input interface and an output interface.

The processor 202 may be configured to perform one or more steps in step 2201 to step 2204 or one or more steps in step 2101 to step 2103 in the foregoing method embodiments. In some feasible embodiments, the processor 202 may include a computation unit, and the computation unit may be configured to support the processor in performing one or more steps in the foregoing method embodiments. During actual application, the computation unit may be an ASIC, an FPGA, a logic circuit, or the like. Certainly, the computation unit may alternatively be implemented by using software. This is not specifically limited in this embodiment of this application.

It should be noted that components of the computation circuit, the computation system, the computation device, and the like provided in embodiments of this application are respectively configured to implement functions of corresponding steps in the foregoing method embodiments.

All or some of the foregoing embodiments may be implemented using software, hardware, firmware, or any combination thereof. When software is used to implement embodiments, the foregoing embodiments may be implemented completely or partially in a form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, all or some of the processes or the functions according to embodiments of this application are generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or may be transmitted from a computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive (SSD).

In conclusion, the foregoing embodiments are merely intended for describing the technical solutions of this application, but not for limiting this application. Although this application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that they may still make modifications to the technical solutions described in the foregoing embodiments or make equivalent replacements to some technical features thereof, without departing from the scope of the technical solutions of embodiments of this application.

Number	Date	Country	Kind
202110961228.7	Aug 2021	CN	national
202111349874.4	Nov 2021	CN	national

	Number	Date	Country
Parent	PCT/CN2022/085509	Apr 2022	WO
Child	18440254		US
