This application relates to the computer field, and in particular, to a computation apparatus, method, system, circuit, and device, and a chip.
Vector computation is an important computation type in different application scenarios such as artificial intelligence, scientific computing, and graph computing. An element value in a vector may include two types of values, namely, a zero element value and a non-zero element value. When there are a large quantity of zero element values in the vector, to save storage space, only the non-zero element value in the vector may be stored. In other words, the vector is compressed, and a vector in a compressed format is stored.
In a current technology, a vector in the compressed format is commonly computed by first decompressing the vector, in other words, converting the vector in the compressed format into a vector in an uncompressed format, and then performing vector computation on the vector in the uncompressed format. In this vector computation process, because a decompression operation needs to be performed on the vector in the compressed format, and decompressed data occupies a very large amount of memory space, a computation speed of the vector is limited by an access bandwidth of a memory. When the access bandwidth of the memory is fixed, the computation speed of the vector cannot be increased, resulting in low computation efficiency.
Embodiments of this application provide a computation apparatus, method, system, circuit, and device, and a chip, to directly compute a vector in a compressed format without decompressing the vector in the compressed format, so that efficiency of computing a vector in the compressed format can be improved.
According to a first aspect, an embodiment of this application provides a computation apparatus, including a position coordinate comparison circuit and a logical operation circuit. The position coordinate comparison circuit is configured to compare position coordinates of an element value in a first vector with position coordinates of an element value in a second vector, to obtain a first coordinate comparison result. Both the first vector and the second vector are vectors in a compressed format. The first vector includes a first element value and first position coordinates of the first element value. The second vector includes a second element value and second position coordinates of the second element value. The first coordinate comparison result includes a first comparison result. The first comparison result indicates that the first position coordinates are the same as the second position coordinates. The logical operation circuit is configured to compute the first element value and the second element value based on the first comparison result, to obtain a computation value; and output a computation result to a cache. The cache is configured to cache the computation result. The computation result is related to the computation value. In this embodiment, the computation apparatus can compute a vector in the compressed format. When computing two vectors in the compressed format, the computation apparatus compares position coordinates of element values in the two vectors, and computes (for example, adds or multiplies) the two element values corresponding to same position coordinates in the two vectors, to obtain a computation result of computing the two vectors in the compressed format. 
In comparison with a conventional method in which the vector in the compressed format needs to be decompressed first, and then vector computation is performed on a decompressed vector, the computation apparatus provided in this embodiment of this application can effectively improve efficiency of computing the vector in the compressed format.
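For illustration only, the comparison performed by the position coordinate comparison circuit may be modeled in software as follows. This is a minimal sketch, not the circuit itself. It assumes that a vector in the compressed format is held as a list of (position, value) pairs that contains only non-zero element values. The names `CompressedVector` and `compare_coordinates` are hypothetical and do not appear in this application.

```python
from typing import List, Tuple

# Hypothetical compressed format: (position, value) pairs for non-zero values only.
CompressedVector = List[Tuple[int, float]]


def compare_coordinates(
    first: CompressedVector, second: CompressedVector
) -> List[Tuple[int, float, float]]:
    """Return (position, first_value, second_value) for every shared position."""
    second_lookup = dict(second)  # position -> value in the second vector
    matches = []
    for position, first_value in first:
        # A "first comparison result": the two position coordinates are the same.
        if position in second_lookup:
            matches.append((position, first_value, second_lookup[position]))
    return matches


first = [(0, 2.0), (3, 5.0), (7, 1.0)]
second = [(3, 4.0), (5, 6.0), (7, -2.0)]
matches = compare_coordinates(first, second)
```

In this sketch, neither vector is expanded into the uncompressed format. Only the stored position coordinates are compared, and only the matched pairs of element values are handed on to the subsequent operation.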
In an optional implementation, the logical operation circuit includes an accumulator. The position coordinate comparison circuit is further configured to receive an addition instruction, and transmit the first comparison result to the accumulator based on the addition instruction. The accumulator is configured to add the first element value and the second element value based on the first comparison result, to obtain a sum of the first element value and the second element value; and output the sum to the cache. The computation result includes a third element value and third position coordinates of the third element value. The third element value is the sum. The third position coordinates are the same as the first position coordinates. The computation result is a result vector obtained by adding the first vector and the second vector. In this embodiment, the computation apparatus may perform addition operation on the vector in the compressed format, and the computation apparatus may be used in an application scenario of vector addition computation.
In an optional implementation, the accumulator is further configured to: when the third element value is a zero element value, output an invalid signal for the zero element value. The invalid signal indicates that the computation result is not to include the zero element value or the position coordinates corresponding to the zero element value. In this embodiment, each third element value is obtained by adding two element values, and therefore the third element value may be the zero element value, for example, when the two element values are opposite numbers. When the third element value is the zero element value, the accumulator deletes the zero element value and the position coordinates corresponding to the zero element value, so that the computation result output by the computation apparatus does not include the zero element value and is still a vector in the compressed format. Therefore, a transmission resource is saved, or a next computation operation is facilitated.
In an optional implementation, the accumulator skips, based on the invalid signal, outputting the zero element value and the position coordinates corresponding to the zero element value to the cache, so that the cache does not cache the zero element value and the position coordinates corresponding to the zero element value, to achieve an objective that the computation result does not include the zero element value.
In an optional implementation, the first coordinate comparison result further includes a second comparison result. The first vector includes a fourth element value and fourth position coordinates of the fourth element value. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second vector. The accumulator is further configured to output the fourth element value and the fourth position coordinates to the cache based on the second comparison result. The computation result includes the fourth element value and the fourth position coordinates. In this embodiment, the accumulator retains position coordinates that fail to be matched in the two vectors and an element value corresponding to the position coordinates, and directly writes the position coordinates and the element value into the cache, to use the fourth element value as an element value in the computation result. Even if the position coordinates in the first vector do not match the position coordinates in the second vector one by one, the computation apparatus can still perform the addition operation, which extends the vector computation scenarios that the computation apparatus supports.
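The addition behavior described in the foregoing implementations may be sketched in software as follows. This is an illustrative model under stated assumptions, not the accumulator itself. Vectors in the compressed format are assumed to be lists of (position, value) pairs. The function name `add_compressed` is hypothetical. Passing unmatched entries of the second vector through to the result is an assumption made here by symmetry with the fourth element value of the first vector.

```python
from typing import List, Tuple

CompressedVector = List[Tuple[int, float]]  # hypothetical (position, value) format


def add_compressed(first: CompressedVector, second: CompressedVector) -> CompressedVector:
    """Add two compressed vectors without decompressing either of them."""
    remaining = dict(second)  # entries of the second vector not yet matched
    result = {}
    for position, value in first:
        if position in remaining:
            # First comparison result: same coordinates, so add the two values.
            total = value + remaining.pop(position)
            if total == 0:
                # Models the invalid signal: the zero sum is not written to the cache.
                continue
            result[position] = total
        else:
            # Second comparison result: no match, so the value is written as-is.
            result[position] = value
    # Assumption: unmatched entries of the second vector are also passed through.
    result.update(remaining)
    return sorted(result.items())


first = [(0, 2.0), (3, 5.0), (7, 1.0)]
second = [(3, -5.0), (5, 6.0), (7, 2.0)]
total_vector = add_compressed(first, second)
```

In this example, the element values at position 3 cancel out, so that position is omitted and the result remains in the compressed format.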
In an optional implementation, the logical operation circuit includes a multiplier. The position coordinate comparison circuit is further configured to receive a multiplication instruction, and transmit the first comparison result to the multiplier based on the multiplication instruction. The multiplier is configured to multiply the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result includes a fifth element value and fifth position coordinates of the fifth element value. The fifth element value is the product. The fifth position coordinates are the same as the first position coordinates. In this embodiment, the computation apparatus may perform multiplication operation on the vector in the compressed format, and the computation apparatus may be used in an application scenario of vector multiplication computation.
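As a hedged illustration of the multiplication described above, the following sketch multiplies element values only at position coordinates that appear in both vectors. The (position, value) list format and the name `multiply_compressed` are assumptions made for this example.

```python
from typing import List, Tuple

CompressedVector = List[Tuple[int, float]]  # hypothetical (position, value) format


def multiply_compressed(
    first: CompressedVector, second: CompressedVector
) -> CompressedVector:
    """Element-wise product of two compressed vectors."""
    second_lookup = dict(second)
    # A product is produced only where the position coordinates are the same;
    # an unmatched position would be multiplied by an implicit zero.
    return [
        (position, value * second_lookup[position])
        for position, value in first
        if position in second_lookup
    ]


first = [(0, 2.0), (3, 5.0), (7, 1.0)]
second = [(3, 4.0), (5, 6.0), (7, -2.0)]
product_vector = multiply_compressed(first, second)
```

Each entry of the result keeps the position coordinates of the matched entry of the first vector, which corresponds to the fifth position coordinates being the same as the first position coordinates.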
In an optional implementation, the logical operation circuit includes an inner product operation circuit. The position coordinate comparison circuit is further configured to receive an inner product instruction, and transmit the first comparison result to the inner product operation circuit based on the inner product instruction. The inner product operation circuit is configured to multiply the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result is an accumulated value of a plurality of products. Each product is a product of one first element value and one second element value. A computation result of an inner product of the first vector and the second vector is a scalar (a value). In this embodiment, the computation apparatus may perform inner product operation on the vector in the compressed format, and the computation apparatus may be used in an application scenario of vector inner product computation.
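The inner product described above may be sketched as follows. The sketch assumes, beyond what this application states, that each vector in the compressed format is a list of (position, value) pairs sorted by position, so that the coordinates can be compared with a two-pointer walk. The name `inner_product` is hypothetical.

```python
from typing import List, Tuple

CompressedVector = List[Tuple[int, float]]  # assumed sorted by position


def inner_product(first: CompressedVector, second: CompressedVector) -> float:
    """Accumulate products of element values that share position coordinates."""
    i = j = 0
    accumulated = 0.0
    while i < len(first) and j < len(second):
        first_position, first_value = first[i]
        second_position, second_value = second[j]
        if first_position == second_position:
            # Same coordinates: multiply and add to the accumulated value.
            accumulated += first_value * second_value
            i += 1
            j += 1
        elif first_position < second_position:
            i += 1  # no partner for this entry of the first vector
        else:
            j += 1  # no partner for this entry of the second vector
    return accumulated


first = [(0, 2.0), (3, 5.0), (7, 1.0)]
second = [(3, 4.0), (5, 6.0), (7, -2.0)]
scalar = inner_product(first, second)
```

Here the matched positions are 3 and 7, so the accumulated value is 5.0 × 4.0 + 1.0 × (−2.0) = 18.0, a single scalar.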
In an optional implementation, the logical operation circuit includes a multiplier and an accumulator, and the first coordinate comparison result further includes a third comparison result. The position coordinate comparison circuit is further configured to receive a multiplication-addition computation instruction, and transmit the first comparison result to the multiplier based on the multiplication-addition computation instruction. The multiplier is further configured to multiply the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value, so as to obtain a fifth element value and fifth position coordinates of the fifth element value. The fifth position coordinates are the same as the first position coordinates. The position coordinate comparison circuit is further configured to compare sixth position coordinates with the fifth position coordinates, to obtain the third comparison result; and transmit the third comparison result to the accumulator. The third comparison result indicates that the sixth position coordinates are the same as the fifth position coordinates. The sixth position coordinates are position coordinates in a third vector. The third vector includes a sixth element value and the sixth position coordinates corresponding to the sixth element value. The accumulator is configured to add the sixth element value and the fifth element value based on the third comparison result, to obtain a sum of the sixth element value and the fifth element value. The computation value includes the product of the first element value and the second element value and the sum of the sixth element value and the fifth element value. The computation result includes a seventh element value and seventh position coordinates corresponding to the seventh element value. The seventh element value is the sum of the sixth element value and the fifth element value. 
The seventh position coordinates are the same as the sixth position coordinates. In this embodiment, the computation apparatus may perform multiplication-addition operation on the vector in the compressed format, and the computation apparatus may be used in an application scenario of vector multiplication-addition computation.
In an optional implementation, the first coordinate comparison result further includes a fourth comparison result. The third vector includes an eighth element value and eighth position coordinates of the eighth element value. The fourth comparison result indicates that no position coordinates that are the same as the eighth position coordinates are found in the result vector obtained after the first vector is multiplied by the second vector. The accumulator is further configured to output the eighth element value and the eighth position coordinates to the cache based on the fourth comparison result. The computation result includes the eighth element value and the eighth position coordinates. In this embodiment, the accumulator retains position coordinates that fail to be matched in the two vectors and an element value corresponding to the position coordinates, and directly writes the position coordinates and the element value into the cache, to use the eighth element value as an element value in the computation result. Even if the position coordinates in the third vector do not match the position coordinates in the result vector (to be specific, the result vector obtained after the first vector is multiplied by the second vector) one by one, the computation apparatus can still perform the addition operation, which extends the vector computation scenarios that the computation apparatus supports.
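The multiplication-addition flow may be sketched as follows, purely as an illustration. The sketch first forms the products of the first vector and the second vector at shared position coordinates, and then adds the third vector to those products. The (position, value) list format and the name `multiply_add` are assumptions. Passing through a product that has no partner in the third vector is also an assumption made here to keep the result complete.

```python
from typing import List, Tuple

CompressedVector = List[Tuple[int, float]]  # hypothetical (position, value) format


def multiply_add(
    first: CompressedVector, second: CompressedVector, third: CompressedVector
) -> CompressedVector:
    """Compute first * second + third on compressed vectors."""
    second_lookup = dict(second)
    # Multiplier stage: products exist only at shared position coordinates.
    products = {
        position: value * second_lookup[position]
        for position, value in first
        if position in second_lookup
    }
    result = []
    for position, value in third:
        if position in products:
            # Third comparison result: add the third-vector value to the product.
            result.append((position, value + products.pop(position)))
        else:
            # Fourth comparison result: no matching product, write the value as-is.
            result.append((position, value))
    # Assumption: products without a partner in the third vector pass through.
    result.extend(products.items())
    return sorted(result)


first = [(0, 2.0), (3, 5.0), (7, 1.0)]
second = [(3, 4.0), (5, 6.0), (7, -2.0)]
third = [(1, 9.0), (3, 1.0)]
fused = multiply_add(first, second, third)
```

In this example, position 3 carries the sum of a product and a third-vector value, position 1 is an unmatched third-vector value, and position 7 is an unmatched product.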
In an optional implementation, the position coordinate comparison circuit is further configured to compare position coordinates of an element value in a first matrix with position coordinates of an element value in a second matrix, to obtain a second coordinate comparison result. The first matrix includes the first vector. The second matrix includes the second vector. Both the first matrix and the second matrix are matrices in a compressed format. The second coordinate comparison result includes the first coordinate comparison result. In this embodiment, a matrix in the compressed format may be split into a plurality of vectors in the compressed format. In this case, one matrix in the compressed format may be considered as a plurality of vectors in the compressed format. Therefore, the foregoing plurality of types of operations for the vector in the compressed format may be extended to computation of the matrix in the compressed format. The computation apparatus may compute two matrices in the compressed format, which extends the application scenarios of the computation apparatus and effectively improves the efficiency of computing a matrix in the compressed format.
In an optional implementation, the position coordinate comparison circuit includes a row coordinate comparison circuit and a column coordinate comparison circuit. A dimension of the first matrix is M×N, and a dimension of the second matrix is K×L. The row coordinate comparison circuit is configured to compare a row coordinate of an mth row in the first matrix with a row coordinate of an fth row in the second matrix, to obtain a row comparison result. The row comparison result indicates that the row coordinate of the mth row is the same as the row coordinate of the fth row. m is less than or equal to M, and f is less than or equal to K.
The column coordinate comparison circuit is configured to compare a column coordinate of each element value in the mth row with a column coordinate of each element value in the fth row based on the row comparison result, to obtain a column comparison result. The column comparison result indicates that an nth column coordinate of the mth row is the same as an lth column coordinate of the fth row. The first comparison result includes the row comparison result and the column comparison result. The accumulator is further configured to add, based on the first comparison result, an element value corresponding to the nth column coordinate of the mth row and an element value corresponding to the lth column coordinate of the fth row, to obtain the third element value. The element value corresponding to the nth column coordinate of the mth row is the first element value. The element value corresponding to the lth column coordinate of the fth row is the second element value. n is less than or equal to N, and l is less than or equal to L. In this embodiment, the position coordinate comparison circuit does not need to traverse and match all position coordinates in the first matrix with all position coordinates in the second matrix. A row coordinate in the first matrix and a row coordinate in the second matrix are first compared by using the row coordinate comparison circuit, and then column coordinates in position coordinates with a same row coordinate are compared, so that a quantity of times for which the position coordinates are compared is reduced, and a computing resource is saved.
In an optional implementation, the row comparison result includes a first signal and a first value. The column comparison result includes a second signal and a second value. The first signal indicates validity of the first value. The first value is equal to a value of the row coordinate of the mth row. The second signal indicates validity of the second value. The second value is equal to a value of the nth column coordinate.
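The two-stage comparison and its signals may be modeled as follows. This is a sketch under assumptions: a matrix in the compressed format is represented as a dictionary that maps each stored row coordinate to a dictionary of column coordinate to element value, and the validity signals are modeled as Boolean values. The names `match_coordinates`, `row_valid`, and `column_valid` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical row-grouped compressed format: row -> {column -> value}.
CompressedMatrix = Dict[int, Dict[int, float]]


def match_coordinates(
    first: CompressedMatrix, second: CompressedMatrix
) -> List[Tuple[int, int]]:
    """Return (row, column) pairs stored in both matrices, comparing rows first."""
    matches = []
    for first_row, first_columns in first.items():
        for second_row, second_columns in second.items():
            # First signal: valid only when the two row coordinates are the same.
            row_valid = first_row == second_row
            if not row_valid:
                continue  # column coordinates of these two rows are never compared
            for column in first_columns:
                # Second signal: valid only when the column coordinate also matches.
                column_valid = column in second_columns
                if column_valid:
                    matches.append((first_row, column))  # (first value, second value)
    return matches


first = {0: {1: 2.0, 4: 3.0}, 2: {0: 1.0}}
second = {0: {4: 5.0}, 1: {0: 7.0}, 2: {0: -1.0, 3: 2.0}}
shared = match_coordinates(first, second)
```

Each returned pair corresponds to a valid first value (the row coordinate) together with a valid second value (the column coordinate).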
According to a second aspect, an embodiment of this application provides a computation method, where the method is applied to a computation apparatus. The method includes:
In an optional implementation, the computation instruction is an addition instruction. The computing the first element value and the second element value based on the first comparison result, to obtain a computation value may include: adding the first element value and the second element value based on the first comparison result, to obtain a sum of the first element value and the second element value. The computation value is the sum. The computation result includes a third element value and third position coordinates of the third element value. The third element value is the sum. The third position coordinates are the same as the first position coordinates.
In an optional implementation, the method further includes: when the third element value is a zero element value, outputting an invalid signal for the zero element value. The invalid signal indicates that an element value in the computation result does not include the zero element value and position coordinates corresponding to the zero element value.
In an optional implementation, the method further includes: skipping, based on the invalid signal, outputting the zero element value and the position coordinates corresponding to the zero element value to the cache.
In an optional implementation, the first coordinate comparison result further includes a second comparison result. The first vector includes a fourth element value and fourth position coordinates of the fourth element value. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second vector. The method further includes: outputting the fourth element value and the fourth position coordinates to the cache based on the second comparison result. The computation result includes the fourth element value and the fourth position coordinates.
In an optional implementation, the computation instruction is a multiplication instruction. The computing the first element value and the second element value based on the first comparison result, to obtain a computation value may include: multiplying the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result includes a fifth element value and fifth position coordinates of the fifth element value. The fifth element value is the product. The fifth position coordinates are the same as the first position coordinates.
In an optional implementation, the computation instruction is an inner product instruction. The computing the first element value and the second element value based on the first comparison result, to obtain a computation value may include: multiplying the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result is an accumulated value of a plurality of products.
In an optional implementation, the computation instruction is a multiplication-addition instruction. The first coordinate comparison result further includes a third comparison result. The computing the first element value and the second element value based on the first comparison result, to obtain a computation value includes:
In an optional implementation, the computation instruction includes a first matrix and a second matrix that are in the compressed format. The first matrix includes the first vector. The second matrix includes the second vector. The comparing position coordinates of an element value in the first vector with position coordinates of an element value in the second vector, to obtain a first coordinate comparison result may include: comparing position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a second coordinate comparison result. The second coordinate comparison result includes the first coordinate comparison result.
In an optional implementation, a dimension of the first matrix is M×N, and a dimension of the second matrix is K×L. The comparing position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a second coordinate comparison result may include: comparing a row coordinate of an mth row in the first matrix with a row coordinate of an fth row in the second matrix, to obtain a row comparison result, where the row comparison result indicates that the row coordinate of the mth row is the same as the row coordinate of the fth row, m is less than or equal to M, and f is less than or equal to K; comparing a column coordinate of each element value in the mth row with a column coordinate of each element value in the fth row based on the row comparison result, to obtain a column comparison result, where the column comparison result indicates that an nth column coordinate of the mth row is the same as an lth column coordinate of the fth row, and the first comparison result includes the row comparison result and the column comparison result; and adding, based on the first comparison result, an element value corresponding to the nth column coordinate of the mth row and an element value corresponding to the lth column coordinate of the fth row, to obtain the third element value, where the element value corresponding to the nth column coordinate of the mth row is the first element value, the element value corresponding to the lth column coordinate of the fth row is the second element value, n is less than or equal to N, and l is less than or equal to L.
In an optional implementation, the row comparison result includes a first signal and a first value. The column comparison result includes a second signal and a second value. The first signal indicates validity of the first value. The first value is equal to a value of the row coordinate of the mth row. The second signal indicates validity of the second value. The second value is equal to a value of the nth column coordinate.
According to a third aspect, an embodiment of this application provides a computation apparatus. The computation apparatus includes a position coordinate comparison circuit and an accumulator. The position coordinate comparison circuit is configured to compare position coordinates of an element value in a first matrix with position coordinates of an element value in a second matrix, to obtain a coordinate comparison result. Both the first matrix and the second matrix are matrices in a compressed format. The first matrix includes a first element value and first position coordinates of the first element value. The first element value is any element value in the first matrix. The second matrix includes a second element value and second position coordinates of the second element value. The second element value is any element value in the second matrix. The coordinate comparison result includes a first comparison result. The first comparison result indicates that the first position coordinates are the same as the second position coordinates. The accumulator is configured to add the first element value and the second element value based on the first comparison result, to obtain a sum of the first element value and the second element value; and output the sum to a cache. The cache is configured to cache a result matrix. The result matrix includes a third element value and third position coordinates of the third element value. The third element value is the sum of the first element value and the second element value. The third position coordinates are the same as the first position coordinates. In this embodiment, the computation apparatus can compute a matrix in the compressed format. 
When performing addition operation on two matrices in the compressed format, the computation apparatus compares position coordinates of element values in the two matrices, and adds two element values corresponding to same position coordinates in the two matrices, to obtain a result matrix of computing the two matrices in the compressed format. In comparison with a conventional method in which the matrix in the compressed format needs to be decompressed first, and then matrix computation is performed on a decompressed matrix, the computation apparatus provided in this embodiment of this application can effectively improve efficiency of computing the matrix in the compressed format.
In an optional implementation, the accumulator is further configured to: when the third element value is a zero element value, output an invalid signal for the zero element value. The invalid signal indicates that the result matrix is not to include the zero element value or the position coordinates corresponding to the zero element value. In this embodiment, each third element value is obtained by adding two element values, and therefore the third element value may be the zero element value. When the third element value is the zero element value, the accumulator deletes the zero element value and the position coordinates corresponding to the zero element value, so that the result matrix output by the computation apparatus does not include the zero element value and is still a matrix in the compressed format. Therefore, a transmission resource is saved, or a next computation operation is facilitated.
In an optional implementation, the accumulator skips, based on the invalid signal, outputting the zero element value and the position coordinates corresponding to the zero element value to the cache.
In an optional implementation, the coordinate comparison result further includes a second comparison result. The first matrix includes a fourth element value and fourth position coordinates of the fourth element value. The fourth element value is any element value in the first matrix. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second matrix. The accumulator is further configured to output the fourth element value and the fourth position coordinates to the cache based on the second comparison result. The cache is configured to cache the result matrix. The result matrix includes the fourth element value and the fourth position coordinates. In this embodiment, the accumulator retains position coordinates that fail to be matched in the two matrices and an element value corresponding to the position coordinates, and directly writes the position coordinates and the element value into the cache, to use the fourth element value as an element value in the result matrix. Even if the position coordinates in the first matrix do not match the position coordinates in the second matrix one by one, the computation apparatus can still perform the addition operation, which extends the matrix computation scenarios that the computation apparatus supports.
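The matrix addition of the third aspect may be sketched as follows, for illustration only. The sketch assumes that a matrix in the compressed format is a dictionary keyed by (row, column) position coordinates that stores only non-zero element values. The name `add_matrices` is hypothetical, and passing unmatched entries of the second matrix through to the result matrix is an assumption made by symmetry.

```python
from typing import Dict, Tuple

# Hypothetical coordinate format: (row, column) -> non-zero element value.
CompressedMatrix = Dict[Tuple[int, int], float]


def add_matrices(first: CompressedMatrix, second: CompressedMatrix) -> CompressedMatrix:
    """Add two compressed matrices; the result is also in the compressed format."""
    result: CompressedMatrix = {}
    for coordinates, value in first.items():
        if coordinates in second:
            # First comparison result: same position coordinates, so add.
            total = value + second[coordinates]
            if total != 0:  # a zero sum is treated as invalid and is not cached
                result[coordinates] = total
        else:
            # Second comparison result: no match in the second matrix.
            result[coordinates] = value
    for coordinates, value in second.items():
        if coordinates not in first:
            # Assumption: unmatched entries of the second matrix pass through.
            result[coordinates] = value
    return result


first = {(0, 1): 2.0, (0, 4): 3.0, (2, 0): 1.0}
second = {(0, 4): 5.0, (1, 0): 7.0, (2, 0): -1.0}
result_matrix = add_matrices(first, second)
```

In this example, the entries at (2, 0) cancel out and are not written, so the result matrix contains three entries and no zero element value.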
In an optional implementation, the position coordinate comparison circuit is further configured to learn, through reading, that both the first matrix and the second matrix include position coordinates, and perform, based on triggering of the position coordinates, an operation of comparing the position coordinates in the first matrix with the position coordinates in the second matrix. In this embodiment, the position coordinate comparison circuit receives the first matrix and the second matrix. Because both the first matrix and the second matrix include position coordinates, when the position coordinate comparison circuit learns, through reading, that both the first matrix and the second matrix include position coordinates, the operation of comparing, by the position coordinate comparison circuit, the position coordinates in the first matrix with the position coordinates in the second matrix is triggered.
In an optional implementation, the position coordinate comparison circuit includes a row coordinate comparison circuit and a column coordinate comparison circuit. A dimension of the first matrix is M×N, and a dimension of the second matrix is K×L. M×N may be the same as or different from K×L. The row coordinate comparison circuit is configured to compare a row coordinate of an mth row in the first matrix with a row coordinate of an fth row in the second matrix, to obtain a row comparison result. The row comparison result indicates that the row coordinate of the mth row is the same as the row coordinate of the fth row. m is less than or equal to M, and f is less than or equal to K. The column coordinate comparison circuit is configured to compare a column coordinate of each element value in the mth row with a column coordinate of each element value in the fth row based on the row comparison result, to obtain a column comparison result. The column comparison result indicates that an nth column coordinate of the mth row is the same as an lth column coordinate of the fth row. The first comparison result includes the row comparison result and the column comparison result. The accumulator is further configured to add, based on the column comparison result, an element value corresponding to the nth column coordinate of the mth row and an element value corresponding to the lth column coordinate of the fth row, to obtain the third element value. The element value corresponding to the nth column coordinate of the mth row is the first element value. The element value corresponding to the lth column coordinate of the fth row is the second element value. n is less than or equal to N, and l is less than or equal to L. In this embodiment, the position coordinate comparison circuit does not need to traverse and match all position coordinates in the first matrix with all position coordinates in the second matrix. 
A row coordinate in the first matrix and a row coordinate in the second matrix are first compared by using the row coordinate comparison circuit, and then column coordinates in position coordinates with a same row coordinate are compared, so that a quantity of times for which the position coordinates are compared is reduced, and a computing resource is saved.
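The reduction in the quantity of comparisons can be illustrated with a small count. The cost model below is an assumption made only for this example: every comparison is counted as one pairwise check, the exhaustive approach compares every stored position in the first matrix with every stored position in the second matrix, and the two-stage approach compares every pair of row coordinates and then only the column coordinates within rows that match. The function names are hypothetical.

```python
from typing import Dict, List

# Hypothetical layout: row coordinate -> list of stored column coordinates.
RowColumns = Dict[int, List[int]]


def exhaustive_comparisons(first: RowColumns, second: RowColumns) -> int:
    """Pairwise checks when all positions are compared with all positions."""
    first_count = sum(len(columns) for columns in first.values())
    second_count = sum(len(columns) for columns in second.values())
    return first_count * second_count


def two_stage_comparisons(first: RowColumns, second: RowColumns) -> int:
    """Pairwise checks when row coordinates are compared before column coordinates."""
    count = len(first) * len(second)  # row coordinate comparisons
    for row, columns in first.items():
        if row in second:
            # Column coordinates are compared only within matching rows.
            count += len(columns) * len(second[row])
    return count


first = {0: [1, 4, 6], 2: [0, 5, 7]}
second = {0: [4, 6, 8], 1: [0, 2, 3], 2: [0, 3, 9]}
exhaustive = exhaustive_comparisons(first, second)
two_stage = two_stage_comparisons(first, second)
```

Under this cost model, the example needs 54 exhaustive comparisons but only 24 two-stage comparisons (6 row comparisons plus 9 column comparisons for each of the two matching rows). The exact saving depends on how the non-zero element values are distributed across rows.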
In an optional implementation, the row comparison result includes a first signal and a first value. The column comparison result includes a second signal and a second value. The first signal indicates validity of the first value. The first value is equal to a value of the row coordinate of the mth row. The second signal indicates validity of the second value. The second value is equal to a value of the nth column coordinate.
According to a fourth aspect, an embodiment of this application provides a matrix computation method. The method is applied to a computation apparatus. The method includes: A computation apparatus first obtains a computation instruction. The computation instruction includes a first matrix and a second matrix that are in a compressed format. The computation apparatus then compares position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a coordinate comparison result. The first matrix includes a first element value and first position coordinates of the first element value. The second matrix includes a second element value and second position coordinates of the second element value. The coordinate comparison result includes a first comparison result. The first comparison result indicates that the first position coordinates are the same as the second position coordinates. Finally, the computation apparatus adds the first element value and the second element value based on the first comparison result, and outputs an obtained sum to a cache. The cache is configured to cache a result matrix. The result matrix includes a third element value and third position coordinates of the third element value. The third element value is the sum of the first element value and the second element value. The third position coordinates are the same as the first position coordinates.
In an optional implementation, the method further includes: When the third element value is a zero element value, the computation apparatus may output an invalid signal for the zero element value. The invalid signal indicates that an element value in the result matrix does not include the zero element value.
In an optional implementation, the method further includes: skipping, based on the invalid signal, outputting the zero element value and position coordinates corresponding to the zero element value to the cache, so that the result matrix does not include the zero element value. The result matrix is a matrix in the compressed format. Therefore, a transmission resource is saved, or a next computation operation is facilitated.
In an optional implementation, the coordinate comparison result further includes a second comparison result. The first matrix includes a fourth element value and fourth position coordinates of the fourth element value. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second matrix. The method may further include: The computation apparatus outputs the fourth element value and the fourth position coordinates to the cache based on the second comparison result. The result matrix includes the fourth element value and the fourth position coordinates. In this embodiment, the accumulator retains position coordinates that fail to be matched in the two matrices and the element values corresponding to those position coordinates, and directly writes them into the cache, so that the fourth element value is used as an element value in the result matrix. Even if the plurality of position coordinates in the first matrix do not completely match the plurality of position coordinates in the second matrix one by one, the computation apparatus can still perform the addition operation, which broadens the matrix computation scenarios that are supported.
In an optional implementation, a dimension of the first matrix is M×N, and a dimension of the second matrix is K×L. That the computation apparatus compares position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a coordinate comparison result may specifically include: The computation apparatus first compares a row coordinate of an mth row in the first matrix with a row coordinate of an fth row in the second matrix, to obtain a row comparison result. The row comparison result indicates that the row coordinate of the mth row is the same as the row coordinate of the fth row. m is less than or equal to M, and f is less than or equal to K. The computation apparatus then compares a column coordinate of each element value in the mth row with a column coordinate of each element value in the fth row based on the row comparison result, to obtain a column comparison result. The column comparison result indicates that an nth column coordinate of the mth row is the same as an lth column coordinate of the fth row. The computation apparatus adds, based on the first comparison result, an element value corresponding to the nth column coordinate of the mth row and an element value corresponding to the lth column coordinate of the fth row, to obtain the third element value. The element value corresponding to the nth column coordinate of the mth row is the first element value. The element value corresponding to the lth column coordinate of the fth row is the second element value. n is less than or equal to N, and l is less than or equal to L.
In an optional implementation, the row comparison result includes a first signal and a first value. The column comparison result includes a second signal and a second value. The first signal indicates that the row coordinate of the mth row is the same as the row coordinate of the fth row in the second matrix. The first value is equal to the row coordinate of the mth row. The second signal indicates that the nth column coordinate of the mth row is the same as the lth column coordinate of the fth row. The second value is equal to the nth column coordinate.
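As a non-limiting illustration, the row-first comparison described above may be modelled in software as follows. The nested-dictionary layout, the function name `sparse_matrix_add`, and the sample matrices are assumptions made only for this sketch and are not part of the described circuit. Column coordinates are compared only for rows whose row coordinates match; unmatched entries are passed through, and zero sums are dropped so that the result stays in the compressed format.

```python
def sparse_matrix_add(mat_a, mat_b):
    """Add two compressed matrices given as {row: {col: value}} (hypothetical layout)."""
    result = {}
    for row in sorted(set(mat_a) | set(mat_b)):
        if row in mat_a and row in mat_b:
            # Row coordinates match: only now are column coordinates compared.
            merged = dict(mat_a[row])
            for col, value in mat_b[row].items():
                merged[col] = merged.get(col, 0) + value
        else:
            # Row found in only one matrix: its entries are passed through unchanged.
            merged = dict(mat_a.get(row) or mat_b.get(row))
        # Drop zero sums so that the result matrix contains no zero element value.
        merged = {col: v for col, v in sorted(merged.items()) if v != 0}
        if merged:
            result[row] = merged
    return result


# Hypothetical sample data.
A = {0: {0: 1, 1: 2}, 2: {3: 7}}
B = {0: {1: -2, 2: 4}, 1: {1: 3}}
C = sparse_matrix_add(A, B)
print(C)
```

In this sketch, the entry at row 0, column 1 sums to zero (2 + (−2)) and is therefore omitted from the result.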
According to a fifth aspect, a computation circuit is provided. The computation circuit is configured to perform an operation step of the computation method provided in any one of the second aspect or the possible implementations of the second aspect. Alternatively, the computation circuit is configured to perform an operation step of the computation method provided in any one of the fourth aspect or the possible implementations of the fourth aspect.
According to a sixth aspect, a computation system is provided. The system includes a processor and a computation apparatus. The processor is configured to send a computation instruction to the computation apparatus. The computation apparatus is configured to perform an operation step of the computation method provided in any one of the second aspect or the possible implementations of the second aspect. Alternatively, the computation apparatus is configured to perform an operation step of the computation method provided in any one of the fourth aspect or the possible implementations of the fourth aspect.
According to a seventh aspect, a chip is provided. The chip includes a processor. A computation apparatus is integrated into the processor. The computation apparatus is configured to perform an operation step of the computation method provided in any one of the second aspect or the possible implementations of the second aspect. Alternatively, the computation apparatus is configured to perform an operation step of the computation method provided in any one of the fourth aspect or the possible implementations of the fourth aspect.
According to an eighth aspect, a computation device is provided. The computation device includes the computation system in the sixth aspect or the chip in the seventh aspect.
According to a ninth aspect, a readable storage medium is provided. The readable storage medium stores instructions. When the instructions are run on a device, the device is enabled to perform an operation step of the computation method provided in any one of the second aspect or the possible implementations of the second aspect, or the device is enabled to perform an operation step of the computation method provided in any one of the fourth aspect or the possible implementations of the fourth aspect.
According to a tenth aspect, a computer program product is provided. When the computer program product runs on a computer, the computer is enabled to perform an operation step of the computation method provided in any one of the second aspect or the possible implementations of the second aspect, or the computer is enabled to perform an operation step of the computation method provided in any one of the fourth aspect or the possible implementations of the fourth aspect.
It may be understood that any computation apparatus, computer storage medium, or computer program product provided above is configured to perform the corresponding method provided above. Therefore, for beneficial effects that can be achieved by the computation apparatus, computer storage medium, or computer program product, refer to beneficial effects in the corresponding method provided above. Details are not described herein again.
In this application, based on the implementations in the foregoing aspects, the implementations may be combined to provide more implementations.
The following describes the technical solutions in embodiments of this application with reference to the accompanying drawings in embodiments of this application. In the specification, claims, and accompanying drawings of this application, the terms “first”, “second”, and the like are intended to distinguish between similar objects but do not necessarily indicate a specific order or sequence.
To better understand this application, related words in this application are first described by using examples.
Matrix (matrix): A matrix whose dimension is M×N is a rectangular array formed by arranging elements of M rows (rows) and N columns (columns). For example, a matrix A is shown in Equation (1), and a matrix B is shown in Equation (2).
Matrix addition and subtraction: Addition and subtraction may be performed on matrices with a same dimension. Specifically, addition or subtraction is performed on the elements at each position. For example, the matrix A and the matrix B each are a matrix whose dimension is m×n, and the matrix A and the matrix B are added to obtain a matrix C, as shown in Equation (3).
A row vector (row vector) is a matrix whose dimension is 1×M, where M is a positive integer. For example, a row vector is shown in Equation (5).
$X = [x_1 \;\; x_2 \;\; \cdots \;\; x_M]$ Equation (5)
A column vector (column vector) is a matrix whose dimension is M×1, where M is a positive integer. For example, a column vector is shown in Equation (6).
Matrix in a compressed format: When a matrix includes a zero element value and a non-zero element value, usually, to save storage space, only the non-zero element value in the matrix is stored in a specific format, and the zero element value is not stored. In this process, the matrix is compressed, and a matrix obtained after compression and storage is referred to as a matrix in the compressed format. A method for compressing the matrix includes but is not limited to coordinate representation (coordinate, COO), row compression (compressed sparse row, CSR), column compression (compressed sparse column, CSC), or the like.
The following separately describes the three compression methods, namely, the COO, the CSR, and the CSC by using examples.
COO: A matrix is indicated by using a triplet. The triplet includes three values, which are separately a row number, a column number, and an element value. The row number and the column number identify a position of the element value. For example, the triplet is (a row number, a column number, and an element value), or the triplet is (an element value, a row number, and a column number). Specifically, an arrangement sequence of the three values in the triplet is not limited. For example, refer to
Optionally, the matrix Y in the compressed format may be represented by using Equation (8).
Row coordinate=([0, 0, 1, 1, 2, 2, 2, 3, 3])
Column coordinate=([0, 1, 1, 2, 0, 2, 3, 1, 3])
Element value=([1, 2, 3, 4, 5, 6, 7, 8, 9]) Equation (8)
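As a non-limiting illustration, the COO arrays in Equation (8) may be expanded back into an uncompressed matrix as sketched below. The function name `coo_to_dense` is hypothetical, and the 4×4 dimension is inferred from the largest row and column coordinates in Equation (8).

```python
def coo_to_dense(rows, cols, values, shape):
    """Rebuild an uncompressed matrix (list of lists) from COO triplets."""
    n_rows, n_cols = shape
    dense = [[0] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, values):
        dense[r][c] = v  # every position not listed stays a zero element value
    return dense


# Arrays copied from Equation (8); the 4x4 shape is inferred, not stated.
rows = [0, 0, 1, 1, 2, 2, 2, 3, 3]
cols = [0, 1, 1, 2, 0, 2, 3, 1, 3]
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

Y = coo_to_dense(rows, cols, values, shape=(4, 4))
for row in Y:
    print(row)
```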
CSR: A matrix is represented by using three types of data, which are separately an element value, a column number, and a row offset. The element value and the column number in the CSR are represented in a manner similar to that of the element value and the column number in the foregoing COO manner. A difference between the CSR and the COO manner lies in that the row offset indicates a start offset position of the 1st element of a row in all element values. Refer to
The matrix Y in the compressed format may be represented by using Equation (9).
Row offset=([0, 2, 4, 7, 9])
Column coordinate=([0, 1, 1, 2, 0, 2, 3, 1, 3])
Element value=([1, 2, 3, 4, 5, 6, 7, 8, 9]) Equation (9)
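As a non-limiting illustration, the row offset in Equation (9) may be derived from the COO row coordinates by counting the element values in each row and accumulating the counts. The helper names `row_offsets_from_coo` and `csr_row` are hypothetical, and the row count of 4 is inferred from the data.

```python
def row_offsets_from_coo(rows, n_rows):
    """Derive the CSR row offset array from COO row coordinates."""
    counts = [0] * n_rows
    for r in rows:
        counts[r] += 1
    offsets = [0]
    for count in counts:
        offsets.append(offsets[-1] + count)  # running total of stored elements
    return offsets


def csr_row(row_offset, cols, values, r):
    """Return the (column coordinate, element value) pairs stored for row r."""
    start, end = row_offset[r], row_offset[r + 1]
    return list(zip(cols[start:end], values[start:end]))


coo_rows = [0, 0, 1, 1, 2, 2, 2, 3, 3]
cols = [0, 1, 1, 2, 0, 2, 3, 1, 3]
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

row_offset = row_offsets_from_coo(coo_rows, n_rows=4)
print(row_offset)                             # [0, 2, 4, 7, 9]
print(csr_row(row_offset, cols, values, 2))   # [(0, 5), (2, 6), (3, 7)]
```

The last entry of the row offset equals the total quantity of stored element values, so the elements of row r always occupy the half-open range from `row_offset[r]` to `row_offset[r + 1]`.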
CSC: A matrix is represented by using three types of data, which are separately an element value, a row number, and a column offset. The element value and the row number in the CSC are represented in a manner similar to that of the element value and the row number in the foregoing COO manner. A difference between the CSC and the COO manner lies in that the column offset indicates a start offset position of the 1st element of a column in all element values. Refer to
The matrix Y in the compressed format may be represented by using Equation (10).
Column offset=([0, 2, 5, 7, 9])
Row coordinate=([0, 2, 0, 1, 3, 1, 2, 2, 3])
Element value=([1, 5, 2, 3, 8, 4, 6, 7, 9]) Equation (10)
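As a non-limiting illustration, a CSC representation may be obtained from COO triplets by ordering the element values column by column and accumulating the per-column counts. The function name `coo_to_csc` is hypothetical, and the column count of 4 is inferred from the data in Equation (8).

```python
def coo_to_csc(rows, cols, values, n_cols):
    """Convert COO triplets into (column offset, row coordinates, element values)."""
    # Visit the elements column by column, and by row within each column.
    order = sorted(range(len(values)), key=lambda i: (cols[i], rows[i]))
    csc_rows = [rows[i] for i in order]
    csc_values = [values[i] for i in order]

    # Count the elements in each column, then accumulate the counts into offsets.
    col_offset = [0] * (n_cols + 1)
    for c in cols:
        col_offset[c + 1] += 1
    for c in range(n_cols):
        col_offset[c + 1] += col_offset[c]
    return col_offset, csc_rows, csc_values


rows = [0, 0, 1, 1, 2, 2, 2, 3, 3]
cols = [0, 1, 1, 2, 0, 2, 3, 1, 3]
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

col_offset, csc_rows, csc_values = coo_to_csc(rows, cols, values, n_cols=4)
print(col_offset)
print(csc_rows)
print(csc_values)
```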
It can be learned from descriptions of the foregoing three matrix compression methods that, each element value in a matrix in a COO compression format has a corresponding row coordinate (row number) and column coordinate (column number). Each element value in a matrix in a CSR compression format has a corresponding column coordinate. Each element value in a matrix in a CSC compression format has a corresponding row coordinate.
Matrix in an uncompressed format: Refer to the matrix Y shown in
Vector in a compressed format: The vector in the compressed format in embodiments may be a vector obtained by compressing a vector in an uncompressed format. For example, a vector {right arrow over (g)}1 in the uncompressed format is [1, 2, 0, 3, 0, 5]. The vector {right arrow over (g)}1 in the uncompressed format includes six element values. If {right arrow over (g)}1 is compressed, only the non-zero element values in the vector are retained, and a position of each non-zero element value needs to be marked by using position coordinates. In an example, the vector {right arrow over (g)}2 in the compressed format is represented by using Equation (11).
Element value=([1, 2, 3, 5])
Position coordinates=([0, 1, 3, 5]) Equation (11)
In Equation (11), the element value “1” is at a position “0”. Therefore, the position coordinates corresponding to the element value “1” are “0”. Similarly, the element value “2” is at a position “1”, and the position coordinates corresponding to the element value “2” are “1”. The remaining element values are not described one by one.
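As a non-limiting illustration, compressing a vector in the uncompressed format and restoring it may be sketched as follows. The function names `compress` and `decompress` are hypothetical. The input is the six-element vector [1, 2, 0, 3, 0, 5] given above, and positions are counted from 0.

```python
def compress(dense):
    """Keep only the non-zero element values and record the position of each."""
    values, positions = [], []
    for position, value in enumerate(dense):
        if value != 0:
            values.append(value)
            positions.append(position)
    return values, positions


def decompress(values, positions, length):
    """Rebuild the uncompressed vector; unlisted positions hold zero element values."""
    dense = [0] * length
    for value, position in zip(values, positions):
        dense[position] = value
    return dense


g1 = [1, 2, 0, 3, 0, 5]
values, positions = compress(g1)
print(values, positions)
print(decompress(values, positions, len(g1)) == g1)
```

The length of the original vector is not recoverable from the compressed arrays alone, so this sketch passes it to `decompress` explicitly.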
For another example, refer to
Element value=([1, 2, 5, 7])
Position coordinates=([1, 3, 4, 5]) Equation (12)
The vector in the compressed format is also referred to as a sparse vector, and the vector in the uncompressed format is also referred to as a dense vector.
The memory 201 may be configured to store data, a software program, and a module, and mainly includes a program storage area and a data storage area. The program storage area may store an operating system, a software application required for at least one function, intermediate software, and the like. The data storage area may store data created during use of the device, and the like. For example, the operating system may include a Linux operating system, a Unix operating system, or a Windows operating system. The software application required for the at least one function may include an application related to artificial intelligence (artificial intelligence), an application related to high-performance computing (high-performance computing, HPC), an application related to deep learning (deep learning), an application related to scientific computing, or the like. The intermediate software may include basic linear algebra subprograms (basic linear algebra subprograms, BLAS), or the like. In a possible example, the memory 201 includes but is not limited to a static random access memory (static RAM, SRAM), a dynamic random access memory (dynamic RAM, DRAM), a synchronous dynamic random access memory (synchronous DRAM, SDRAM), or a high-speed random access memory. Further, the memory 201 may include a non-volatile memory, for example, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
In addition, the processor 202 is configured to control and manage an operation of the computation device, for example, perform various functions of the computation device and process data by running or executing the software program and/or the module stored in the memory 201 and by invoking the data stored in the memory 201. In a possible example, the processor 202 includes but is not limited to a central processing unit (central processing unit, CPU), a network processing unit (network processing unit, NPU), a graphics processing unit (graphics processing unit, GPU), an application-specific integrated circuit (application-specific integrated circuit, ASIC), a field programmable gate array (field programmable gate array, FPGA) or another programmable logic device, a transistor logic device, a logic circuit, or any combination thereof. The processor may implement or execute various example logical blocks, modules, and circuits described with reference to content disclosed in this application. Alternatively, the processor 202 may be a combination of processors implementing a computing function, for example, a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor.
The communication interface 203 is configured to implement communication between the computation device and an external device. The communication interface 203 may include an input interface and an output interface. The input interface may be configured to obtain a first vector (or a first matrix) and a second vector (or a second matrix) that are in a compressed format and that are in the following embodiments. In some feasible embodiments, there may be only one input interface, or there may be a plurality of input interfaces. The output interface may be configured to output a computation result in the following embodiments. In some feasible embodiments, the computation result may be directly output by the processor, or may be first stored in the memory, and then output by the memory. In some other feasible embodiments, there may be only one output interface, or there may be a plurality of output interfaces.
The bus 204 may be a peripheral component interconnect express (Peripheral Component Interconnect Express, PCIe) bus, an extended industry standard architecture (extended industry standard architecture, EISA) bus, or the like. The bus 204 may be classified into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used to represent the bus in
In this embodiment, the processor 202 may include a computation apparatus. The computation apparatus may be an ASIC, an FPGA, a logic circuit, or the like. Certainly, the apparatus may alternatively be implemented by using software. This is not specifically limited in this embodiment of this application. The computation apparatus may be used for vector and matrix computation related to artificial intelligence, scientific computing, graph computing, or the like.
Further, the processor 202 may include one or more of other processing units such as a CPU, a GPU, or an NPU. As shown in
In an embodiment of this application, a computation apparatus can compute a vector in a compressed format. When computing two vectors in the compressed format, the computation apparatus compares position coordinates of element values in the two vectors, and computes two element values corresponding to same position coordinates in the two vectors, to obtain a computation result of computing the two vectors in the compressed format. In comparison with a conventional method in which the vector in the compressed format needs to be decompressed first, and then vector computation is performed on a decompressed vector, the computation apparatus provided in this embodiment of this application can effectively improve efficiency of computing the vector in the compressed format.
An embodiment of this application provides a computation apparatus. Refer to
The position coordinate comparison circuit 401 is configured to receive the first vector and the second vector from the first cache 403, and compare position coordinates of an element value in the first vector with position coordinates of an element value in the second vector, to obtain a first coordinate comparison result. Both the first vector and the second vector are vectors in a compressed format. The first vector includes a first element value and first position coordinates of the first element value. The second vector includes a second element value and second position coordinates of the second element value. The first coordinate comparison result includes a first comparison result. The first comparison result indicates that the first position coordinates are the same as the second position coordinates.
For example, the first vector is denoted as “{right arrow over (a)}”, the second vector is denoted as “{right arrow over (b)}”, and a length of the first vector and a length of the second vector are both T. It should be noted that the length of {right arrow over (a)} and the length of {right arrow over (b)} are not limited in this embodiment of this application. The length of {right arrow over (a)} and the length of {right arrow over (b)} may be the same, or may be different. In this embodiment, that the length of {right arrow over (a)} and the length of {right arrow over (b)} are both T is merely an example for ease of description. In addition, both {right arrow over (a)} and {right arrow over (b)} may be row vectors, or may be column vectors. This is not specifically limited. {right arrow over (a)} includes the first element value and the first position coordinates corresponding to the first element value. For example, a plurality of first element values in {right arrow over (a)} and first position coordinates corresponding to each first element value are shown in Table 1 below. A plurality of second element values in {right arrow over (b)} and second position coordinates corresponding to each second element value are shown in Table 2 below.
The position coordinate comparison circuit 401 compares the first position coordinates in Table 1 and the second position coordinates in Table 2. If y pairs of position coordinates in the T first position coordinates and the T second position coordinates match, y is an integer greater than or equal to 0 and less than or equal to T.
It may be understood that, in
Example 1: The position coordinate comparison circuit receives the following two input arrays:
The position coordinate comparison circuit outputs the following result:
The position coordinate comparison result may indicate pairs of position coordinates that are equal. For example, h0 and p0 are equal, and h1 and p1 are equal. The isequal″ comparison result is the final comparison result; the intermediate comparison process is not described here. For example, valid″2=1 and t2=h2 indicate that h2 is a valid value. In addition, the comparison result (isequal″2, index_21, index_22)=(0, h2, p2) indicates that the position coordinate comparison circuit 401 finds no position coordinates equal to h2 (and no position coordinates equal to p2). It can be learned from Example 1 that the position coordinate comparison circuit 401 finds two pairs of equal position coordinates, and finds no equal counterpart for the other four position coordinates. In this case, a quantity of valid values is 6 (valid″0 to valid″5). Values of valid″6 and valid″7 are both 0, to indicate that t6 and t7 are both invalid values (for example, an invalid value may be represented by “−1”).
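As a non-limiting illustration, the outcome of the position coordinate comparison may be modelled in software as follows. This sketch assumes that the position coordinates of each vector are sorted in ascending order and walks both lists with two indices; it does not reproduce the internal structure or signal names of the position coordinate comparison circuit 401. The function name `compare_positions` and the sample coordinates are hypothetical.

```python
def compare_positions(pos_a, pos_b):
    """Compare two ascending lists of position coordinates.

    Returns (matches, only_a, only_b): index pairs with equal coordinates,
    and the indices in each list for which no equal coordinate was found.
    """
    matches, only_a, only_b = [], [], []
    i = j = 0
    while i < len(pos_a) and j < len(pos_b):
        if pos_a[i] == pos_b[j]:
            matches.append((i, j))
            i += 1
            j += 1
        elif pos_a[i] < pos_b[j]:
            only_a.append(i)
            i += 1
        else:
            only_b.append(j)
            j += 1
    only_a.extend(range(i, len(pos_a)))  # leftovers have no counterpart
    only_b.extend(range(j, len(pos_b)))
    return matches, only_a, only_b


# Hypothetical coordinates: two equal pairs and four unmatched coordinates.
h = [0, 1, 4, 7]
p = [0, 1, 5, 9]
matches, only_h, only_p = compare_positions(h, p)
valid_count = len(matches) + len(only_h) + len(only_p)
print(matches, only_h, only_p, valid_count)
```

With these sample inputs, two pairs match and four coordinates remain unmatched, which gives six valid output positions out of the eight available slots.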
The logical operation circuit 402 is configured to compute the first element value and the second element value based on the first comparison result, to obtain a computation value; and output a computation result to the second cache 404. The computation result is related to the computation value.
In this embodiment of this application, the foregoing operation includes but is not limited to addition operation, multiplication operation, inner product operation, multiplication-addition operation, and the like. Optionally, refer to
Example one: The addition operation ({right arrow over (a)}+{right arrow over (b)}) is described by using an example.
In an optional embodiment, the position coordinate comparison circuit 401 is further configured to receive an addition instruction, where the addition instruction includes the first vector and the second vector; and transmit the first comparison result to the accumulator 4021 based on the addition instruction.
The accumulator 4021 is configured to receive the first vector, the second vector, and the first comparison result. The first comparison result indicates that the first position coordinates are the same as the second position coordinates. For example, the first comparison result includes (isequal″0, index_01, index_02)=(1, h0, p0) and (valid″0, t0)=(1, 1) in Example 1. The accumulator 4021 adds the first element value and the second element value based on the first comparison result, to obtain a sum of the first element value and the second element value, and outputs the sum to the second cache 404. The second cache 404 is configured to cache the computation result (a result vector obtained by adding the first vector and the second vector). The computation result includes a third element value and third position coordinates of the third element value. The third element value is the sum. The third position coordinates are the same as the first position coordinates.
For example, refer to Table 1, Table 2, and
In Table 3, c0=a0+b0, c1=a1+b1, c2=a2+b2, . . . , and cT−1=aT−1+bT−1. q0=h0 (q0=p0), q1=h1 (q1=p1), q2=h2 (q2=p2), . . . , qT−1=hT−1 (qT−1=pT−1).
In
Optionally, the plurality of first position coordinates in the first vector may not all match the plurality of second position coordinates in the second vector one by one. Instead, one part of the position coordinates in {right arrow over (a)} has equal position coordinates in {right arrow over (b)}, and the other part does not. In this case, the first vector includes a fourth element value and fourth position coordinates of the fourth element value. The first coordinate comparison result further includes a second comparison result. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second vector. For example, the second comparison result includes (isequal″2, index_21, index_22)=(0, h2, p2), (valid″2, t2)=(1, 3), and the like in Example 1.
The accumulator 4021 is further configured to output the fourth element value and the fourth position coordinates to the second cache 404 based on the second comparison result. The computation result includes the fourth element value and the fourth position coordinates. In other words, the second cache stores the third element value and the corresponding position coordinates, and the fourth element value and the corresponding position coordinates. The third element value is the sum of the first element value and the second element value, and the fourth element value is equal to the first element value or the second element value. In this embodiment, the accumulator 4021 retains position coordinates that fail to be matched in the two vectors and the element values corresponding to those position coordinates, and directly writes them into the second cache 404, so that each fourth element value is also used as an element value in the computation result. Even if the plurality of position coordinates in the first vector do not completely match the plurality of position coordinates in the second vector one by one, the computation apparatus can still perform the addition operation, which broadens the vector computation scenarios that are supported.
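As a non-limiting illustration, the addition of two vectors in the compressed format, including the pass-through of unmatched element values, may be modelled in software as follows. The sketch assumes ascending position coordinates and merges the two vectors in a single pass; the function name `sparse_add` and the sample data are hypothetical and do not describe the internal structure of the accumulator 4021.

```python
def sparse_add(vals_a, pos_a, vals_b, pos_b):
    """Add two compressed vectors whose position coordinates are ascending."""
    out_vals, out_pos = [], []
    i = j = 0
    while i < len(pos_a) or j < len(pos_b):
        if j == len(pos_b) or (i < len(pos_a) and pos_a[i] < pos_b[j]):
            # Coordinate found only in the first vector: pass it through.
            out_vals.append(vals_a[i])
            out_pos.append(pos_a[i])
            i += 1
        elif i == len(pos_a) or pos_b[j] < pos_a[i]:
            # Coordinate found only in the second vector: pass it through.
            out_vals.append(vals_b[j])
            out_pos.append(pos_b[j])
            j += 1
        else:
            # Equal coordinates: add the two element values.
            out_vals.append(vals_a[i] + vals_b[j])
            out_pos.append(pos_a[i])
            i += 1
            j += 1
    return out_vals, out_pos


# Hypothetical sample data.
sum_vals, sum_pos = sparse_add([1, 2, 3, 5], [0, 1, 4, 7],
                               [10, 20, 30, 40], [0, 1, 5, 9])
print(sum_vals, sum_pos)
```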
Refer to
Optionally, because each third element value is obtained by adding two element values, the third element value may be a zero element value. To save a transmission resource or facilitate a next computation operation, the computation apparatus may compress the foregoing result vector, to output a vector in the compressed format. For example, the accumulator 4021 is further configured to: when the third element value is the zero element value, output an invalid signal. The invalid signal indicates that the accumulator 4021 does not output the zero element value and position coordinates corresponding to the zero element value to the second cache 404, so that the computation result output by the computation apparatus does not include the zero element value. For example, (isequal″0, index_01, index_02)=(1, h0, p0), (valid″0, t0)=(1, 1), and h0=p0.
In addition, the first element value corresponding to h0 is “1”, and the second element value corresponding to p0 is “−1”. In this case, the third element value is “0” (1+(−1)), and the accumulator 4021 changes the value of valid″0 from “1” to “0” (the invalid signal). The accumulator 4021 skips, based on the invalid signal (valid″0=0), outputting the zero element value and the position coordinates corresponding to the zero element value to the second cache 404, so that the second cache 404 does not cache the zero element value.
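As a non-limiting illustration, the suppression of zero sums may be modelled in software as follows. The tuple layout, the `valid` flag, and the function names `add_matched_pairs` and `write_to_cache` are assumptions made only for this sketch.

```python
def add_matched_pairs(pairs):
    """pairs: list of (position, first_value, second_value) with equal coordinates.

    Returns (valid, position, total) tuples; valid is 0 when the sum is zero.
    """
    results = []
    for position, first_value, second_value in pairs:
        total = first_value + second_value
        valid = 1 if total != 0 else 0  # a zero sum is flagged as invalid
        results.append((valid, position, total))
    return results


def write_to_cache(results):
    """Keep only the entries whose valid flag is set."""
    return [(position, total) for valid, position, total in results if valid]


# Hypothetical sample data: the first pair sums to zero (1 + (-1)).
results = add_matched_pairs([(0, 1, -1), (1, 2, 3)])
cached = write_to_cache(results)
print(results)
print(cached)
```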
Example two: The multiplication operation ({right arrow over (a)}×{right arrow over (b)}) is described by using an example.
In an optional embodiment, the position coordinate comparison circuit 401 is further configured to receive a multiplication instruction, where the multiplication instruction includes the first vector and the second vector; and transmit the first comparison result to the multiplier 4022 based on the multiplication instruction.
The multiplier 4022 is configured to receive the first vector, the second vector, and the first comparison result; multiply the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value; and output the product (the computation value) to the second cache 404. The computation result includes a fifth element value (for example, denoted as “d”) and fifth position coordinates (for example, denoted as “s”) of the fifth element value. The fifth element value is the product. The fifth position coordinates are the same as the first position coordinates.
With reference to Table 1 and Table 2 for understanding, if h0=p0, h1=p1, and h3=p2, a fifth element value d0=a0×b0, and a fifth coordinate value corresponding to d0 is s0 (s0=h0); a fifth element value d1=a1×b1, and a fifth coordinate value corresponding to d1 is s1 (s1=h1); and a fifth element value d2=a3×b2, and a fifth coordinate value corresponding to d2 is s2 (s2=h3). The multiplier 4022 does not compute an element value that is in the first vector and the second vector and that corresponds to position coordinates that fail to be matched.
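As a non-limiting illustration, the multiplication of two vectors in the compressed format may be modelled in software as follows. A dictionary lookup stands in for the position coordinate comparison, which is a software convenience and not the described circuit. The function name `sparse_multiply` and the sample values are hypothetical; the sample coordinates are chosen so that h0=p0, h1=p1, and h3=p2, as in the example above.

```python
def sparse_multiply(vals_a, pos_a, vals_b, pos_b):
    """Multiply element values that share position coordinates; skip the rest."""
    index_b = {position: j for j, position in enumerate(pos_b)}
    out_vals, out_pos = [], []
    for i, position in enumerate(pos_a):
        j = index_b.get(position)
        if j is not None:  # only matched coordinates produce a product
            out_vals.append(vals_a[i] * vals_b[j])
            out_pos.append(position)
    return out_vals, out_pos


# Hypothetical sample data with h0=p0, h1=p1, and h3=p2.
a_vals, h = [2, 3, 5, 7], [0, 1, 4, 6]
b_vals, p = [10, 20, 30, 40], [0, 1, 6, 8]
d, s = sparse_multiply(a_vals, h, b_vals, p)
print(d, s)
```

Unmatched coordinates produce no output because the corresponding product with a zero element value would itself be zero.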
Refer to
Example three: The inner product operation ({right arrow over (a)}·{right arrow over (b)}) is described by using an example.
In an optional embodiment, the position coordinate comparison circuit 401 is further configured to receive an inner product instruction, where the inner product instruction includes the first vector and the second vector; and transmit the first comparison result to the inner product operation circuit 4023 based on the inner product instruction.
The inner product operation circuit 4023 is configured to receive the first vector, the second vector, and the first comparison result; multiply the first element value by the second element value based on the first comparison result, to obtain a product (the computation value); and output the computation result to the second cache 404. The computation result is an accumulated value of a plurality of products. Each product is a product of a pair of a first element value and a second element value that have same position coordinates. With reference to Table 1 and Table 2 for understanding, if there are x pairs of matching position coordinates in total between the vector {right arrow over (a)} and the vector {right arrow over (b)}, for example, h0=p0, h1=p1, and h3=p2, the computation result of {right arrow over (a)}·{right arrow over (b)} is shown in Equation (13).
$\vec{a} \cdot \vec{b} = a_0 \times b_0 + a_1 \times b_1 + a_3 \times b_2$ Equation (13)
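As a non-limiting illustration, the inner product in Equation (13) may be modelled in software as the accumulated sum of the products of element values that share position coordinates. The function name `sparse_dot` and the sample values are hypothetical; the sample coordinates satisfy h0=p0, h1=p1, and h3=p2.

```python
def sparse_dot(vals_a, pos_a, vals_b, pos_b):
    """Accumulate products of element values whose position coordinates match."""
    lookup_b = dict(zip(pos_b, vals_b))
    return sum(a * lookup_b[pos] for a, pos in zip(vals_a, pos_a) if pos in lookup_b)


# Hypothetical sample data with h0=p0, h1=p1, and h3=p2.
a_vals, h = [2, 3, 5, 7], [0, 1, 4, 6]
b_vals, p = [10, 20, 30, 40], [0, 1, 6, 8]
print(sparse_dot(a_vals, h, b_vals, p))  # 2*10 + 3*20 + 7*30 = 290
```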
The inner product operation result of the vector {right arrow over (a)} and the vector {right arrow over (b)} is a scalar. For example, refer to
Example four: The multiplication-addition operation ({right arrow over (a)}×{right arrow over (b)}+{right arrow over (z)}) is described by using an example.
In this embodiment, three vectors are involved, and the three vectors are all vectors in the compressed format, for example, the first vector {right arrow over (a)}, the second vector {right arrow over (b)}, and a third vector {right arrow over (z)}. The multiplier 4022 is configured to perform the multiplication operation on {right arrow over (a)} and {right arrow over (b)}, and the accumulator 4021 is configured to perform the addition operation on a result of {right arrow over (a)}×{right arrow over (b)} and {right arrow over (z)}.
The position coordinate comparison circuit 401 is further configured to receive a multiplication-addition instruction, where the multiplication-addition instruction includes the first vector and the second vector; and transmit the first comparison result to the multiplier 4022 based on the multiplication-addition instruction.
The multiplier 4022 is further configured to receive the first vector, the second vector, and the first comparison result; and multiply the first element value by the second element value based on the first comparison result, to obtain a product (the computation value) of the first element value and the second element value. The product is a fifth element value, and fifth position coordinates corresponding to the fifth element value are the same as the first position coordinates. The multiplier 4022 transmits the fifth element value and the fifth position coordinates to the second cache 404.
For the multiplication operation on the vector {right arrow over (a)} and the vector {right arrow over (b)}, refer to descriptions of the multiplication operation ({right arrow over (a)}×{right arrow over (b)}) in Example two. Details are not described herein again.
The second cache 404 outputs the result (for example, denoted as a vector “{right arrow over (y)}”) of {right arrow over (a)}×{right arrow over (b)} to the first cache 403.
The position coordinate comparison circuit 401 is further configured to receive the vector {right arrow over (y)} and the vector {right arrow over (z)} from the first cache 403; compare sixth position coordinates in the vector {right arrow over (z)} with the fifth position coordinates in the vector {right arrow over (y)}, to obtain a third comparison result; and transmit the third comparison result to the accumulator 4021. The third comparison result indicates that the sixth position coordinates are the same as the fifth position coordinates.
The accumulator 4021 is configured to add a sixth element value and the fifth element value based on the third comparison result, to obtain a sum of the sixth element value and the fifth element value. The computation result includes a seventh element value and seventh position coordinates corresponding to the seventh element value. The seventh element value is the sum of the sixth element value and the fifth element value. The seventh position coordinates are the same as the sixth position coordinates.
For the addition operation on the vector {right arrow over (y)} and the vector {right arrow over (z)}, refer to descriptions of the addition operation on the vectors in Example one. Details are not described herein again.
Refer to
The position coordinate comparison circuit 401 compares position coordinates in the vector {right arrow over (y)} with position coordinates in the vector {right arrow over (z)}, and there are two pairs of position coordinates that are the same in the vector {right arrow over (y)} and the vector {right arrow over (z)} in total. For example, the fifth position coordinates “1” corresponding to the fifth element value “20” are the same as sixth position coordinates “1” corresponding to a sixth element value “7”, and the fifth position coordinates “5” corresponding to the fifth element value “60” are the same as sixth position coordinates “5” corresponding to a sixth element value “2”. In this case, the accumulator 4021 outputs, to the second cache 404, a sum “27” of the fifth element value “20” and the sixth element value “7” and position coordinates “1” corresponding to the sum; outputs, to the second cache 404, a sum “62” of the fifth element value “60” and the sixth element value “2” and position coordinates “5” corresponding to the sum; and outputs, to the second cache 404, the fifth element value “21” and the corresponding position coordinates “2”, and sixth element values “11” and “14” and corresponding sixth position coordinates “3” and “4”, where the position coordinates of the fifth element value “21” and the sixth element values “11” and “14” fail to be matched. The computation result {right arrow over (a)}×{right arrow over (b)}+{right arrow over (z)} finally cached in the second cache 404 is shown in Equation (14).
Seventh element value=([27, 21, 11, 14, 62])
Seventh position coordinates=([1, 2, 3, 4, 5]) Equation (14)
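The two stages of the multiplication-addition operation can be sketched as follows. This is a software illustration under stated assumptions: the dictionary representation (position to element value) and the function names are hypothetical, and the operands `a` and `b` are invented so that their product equals the vector {right arrow over (y)} of the example (element values 20, 21, and 60 at positions 1, 2, and 5). The vector `z` uses the values from the example.

```python
def sparse_mul(a, b):
    """Element-wise product; a and b map position -> non-zero value."""
    # An unmatched position is multiplied by an implicit zero, so it is dropped.
    return {p: a[p] * b[p] for p in sorted(a) if p in b}


def sparse_add(y, z):
    """Element-wise sum; unmatched elements pass through unchanged."""
    return {p: y.get(p, 0) + z.get(p, 0) for p in sorted(set(y) | set(z))}


def sparse_mul_add(a, b, z):
    """Compute a*b+z on compressed vectors without decompressing them."""
    return sparse_add(sparse_mul(a, b), z)


a = {1: 4, 2: 3, 5: 6, 6: 9}    # hypothetical first vector
b = {0: 8, 1: 5, 2: 7, 5: 10}   # hypothetical second vector
z = {1: 7, 3: 11, 4: 14, 5: 2}  # third vector from the example

y = sparse_mul(a, b)             # positions 1, 2, 5
result = sparse_mul_add(a, b, z)
positions = list(result.keys())
values = list(result.values())
```

With these inputs, `positions` and `values` reproduce the seventh position coordinates and seventh element values of Equation (14).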
In this embodiment, the computation apparatus may perform a plurality of types of operations on a vector in the compressed format. For example, the plurality of types of operations include the addition operation, the multiplication operation, the inner product operation, and the multiplication-addition operation. When computing two vectors in the compressed format, the computation apparatus compares position coordinates of element values in the two vectors, and performs the related operation on two element values corresponding to same position coordinates in the two vectors, to obtain a result of computing the two vectors in the compressed format. In comparison with a conventional method in which the vector in the compressed format needs to be decompressed first, and then vector computation is performed on the decompressed vector, the computation apparatus provided in this embodiment of this application can effectively improve efficiency of computing the vector in the compressed format.
Optionally, refer to
In the foregoing embodiments, the plurality of types of operations on a vector in the compressed format are described by using examples. A matrix in the compressed format may be split into a plurality of vectors in the compressed format. In this case, one matrix in the compressed format may be considered as a plurality of vectors in the compressed format. Therefore, the foregoing plurality of types of operations for the vector in the compressed format may be extended to computation of the matrix in the compressed format. In a specific embodiment, the computation apparatus in this application may alternatively be used in matrix computation. The following uses addition as an example to describe addition on a compressed matrix.
The position coordinate comparison circuit 401 is configured to compare position coordinates of an element value in a first matrix with position coordinates of an element value in a second matrix, to obtain a coordinate comparison result (which is also referred to as a “second coordinate comparison result” in this embodiment). Both the first matrix and the second matrix are matrices in a compressed format.
As described in the foregoing example of the matrix in the compressed format, each element value (value) in the matrix in the compressed format has corresponding position coordinates.
For example, the following description assumes that a dimension of the first matrix is M×N, a dimension of the second matrix is K×L, and the compressed format of both the first matrix and the second matrix is COO. The dimension of the first matrix may be the same as or different from the dimension of the second matrix. This is not specifically limited. For ease of description, in this embodiment, M, N, K, and L are all 4, that is, the dimensions of the first matrix and the second matrix are both 4×4. The first matrix includes a first element value and first position coordinates of the first element value, where the first element value is any element value in the first matrix. The second matrix includes a second element value and second position coordinates of the second element value, where the second element value is any element value in the second matrix. For example, each of the first matrix and the second matrix includes four rows, each row includes four element values, and each matrix includes 16 element values. The first matrix is used as an example. The four element values in row 0 are sequentially a00, a01, a02, and a03. The position coordinates of a00 are (i0, j0), the position coordinates of a01 are (i0, j1), the position coordinates of a02 are (i0, j2), and the position coordinates of a03 are (i0, j3). The other element values in the first matrix and their position coordinates are not described one by one. The second matrix is used as an example. The four element values in row 0 are sequentially b00, b01, b02, and b03. The position coordinates of b00 are (k0, l0), the position coordinates of b01 are (k0, l1), the position coordinates of b02 are (k0, l2), and the position coordinates of b03 are (k0, l3). The other element values in the second matrix and their position coordinates are not described one by one.
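As an illustration of how each element value in a COO matrix carries its own position coordinates, the following sketch converts a small dense 4×4 matrix into row coordinates, column coordinates, and element values. The type `CooMatrix`, the helper `to_coo`, and the sample matrix are assumptions introduced for this sketch and are not part of the described apparatus.

```python
from typing import List, NamedTuple, Tuple


class CooMatrix(NamedTuple):
    shape: Tuple[int, int]  # (number of rows, number of columns)
    rows: List[int]         # row coordinate of each stored element value
    cols: List[int]         # column coordinate of each stored element value
    vals: List[float]       # non-zero element values


def to_coo(dense):
    """Keep only non-zero element values together with their coordinates."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, value in enumerate(row):
            if value != 0:
                rows.append(i)
                cols.append(j)
                vals.append(value)
    return CooMatrix((len(dense), len(dense[0])), rows, cols, vals)


# Hypothetical 4x4 matrix with four non-zero element values.
dense = [
    [0, 0, 5, 0],
    [0, 0, 0, 8],
    [0, 0, 0, 0],
    [3, 0, 0, 6],
]
coo = to_coo(dense)
```

Here the element value 5 is stored with position coordinates (0, 2), and row 2 stores nothing because it holds only zero element values.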
For example, refer to
In a first case, the coordinate comparison result includes a first comparison result, and the first comparison result indicates that first position coordinates are the same as second position coordinates. For example, the position coordinates (i0, j1) in the first matrix are (0, 2), and the position coordinates (k0, l0) in the second matrix are also (0, 2). In this case, the position coordinates (i0, j1) are equal to the position coordinates (k0, l0). For another example, position coordinates (i1, j0) in the first matrix are (1, 3), and position coordinates (k1, l2) in the second matrix are also (1, 3). In this case, the position coordinates (i1, j0) are equal to the position coordinates (k1, l2).
The accumulator 4021 is configured to receive the first matrix and the second matrix, and add the first element value and the second element value based on the first comparison result, to obtain a result matrix. The result matrix includes a third element value and third position coordinates of the third element value. The third element value is a sum of the first element value and the second element value. The third position coordinates are the same as the first position coordinates (or the second position coordinates). The accumulator 4021 writes the third element value and the third position coordinates into the second cache 404. The second cache 404 is configured to store the result matrix. For example, refer to
Optionally, because each third element value is obtained by adding two element values, the third element value may be a zero element value. To save a transmission resource or facilitate a next computation operation, the computation apparatus may compress the result matrix, to output a matrix in the compressed format. For example, the accumulator 4021 is further configured to: when the third element value is the zero element value, output an invalid signal. The invalid signal indicates that the zero element value and position coordinates corresponding to the zero element value are not output to the second cache 404, so that the result matrix output by the computation apparatus does not include the zero element value. Refer to
In a second case, the coordinate comparison result further includes a second comparison result, and the first matrix includes a fourth element value and fourth position coordinates of the fourth element value. The fourth element value is any element value in the first matrix. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second matrix.
The accumulator 4021 is further configured to write the fourth element value and the fourth position coordinates into the second cache 404 based on the second comparison result. The second cache 404 is configured to cache the result matrix. The result matrix includes the fourth element value and the fourth position coordinates corresponding to the fourth element value. It should be understood that, in this case, element values having same position coordinates in the two matrices are added to obtain a sum (the third element value) of the two element values, and the accumulator 4021 writes the position coordinates and the sum into the second cache 404. The accumulator 4021 reserves position coordinates that fail to be matched in the two matrices and an element value corresponding to the position coordinates, and directly writes the element value (the fourth element value) into the second cache 404, so that the fourth element value is used as an element value in a result matrix C3. For example, refer to
The following describes a structure and a function of the position coordinate comparison circuit by using an example. Refer to
It should be understood that, in this embodiment, whether the position coordinate comparison circuit 401 first compares row coordinates and then compares column coordinates, or first compares column coordinates and then compares row coordinates, is not limited. For example, in a manner A, element values corresponding to a same row coordinate in the first matrix and the second matrix are compressed into a same row. Each matrix in the compressed format may be considered to include a plurality of row vectors. In this case, the position coordinate comparison circuit 401 first compares row coordinates of two row vectors in the two matrices by using the row coordinate comparison circuit 4011. If the row coordinates of the two row vectors are the same, the column coordinate comparison circuit 4012 continues to compare column coordinates of all the element values in the two row vectors. For another example, in a manner B, element values corresponding to a same column coordinate in the first matrix and the second matrix are compressed into a same column. Each matrix in the compressed format may be considered to include a plurality of column vectors. In this case, the position coordinate comparison circuit 401 first compares column coordinates of two column vectors in the two matrices by using the column coordinate comparison circuit 4012. If the column coordinates of the two column vectors are the same, the row coordinate comparison circuit 4011 continues to compare row coordinates of all the element values in the two column vectors. In this embodiment, the foregoing manner A is used as an example for description.
For example, the row coordinate comparison circuit 4011 is configured to compare a row coordinate of an mth row in the first matrix with a row coordinate of an fth row in the second matrix, to obtain a first row comparison result. The first row comparison result indicates that the row coordinate of the mth row is the same as the row coordinate of the fth row. m is an integer less than or equal to M, and f is an integer less than or equal to K. The mth row is any row in the first matrix, and the fth row is any row in the second matrix.
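The row coordinate comparison can be modeled in software as follows. This is a hedged sketch only: the function name `compare_row_coordinates`, its three-part return value, and the sample row coordinates are assumptions made for illustration, and the sketch does not reproduce the signal format output by the row coordinate comparison circuit 4011.

```python
def compare_row_coordinates(first_rows, second_rows):
    """Compare the row coordinates stored in two compressed matrices.

    first_rows[m] is the row coordinate of the m-th stored row of the first
    matrix; second_rows[f] is the same for the second matrix.
    Returns (matched, only_first, only_second), where matched holds (m, f)
    index pairs whose row coordinates are the same.
    """
    second_index = {coord: f for f, coord in enumerate(second_rows)}
    matched, only_first = [], []
    for m, coord in enumerate(first_rows):
        if coord in second_index:
            matched.append((m, second_index[coord]))  # first row comparison result
        else:
            only_first.append(m)                      # no equal row coordinate found
    matched_f = {f for _, f in matched}
    only_second = [f for f in range(len(second_rows)) if f not in matched_f]
    return matched, only_first, only_second


# Hypothetical inputs: the first matrix stores rows 0, 1, 3; the second stores rows 0, 2, 3.
matched, only_first, only_second = compare_row_coordinates([0, 1, 3], [0, 2, 3])
```

Only the pairs in `matched` need to be passed on for column coordinate comparison; the unmatched rows can be output as they are.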
It should be noted that, in
For example, the row coordinate comparison circuit receives the following two input arrays:
The row coordinate comparison circuit outputs the following result:
The column coordinate comparison circuit 4012 is configured to compare a column coordinate of each element value in the mth row with a column coordinate of each element value in the fth row based on the first row comparison result, to obtain a first column comparison result.
For example, the column coordinate comparison circuit 4012 may include a plurality of column coordinate comparison units (for example, a column coordinate comparison unit a, a column coordinate comparison unit b, and a column coordinate comparison unit c), and each column coordinate comparison unit is configured to compare column coordinates in two row vectors. The two row vectors may be understood as the vector {right arrow over (a)} and the vector {right arrow over (b)} in the foregoing embodiments. For example, the mth row in the first matrix may be considered as a row vector, and the fth row in the second matrix is also considered as a row vector. For example, if the column coordinate comparison unit a receives a first row comparison result a (for example, i0 is equal to k0, and valid0=1, u0) from the row coordinate comparison circuit 4011, the column coordinate comparison unit a compares a column coordinate of an element value in the row 0 in the first matrix with a column coordinate of an element value in the row 0 in the second matrix. For example, the column coordinate comparison unit a reads the column coordinates j0, j1, j2, and j3 in the row 0 in the first matrix, and reads the column coordinates l0, l1, l2, and l3 in the row 0 in the second matrix. The column coordinate comparison unit a compares the column coordinates (j0, j1, j2, and j3) with the column coordinates (l0, l1, l2, and l3), to obtain a column coordinate comparison result. A representation form of the column coordinate comparison result is similar to a representation form of the row coordinate comparison result.
For example, the column coordinate comparison circuit 4012 outputs the first column comparison result and/or a second column comparison result. The first column comparison result indicates that the two compared column coordinates are equal, and the second column comparison result indicates that no equal column coordinates are matched. The first column comparison result includes the column coordinate comparison result, a second signal (which is denoted as “valid”), and a second value (which is denoted as “v”). The second signal indicates validity of the value v, and the second value indicates a value of a column coordinate. Logic of performing column coordinate comparison by the column coordinate comparison circuit 4012 is similar to logic of performing row coordinate comparison by the row coordinate comparison circuit. Refer to the logic of the row coordinate comparison circuit for understanding. For example, refer to in
For example, the column coordinate comparison circuit receives the following two input arrays:
The column coordinate comparison circuit outputs the following result:
A result of comparing column coordinates in two vectors whose row coordinates are equal (for example, the row 0 in the first matrix and the row 0 in the second matrix) by the column coordinate comparison unit a is used as an example for description. It should be understood that when the column coordinate comparison unit a performs column coordinate comparison, another column coordinate comparison unit (for example, the column coordinate comparison unit b or the column coordinate comparison unit c) also performs column coordinate comparison. For example, if the row coordinate of the row 1 in the first matrix is equal to the row coordinate of the row 1 in the second matrix, the column coordinate comparison unit b may compare column coordinates in the two row vectors: the row 1 in the first matrix and the row 1 in the second matrix. A process in which another column coordinate comparison unit performs column coordinate comparison is similar to the process in which the column coordinate comparison unit a performs column coordinate comparison, and details are not described herein again. Only the comparison, by the column coordinate comparison unit a, of column coordinates in the row 0 in the first matrix and the row 0 in the second matrix is used as an example for description.
The accumulator 4021 is configured to add, based on the first column comparison result, an element value corresponding to the nth column coordinate of the mth row and an element value corresponding to the lth column coordinate of the fth row, to obtain the third element value. The first column comparison result indicates that the nth column coordinate of the mth row is the same as the lth column coordinate of the fth row. The element value corresponding to the nth column coordinate of the mth row is the first element value, and the element value corresponding to the lth column coordinate of the fth row is the second element value. n is less than or equal to N, and l is less than or equal to L.
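The addition of two compressed matrices in manner A (row coordinates compared first, then column coordinates) can be sketched as follows. The nested-dictionary representation `{row: {column: value}}`, the function name, the `drop_zeros` flag, and the sample data are all assumptions for illustration. The pass-through of unmatched elements and the suppression of zero sums follow the first case and the second case described above.

```python
def add_compressed_matrices(first, second, drop_zeros=True):
    """Add two compressed matrices given as {row: {col: value}}.

    Returns a list of (row, col, value) triples, that is, a COO result.
    """
    result = []
    for row in sorted(set(first) | set(second)):
        a_row, b_row = first.get(row), second.get(row)
        if a_row is None or b_row is None:
            # Row coordinate found in only one matrix: output the row directly.
            only = a_row if b_row is None else b_row
            result.extend((row, col, only[col]) for col in sorted(only))
            continue
        # Row coordinates are the same: compare column coordinates.
        for col in sorted(set(a_row) | set(b_row)):
            if col in a_row and col in b_row:
                total = a_row[col] + b_row[col]   # matched coordinates: add
                if total == 0 and drop_zeros:
                    continue                      # zero sum is not output
                result.append((row, col, total))
            elif col in a_row:
                result.append((row, col, a_row[col]))  # unmatched: pass through
            else:
                result.append((row, col, b_row[col]))  # unmatched: pass through
    return result


# Hypothetical matrices; positions (0, 2) and (1, 3) exist in both.
first = {0: {2: 5.0, 3: 1.0}, 1: {3: 8.0}, 3: {0: 3.0}}
second = {0: {2: 4.0, 3: -1.0}, 1: {1: 2.0, 3: 1.0}, 2: {2: 7.0}}
result = add_compressed_matrices(first, second)
```

In this example the sum at position (0, 3) is zero and is therefore omitted, while rows 2 and 3, each present in only one matrix, are copied to the result unchanged.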
For example, the following describes a structure of the accumulator by using an example. As shown in
The accumulator 4021 is further configured to output the third element value and third position coordinates of the third element value to the second cache 404. Optionally, the accumulator 4021 is further configured to output, to the second cache 404 based on the second column comparison result, fourth position coordinates that fail to be matched and a fourth element value corresponding to the fourth position coordinates. Refer to
Optionally, the row coordinate comparison result further includes a second row comparison result, and the second row comparison result indicates that no row coordinate equal to a row coordinate of a row w in the first matrix is found in the second matrix. In this case, the accumulator 4021 is further configured to directly output all element values in the row w in the first matrix and position coordinates of each element value to the second cache 404. The row w is any row in the first matrix. For example, refer to
It should be noted that precision of the element value and the position coordinates is not limited in this embodiment of this application, and the element value and the position coordinates may be of any precision. For example, the precision of the element value is double-precision FP64, and the precision of the position coordinates is int32.
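To make the storage cost of this example precision concrete, the following sketch compares the size of an uncompressed 4×4 FP64 matrix with the size of a COO representation that stores one FP64 element value and two int32 coordinates per non-zero element value. The function names and the choice of four non-zero element values are assumptions for illustration only.

```python
import struct

VALUE_BYTES = struct.calcsize("<d")  # FP64 element value: 8 bytes
COORD_BYTES = struct.calcsize("<i")  # int32 coordinate: 4 bytes


def dense_bytes(num_elements):
    """Storage for an uncompressed matrix: every element value is stored."""
    return num_elements * VALUE_BYTES


def compressed_bytes(num_nonzero, coords_per_element):
    """Storage for a compressed matrix: value plus coordinates per non-zero."""
    return num_nonzero * (VALUE_BYTES + coords_per_element * COORD_BYTES)


dense_size = dense_bytes(4 * 4)                       # 16 element values
coo_size = compressed_bytes(4, coords_per_element=2)  # 4 non-zero element values
```

Under these assumptions, each stored element costs 16 bytes instead of 8, so the compressed form of a 4×4 matrix is smaller only when fewer than half of the element values are non-zero.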
Refer to
Based on the computation apparatus provided in this application, great benefits can be obtained in a plurality of matrix computation scenarios. For example, when the computation apparatus is used in an artificial intelligence (AI) training and inference scenario, computation of both a matrix in a compressed format and a matrix in an uncompressed format can be fully supported. The computation apparatus in this application can directly compute the matrix in the compressed format without performing a decompression operation on the matrix in the compressed format. In this way, computation efficiency can be improved by more than four times. In addition, in a scenario such as scientific computing, whether the computation involves a matrix in the uncompressed format and requires high computing power, or the matrix computation is limited by memory bandwidth, when the computation apparatus in this application is used, a matrix in the compressed format can be directly accessed from a memory, so that a computing benefit is improved.
The foregoing describes embodiments of the computation apparatus, and the following describes a method performed by the computation apparatus. Refer to
Step 2101: A computation apparatus obtains a computation instruction, where the computation instruction includes a first vector and a second vector that are in a compressed format.
Step 2102: The computation apparatus compares position coordinates of an element value in the first vector with position coordinates of an element value in the second vector, to obtain a first coordinate comparison result, where the first vector includes a first element value and first position coordinates of the first element value, the second vector includes a second element value and second position coordinates of the second element value, the first coordinate comparison result includes a first comparison result, and the first comparison result indicates that the first position coordinates are the same as the second position coordinates.
Step 2103: The computation apparatus computes the first element value and the second element value based on the first comparison result, to obtain a computation value; and outputs a computation result of the first vector and the second vector to a cache, where the computation result is related to the computation value.
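Steps 2102 and 2103 can be sketched in software as follows. This is an illustrative model under assumptions that are not stated in the method: the position coordinates in each vector are assumed to be sorted in ascending order, the comparison is done with a two-pointer scan, and the names `compare_positions` and `compute_matched` are hypothetical.

```python
import operator


def compare_positions(a_pos, b_pos):
    """Step 2102 (sketch): compare sorted position coordinates.

    Returns (matched, only_a, only_b); matched holds (i, j) index pairs
    for which a_pos[i] == b_pos[j].
    """
    i = j = 0
    matched, only_a, only_b = [], [], []
    while i < len(a_pos) and j < len(b_pos):
        if a_pos[i] == b_pos[j]:
            matched.append((i, j))
            i += 1
            j += 1
        elif a_pos[i] < b_pos[j]:
            only_a.append(i)
            i += 1
        else:
            only_b.append(j)
            j += 1
    only_a.extend(range(i, len(a_pos)))  # leftovers have no partner
    only_b.extend(range(j, len(b_pos)))
    return matched, only_a, only_b


def compute_matched(op, a_pos, a_val, b_pos, b_val):
    """Step 2103 (sketch): apply op to each pair of matched element values."""
    matched, _, _ = compare_positions(a_pos, b_pos)
    return [(a_pos[i], op(a_val[i], b_val[j])) for i, j in matched]


# Hypothetical compressed vectors.
a_pos, a_val = [0, 3, 4, 7], [2, 5, 1, 4]
b_pos, b_val = [0, 3, 7], [3, 2, 10]
matched, only_a, only_b = compare_positions(a_pos, b_pos)
products = compute_matched(operator.mul, a_pos, a_val, b_pos, b_val)
```

The same comparison result can drive different operations: passing `operator.add` instead of `operator.mul` yields the sums for the matched positions, and the indices in `only_a` and `only_b` identify the element values to be passed through in an addition.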
In an optional implementation, the computation instruction is an addition instruction. The computation apparatus adds the first element value and the second element value based on the first comparison result, to obtain a sum of the first element value and the second element value. The computation value is the sum. The computation result includes a third element value and third position coordinates of the third element value. The third element value is the sum. The third position coordinates are the same as the first position coordinates.
Optionally, when the third element value is a zero element value, the computation apparatus outputs an invalid signal for the zero element value. The invalid signal indicates that an element value in the computation result does not include the zero element value and position coordinates corresponding to the zero element value. The computation apparatus skips, based on the invalid signal, outputting the zero element value and the position coordinates corresponding to the zero element value to the cache.
In an optional implementation, the first coordinate comparison result further includes a second comparison result, and the first vector includes a fourth element value and fourth position coordinates of the fourth element value. The computation apparatus outputs the fourth element value and the fourth position coordinates to the cache based on the second comparison result. The computation result includes the fourth element value and the fourth position coordinates.
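The addition implementation, including the optional invalid signal and the output of unmatched element values, can be sketched as follows. The dictionary representation, the function name `sparse_vector_add`, the local variable `valid`, and the sample data are assumptions introduced for this sketch.

```python
def sparse_vector_add(a, b):
    """Add two compressed vectors given as {position: value}.

    Returns a list of (position, value) pairs for the computation result.
    """
    out = []
    for pos in sorted(set(a) | set(b)):
        if pos in a and pos in b:
            total = a[pos] + b[pos]   # matched position coordinates: add
            valid = total != 0        # models the invalid signal for a zero sum
            if valid:
                out.append((pos, total))
        elif pos in a:
            out.append((pos, a[pos]))  # no match in b: output as is
        else:
            out.append((pos, b[pos]))  # no match in a: output as is
    return out


# Hypothetical vectors: position 2 sums to zero, positions 0 and 6 are unmatched.
a = {0: 3, 2: -4, 5: 1}
b = {2: 4, 5: 6, 6: 9}
result = sparse_vector_add(a, b)
```

Because the zero sum at position 2 is skipped, the result stays in the compressed format and can be used directly as an input to a next computation.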
In an optional implementation, the computation instruction is a multiplication instruction. The computation apparatus multiplies the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result includes a fifth element value and fifth position coordinates of the fifth element value. The fifth element value is the product. The fifth position coordinates are the same as the first position coordinates.
In an optional implementation, the computation instruction is an inner product instruction. The computation apparatus multiplies the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value. The product is the computation value. The computation result is an accumulated value of a plurality of products.
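In an optional implementation, the computation instruction is a multiplication-addition instruction. The computation apparatus multiplies the first element value by the second element value based on the first comparison result, to obtain a product of the first element value and the second element value, where the computation value is the product, the product is used as a fifth element value, and fifth position coordinates corresponding to the fifth element value are the same as the first position coordinates; compares sixth position coordinates of a sixth element value in a third vector in the compressed format with the fifth position coordinates, to obtain a third comparison result, where the third comparison result indicates that the sixth position coordinates are the same as the fifth position coordinates; and adds the sixth element value and the fifth element value based on the third comparison result, to obtain a sum of the sixth element value and the fifth element value. The computation result includes a seventh element value and seventh position coordinates of the seventh element value. The seventh element value is the sum. The seventh position coordinates are the same as the sixth position coordinates.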
In an optional implementation, the computation instruction includes a first matrix and a second matrix that are in the compressed format, the first matrix includes the first vector, and the second matrix includes the second vector.
The computation apparatus is further configured to compare position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a second coordinate comparison result, where the second coordinate comparison result includes the first coordinate comparison result.
In an optional implementation, a dimension of the first matrix is M×N, and a dimension of the second matrix is K×L. The comparing position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a second coordinate comparison result may include the following steps:
Refer to
Step 2201: Obtain a computation instruction, where the computation instruction includes a first matrix and a second matrix that are in a compressed format.
For the first matrix and the second matrix in this step, refer to example descriptions of the first matrix and the second matrix in the foregoing apparatus embodiments. Details are not described herein again.
Step 2202: Compare position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a coordinate comparison result, where the first matrix includes a first element value and first position coordinates of the first element value, the second matrix includes a second element value and second position coordinates of the second element value, the coordinate comparison result includes a first comparison result, and the first comparison result indicates that the first position coordinates are the same as the second position coordinates.
For this step, refer to specific descriptions of the function performed by the position coordinate comparison circuit 401 in the foregoing computation apparatus embodiments. Details are not described herein again.
Optionally, a dimension of the first matrix is M×N, and a dimension of the second matrix is K×L. The comparing position coordinates of an element value in the first matrix with position coordinates of an element value in the second matrix, to obtain a coordinate comparison result includes:
Step 2203: Add the first element value and the second element value based on the first comparison result, to obtain a result matrix, where the result matrix includes a third element value and third position coordinates of the third element value, the third element value is a sum of the first element value and the second element value, and the third position coordinates are the same as the first position coordinates.
For this step, refer to specific descriptions of the function performed by the accumulator 4021 in the foregoing computation apparatus embodiments. Details are not described herein again.
Optionally, an element value corresponding to the nth column coordinate of the mth row and an element value corresponding to the lth column coordinate of the fth row are added based on the first column comparison result, to obtain the third element value. The element value corresponding to the nth column coordinate of the mth row is the first element value. The element value corresponding to the lth column coordinate of the fth row is the second element value. n is less than or equal to N, and l is less than or equal to L.
In an optional implementation, when the third element value is a zero element value, the computation apparatus outputs an invalid signal for the zero element value. The invalid signal indicates that an element value in the result matrix does not include the zero element value.
In an optional implementation, the coordinate comparison result further includes a second comparison result. The first matrix includes a fourth element value and fourth position coordinates of the fourth element value. The second comparison result indicates that no position coordinates that are the same as the fourth position coordinates are found in the second matrix. The matrix computation method may further include the following step 2204.
Step 2204: Output the fourth element value and the fourth position coordinates to a cache based on the second comparison result, where the cache is configured to cache the result matrix, and the result matrix includes the fourth element value and the fourth position coordinates.
It should be noted that there is no limitation on a time sequence of step 2203 and step 2204, and step 2203 and step 2204 may be performed simultaneously.
In an embodiment of this application, a computation circuit is provided. The computation circuit is configured to perform one or more steps in step 2201 to step 2204 or one or more steps in step 2101 to step 2103 in the foregoing method embodiments. During actual application, the computation circuit may be an ASIC, an FPGA, a logic circuit, or the like.
In another embodiment of this application, a computation system or a chip is further provided. A structure of the system or the chip may be shown in
In still another embodiment of this application, a computation device is provided. A structure of the device may be shown in
The processor 202 may be configured to perform one or more steps in step 2201 to step 2204 or one or more steps in step 2101 to step 2103 in the foregoing method embodiments. In some feasible embodiments, the processor 202 may include a computation unit, and the computation unit may be configured to support the processor in performing one or more steps in the foregoing method embodiments. During actual application, the computation unit may be an ASIC, an FPGA, a logic circuit, or the like. Certainly, the computation unit may alternatively be implemented by using software. This is not specifically limited in this embodiment of this application.
It should be noted that components of the computation circuit, the computation system, the computation device, and the like provided in embodiments of this application are respectively configured to implement functions of corresponding steps in the foregoing method embodiments.
All or some of the foregoing embodiments may be implemented using software, hardware, firmware, or any combination thereof. When software is used to implement embodiments, the foregoing embodiments may be implemented completely or partially in a form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, all or some of the processes or the functions according to embodiments of this application are generated. The computer may be a general-purpose computer, a dedicated computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or may be transmitted from a computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (for example, a coaxial cable, an optical fiber, or a digital subscriber line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive (SSD).
In conclusion, the foregoing embodiments are merely intended for describing the technical solutions of this application, but not for limiting this application. Although this application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that they may still make modifications to the technical solutions described in the foregoing embodiments or make equivalent replacements to some technical features thereof, without departing from the scope of the technical solutions of embodiments of this application.
Number | Date | Country | Kind |
---|---|---|---|
202110961228.7 | Aug 2021 | CN | national |
202111349874.4 | Nov 2021 | CN | national |
This application is a continuation of International Application No. PCT/CN2022/085509, filed on Apr. 7, 2022, which claims priority to Chinese Patent Application No. 202110961228.7, filed on Aug. 20, 2021, and Chinese Patent Application No. 202111349874.4, filed on Nov. 15, 2021. All of the aforementioned patent applications are hereby incorporated by reference in their entireties.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2022/085509 | Apr 2022 | WO |
| Child | 18440254 | | US |