This application claims the priority benefit of Taiwan application serial no. 111139781, filed on Oct. 20, 2022. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
The disclosure relates to a computing device and an operation method for the computing device, and particularly relates to a matrix computing device and an operation method for the matrix computing device.
It should be noted that, in a matrix multiplication, the vector direction of the matrix MA and the vector direction of the matrix MB are different from each other. That is, the element values in the matrix MB are read in a different order from the element values in the matrix MA. Generally speaking, the element values of a matrix are arranged element row by element row: after a matrix computing device consumes one element row, it consumes the next element row. Accordingly, the element values of a matrix are normally read element row first. However, in the matrix multiplication, the element values of the matrix MB are read element column first: after the matrix computing device consumes one element column, it consumes the next element column.
Consequently, the matrix computing device has to perform an additional transpose computation on the matrix MB by circuits or algorithms, which increases the cost of the matrix computing device.
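To illustrate the two reading orders, the following Python sketch assumes both matrices are stored row by row in flat lists (the helper names are hypothetical and not part of the disclosure). Computing one element of MA×MB walks an element row of MA contiguously, but must stride through MB by its column count to walk an element column, which is the access pattern that normally leads to a separate transpose of MB.

```python
def row_read_indices(row, n_cols):
    """Flat indices of one element row in a row-major buffer (stride 1)."""
    return [row * n_cols + k for k in range(n_cols)]


def col_read_indices(col, n_rows, n_cols):
    """Flat indices of one element column in a row-major buffer (stride n_cols)."""
    return [k * n_cols + col for k in range(n_rows)]


def matmul_element(ma, mb, shape_a, shape_b, i, j):
    """Element (i, j) of MA x MB when both operands are stored row-major."""
    (_, ca), (rb, cb) = shape_a, shape_b
    if ca != rb:
        raise ValueError("inner dimensions of MA and MB must agree")
    a_idx = row_read_indices(i, ca)        # MA: walk along element row i
    b_idx = col_read_indices(j, rb, cb)    # MB: walk down element column j
    return sum(ma[p] * mb[q] for p, q in zip(a_idx, b_idx))


# MA is 2x3 and MB is 3x2, both flattened row by row.
ma = [1, 2, 3,
      4, 5, 6]
mb = [7, 8,
      9, 10,
      11, 12]
print(col_read_indices(0, 3, 2))                      # [0, 2, 4]: jumps of 2 in MB
print(matmul_element(ma, mb, (2, 3), (3, 2), 0, 0))   # 1*7 + 2*9 + 3*11 = 58
```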
The disclosure provides a matrix computing device and an operation method which can avoid transpose computations.
The matrix computing device according to an embodiment of the disclosure includes a storage unit, a control circuit, and a computing circuit. The storage unit includes a weight matrix. The control circuit is coupled to the storage unit. The control circuit re-orders an arrangement order of a plurality of weights in the weight matrix according to a shape of an output matrix to determine a weight readout order of the weights. The weight readout order is different from the arrangement order of the weights in the weight matrix. The computing circuit is coupled to the control circuit. The computing circuit receives the weights based on the weight readout order, and performs a matrix computation on the weights and an input matrix to generate a computing matrix. The control circuit performs a reshape transformation on the computing matrix to generate the output matrix, and writes the output matrix to the storage unit.
The operation method according to an embodiment of the disclosure is provided for a matrix computing device. The matrix computing device includes a storage unit and a computing circuit. The operation method includes: re-ordering an arrangement order of a plurality of weights in a weight matrix of the storage unit according to a shape of an output matrix to determine a weight readout order of the weights, and the weight readout order being different from the arrangement order of the weights in the weight matrix; receiving the weights based on the weight readout order by the computing circuit, and performing a matrix computation on the weights and an input matrix to generate a computing matrix; and performing a reshape transformation on the computing matrix to generate the output matrix, and writing the output matrix to the storage unit.
Based on the above, the matrix computing device and the operation method re-order the arrangement order of the weights in the weight matrix according to the shape of the output matrix to determine the weight readout order of the weights. The computing circuit performs the matrix computation on the weights and the input matrix based on the weight readout order to generate the computing matrix. It should be noted that the weight readout order changes the arrangement order of elements of the computing matrix. The arrangement order of the elements of the computing matrix helps achieve a transpose of a matrix when performing the reshape transformation. Therefore, the matrix computing device does not need to perform an additional transpose to achieve the transpose computation on the matrix, and the cost of the matrix computing device of the disclosure is not increased.
In order to make the above-mentioned features and advantages of the disclosure more comprehensible, the following embodiments are given and described in detail with the accompanying drawings as follows.
Some embodiments of the disclosure will be described in detail below with reference to the accompanying drawings. Reference numerals quoted in the following description will be regarded as the same or similar components when the same reference numerals appear in different drawings. The embodiments are merely a part of the disclosure and do not disclose all possible implementations of the disclosure. More specifically, the embodiments are only examples within the scope of the claims.
Please refer to
In the embodiment, the control circuit 120 is coupled to the storage unit 110. The control circuit 120 re-orders an arrangement order of the weights W11˜WNM according to a shape of an output matrix MO to determine a weight readout order ORD of the weights W11˜WNM. The output matrix MO is, for example, a two-dimensional matrix with T rows and S columns (the disclosure is not limited thereto). In the embodiment, S and T are positive integers greater than 1, respectively.
In the embodiment, when the weights W11˜WNM are written to the storage unit 110, the weights W11˜WNM are written in row-major order. That is, the weights W11˜W1M are sequentially written to a first row of the weight matrix MW. Next, the weights W21˜W2M are sequentially written to a second row of the weight matrix MW, and so on. Therefore, from the first row to the N-th row of the weight matrix MW, the weights W11˜W1M, the weights W21˜W2M, . . . , and the weights WN1˜WNM are arranged sequentially. Through re-ordering, the weight readout order ORD of the weights W11˜WNM is different from the arrangement order of the weights W11˜WNM in the weight matrix MW. For example, based on the weight readout order ORD, the control circuit 120 may read out the weights W11˜W1M first, then the weights W31˜W3M, and then the weights W21˜W2M.
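The example readout order above (first row, third row, second row) can be modeled as streaming whole rows out of the row-major storage in a chosen order. This is a sketch only; the function name and the 0-based row indices are assumptions for illustration. Because each weight row is contiguous in row-major storage, re-ordering happens at row granularity and no weight inside a row is moved.

```python
def read_rows_in_order(flat_weights, n_cols, row_order):
    """Stream whole weight rows out of a row-major buffer in row_order (0-based)."""
    n_rows = len(flat_weights) // n_cols
    for r in row_order:
        if not 0 <= r < n_rows:
            raise IndexError(f"weight row {r} does not exist")
        start = r * n_cols                    # rows are contiguous in row-major storage
        yield flat_weights[start:start + n_cols]


# Weight matrix MW with N = 3 rows and M = 2 columns, written row by row.
mw = ["W11", "W12",
      "W21", "W22",
      "W31", "W32"]
for row in read_rows_in_order(mw, 2, [0, 2, 1]):
    print(row)
# ['W11', 'W12']
# ['W31', 'W32']
# ['W21', 'W22']
```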
In the embodiment, the computing circuit 130 is coupled to the control circuit 120. The computing circuit 130 receives the weights W11˜WNM based on the weight readout order ORD. Therefore, a row order of the weights W11˜WNM received by the computing circuit 130 is different from the row order of the weights W11˜WNM in the weight matrix MW. The computing circuit 130 also receives an input matrix MI, and performs a matrix computation on the weights W11˜WNM and the input matrix MI to generate a computing matrix MC. In the embodiment, the input matrix MI is, for example, a one-dimensional matrix having M rows and 1 column (the disclosure is not limited thereto). Therefore, the input matrix MI includes input element values IN1˜INM. The computing circuit 130 performs a matrix multiplication computation on the weights W11˜WNM and the input matrix MI to generate the computing matrix MC. Therefore, the computing matrix MC is a one-dimensional matrix with N rows and 1 column (the disclosure is not limited thereto). The computing matrix MC includes computing element values E1˜EN.
The control circuit 120 performs a reshape transformation on the computing matrix MC to generate the output matrix MO. The control circuit 120 writes the output matrix MO to the storage unit 110. The control circuit 120 increases a dimension of the computing matrix MC to generate the output matrix MO. In the embodiment, the control circuit 120 converts the dimension of the computing matrix MC from one-dimensional to two-dimensional, thereby generating the output matrix MO (here, N is equal to T×S). For example, the control circuit 120 reads out the computing element values E1˜EN sequentially, and writes the computing element values E1˜EN to the output matrix MO in row-major order. Therefore, the output matrix MO includes the computing element values E11˜ETS. It should be understood that the computing element value E11 is equal to E1, and the computing element value ETS is equal to EN.
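As a minimal Python sketch of the one-dimensional to two-dimensional reshape described above (assuming N = T×S; the function name is hypothetical), the computing element values are copied into the output matrix row by row without any arithmetic.

```python
def reshape_row_major(values, t_rows, s_cols):
    """Fill a T x S matrix row by row from a flat list of N = T*S values."""
    if len(values) != t_rows * s_cols:
        raise ValueError("reshape needs exactly T*S values")
    return [values[r * s_cols:(r + 1) * s_cols] for r in range(t_rows)]


mc = ["E1", "E2", "E3", "E4", "E5", "E6"]    # computing matrix MC, N = 6
mo = reshape_row_major(mc, 2, 3)              # output matrix MO, T = 2, S = 3
print(mo)                     # [['E1', 'E2', 'E3'], ['E4', 'E5', 'E6']]
print(mo[0][0], mo[-1][-1])   # E1 E6  (E11 = E1, ETS = EN)
```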
It is worth mentioning that the control circuit 120 re-orders the arrangement order of the weights W11˜WNM in the weight matrix MW according to the shape of the output matrix MO to determine the weight readout order ORD of the weights W11˜WNM. The computing circuit 130 performs the matrix computation on the weights W11˜WNM and the input matrix MI based on the weight readout order ORD to generate the computing matrix MC. It should be noted that the weight readout order ORD changes the arrangement order of the computing element values E1˜EN of the computing matrix MC. The arrangement order of the computing element values E1˜EN helps achieve a transpose of a matrix when performing the reshape transformation. In this way, the matrix computing device 100 does not need to perform an additional transpose to achieve a transpose computation on the computing matrix MC or the output matrix MO. The cost of the matrix computing device 100 is not increased.
In the embodiment, the control circuit 120 may be implemented by a logic circuit, a memory controller, an input/output (I/O) buffer, or a central processing unit (CPU). In the embodiment, the computing circuit 130 may be applicable to the matrix computation of a neural network (NN).
In some embodiments, the input matrix MI may be provided by an external device. In some embodiments, the input matrix MI may be provided by the storage unit 110.
For the convenience of explanation, the weight matrix MW is exemplified as a two-dimensional array. The input matrix MI is exemplified by a one-dimensional array. However, the disclosure is not limited thereto. In some embodiments, the weight matrix MW may be a one-dimensional array of multiple rows and one column. The input matrix MI may be a two-dimensional array.
Please refer to
In the embodiment, the output matrix MO is, for example, a two-dimensional matrix having T rows and S columns. The control circuit 120 takes a first weight row of the weight matrix MW as a first readout row RO1, and takes an (nT+1)th weight row of the weight matrix MW as an (n+1)th readout row RO(n+1), where n is a positive integer smaller than S. The control circuit 120 takes a second weight row of the weight matrix MW as an (S+1)th readout row RO(S+1), and takes an (nT+2)th weight row of the weight matrix MW as an (S+n+1)th readout row (not shown). Therefore, a readout matrix MW′ generated based on the weight readout order ORD is formed. In other words, the control circuit 120 converts the weight matrix MW to the readout matrix MW′ according to the weight readout order ORD. The first readout row RO1 includes the weights W11˜W1M. A second readout row RO2 includes the weights W(T+1)1˜W(T+1)M (i.e., n=1). A third readout row RO3 includes the weights W(2T+1)1˜W(2T+1)M (i.e., n=2). The (S+1)th readout row RO(S+1) includes the weights W21˜W2M.
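Generalizing the rules above, readout row RO(jS+i+1) appears to take weight row iT+j+1, for 0 ≤ i < S and 0 ≤ j < T. This closed form is an interpretation of the stated examples, not a formula given in the disclosure, and it assumes N = T×S. A Python sketch of that permutation (the function name is hypothetical):

```python
def weight_readout_order(t_rows, s_cols):
    """Weight-row index (0-based) feeding each readout row RO1, RO2, ..., RO(T*S).

    Readout row j*S + i (0-based) takes weight row i*T + j, for 0 <= i < S and
    0 <= j < T, which reproduces the RO(n+1) and RO(S+n+1) examples.
    """
    return [i * t_rows + j for j in range(t_rows) for i in range(s_cols)]


order = weight_readout_order(t_rows=2, s_cols=3)
print([r + 1 for r in order])   # 1-based weight rows: [1, 3, 5, 2, 4, 6]
```

With T = 2 the first three entries match the example order given earlier (first row, third row, second row is the start of the T = 2, S = 2 case: [1, 3, 2, 4]).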
The arrangement of the weights W11˜WNM received by the computing circuit 130 based on the weight readout order ORD is equivalent to the layout of the readout matrix MW′. The computing circuit 130 performs the multiplication computation on the readout matrix MW′ and the input matrix MI to generate the computing matrix MC. The computing element value E1 is equal to a multiply-accumulate (MAC) value of the weights W11˜W1M of the first readout row RO1 and the input element values IN1˜INM. The computing element value E2 is equal to the MAC value of the weights W(T+1)1˜W(T+1)M of the second readout row RO2 and the input element values IN1˜INM, and so on. The computing element values E1 and E2 are shown in formula (1) and formula (2), respectively, as follows.
$E_1 = \sum_{K=1}^{M} W_{1K} \times IN_K$  Formula (1)
$E_2 = \sum_{K=1}^{M} W_{(T+1)K} \times IN_K$  Formula (2)
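The MAC computation of each readout row with the input element values, in the form of formula (1), can be sketched in Python for all N readout rows at once. The readout order is passed in as a list of 0-based weight-row indices; the function name and the small numeric example are illustrative assumptions.

```python
def computing_matrix(weight_rows, inputs, readout_order):
    """Return [E1, E2, ...]: the MAC of each readout row with IN1..INM."""
    result = []
    for r in readout_order:                  # r: 0-based weight row of this readout row
        row = weight_rows[r]
        if len(row) != len(inputs):
            raise ValueError("each weight row needs M weights")
        result.append(sum(w * x for w, x in zip(row, inputs)))
    return result


# T = 2, S = 2, so N = 4 weight rows of M = 2 weights each.
mw = [[1, 0],
      [0, 1],
      [1, 1],
      [2, 0]]
mi = [3, 4]                                  # IN1, IN2
order = [0, 2, 1, 3]                         # readout rows: weight rows 1, T+1, 2, T+2
print(computing_matrix(mw, mi, order))       # [3, 7, 4, 6]
```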
The control circuit 120 receives the computing matrix MC and converts the dimension of the computing matrix MC from one-dimensional to two-dimensional to generate the output matrix MO. It should be noted that the weight readout order ORD changes the arrangement order of the computing element values E1˜EN of the computing matrix MC. The arrangement order of the computing element values E1˜EN helps achieve the transpose of a matrix when performing the reshape transformation.
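The statement that the re-ordered readout lets the reshape achieve a transpose can be checked numerically. The Python sketch below compares two paths under stated assumptions: N = T×S, the readout rule RO(jS+i+1) takes weight row iT+j+1 (inferred from the examples, not stated as a formula in the disclosure), and "transpose" means the T×S transpose of the plainly ordered product reshaped into S rows and T columns. All function names are hypothetical.

```python
def matvec(rows, vec):
    """Plain matrix-vector product: one MAC per row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in rows]


def reshape(values, n_rows, n_cols):
    """Row-major reshape of a flat list into n_rows x n_cols."""
    return [values[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def transpose(mat):
    return [list(col) for col in zip(*mat)]


def device_output(weight_rows, inputs, t_rows, s_cols):
    """Re-ordered readout, then MAC, then reshape to T x S (no transpose step)."""
    order = [i * t_rows + j for j in range(t_rows) for i in range(s_cols)]
    mc = matvec([weight_rows[r] for r in order], inputs)
    return reshape(mc, t_rows, s_cols)


def reference_output(weight_rows, inputs, t_rows, s_cols):
    """Natural row order, reshape to S x T, then an explicit transpose."""
    return transpose(reshape(matvec(weight_rows, inputs), s_cols, t_rows))


mw = [[1, 0], [0, 1], [1, 1], [1, -1], [2, 0], [0, 2]]   # N = 6, M = 2
mi = [1, 2]
print(device_output(mw, mi, 2, 3))                                    # [[1, 3, 2], [2, -1, 4]]
print(device_output(mw, mi, 2, 3) == reference_output(mw, mi, 2, 3))  # True
```

Only the order in which weight rows are read changes; the reference path needs a separate transpose pass over the result, which is the extra step the re-ordering is meant to avoid.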
In some embodiments, the control circuit 120 stores the readout matrix MW′ to the storage unit 110. Therefore, in a case where the weights W11˜WNM are not updated, the control circuit 120 can read the readout matrix MW′ without performing a re-ordering operation. In some embodiments, the readout matrix MW′ and the weight matrix MW are respectively stored in different segments of the storage unit 110. In some embodiments, when the readout matrix MW′ is stored in the storage unit 110, the readout matrix MW′ overwrites the weight matrix MW.
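The reuse described above behaves like a cache that is invalidated when the weights are updated. A minimal Python sketch of the separate-segment case, in which the class, its method names, and the invalidate-on-update policy are assumptions for illustration:

```python
class ReadoutMatrixStore:
    """Hypothetical model: keep MW' in storage and rebuild it only after MW changes."""

    def __init__(self, weight_rows, t_rows, s_cols):
        self.t_rows, self.s_cols = t_rows, s_cols
        self.reorder_count = 0               # how many times re-ordering actually ran
        self.update_weights(weight_rows)

    def update_weights(self, weight_rows):
        self.weights = [list(row) for row in weight_rows]
        self._readout = None                 # stored MW' no longer matches MW

    def readout_matrix(self):
        if self._readout is None:            # re-order only when no valid MW' is stored
            order = [i * self.t_rows + j
                     for j in range(self.t_rows) for i in range(self.s_cols)]
            self._readout = [self.weights[r] for r in order]
            self.reorder_count += 1
        return self._readout


store = ReadoutMatrixStore([[1], [2], [3], [4]], t_rows=2, s_cols=2)
store.readout_matrix()
store.readout_matrix()
print(store.reorder_count)                          # 1: second call reuses the stored MW'
store.update_weights([[5], [6], [7], [8]])
print(store.readout_matrix(), store.reorder_count)  # [[5], [7], [6], [8]] 2
```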
For example, please refer to
Please refer to
Taking the embodiment as an example, the MAC circuit 231(1) receives a corresponding weight row (i.e., the first readout row RO1) through the channel CH(1). Therefore, the MAC circuit 231(1) receives the weights W11˜W1M sequentially through the channel CH(1), and performs a MAC computation on the weights W11˜W1M and the input matrix MI to generate the computing element value E1 of the computing matrix MC. The MAC circuit 231(2) receives the corresponding weight row (i.e., the second readout row RO2) through the channel CH(2). Therefore, the MAC circuit 231(2) receives the weights W(T+1)1˜W(T+1)M sequentially through the channel CH(2), and performs the MAC computation on the weights W(T+1)1˜W(T+1)M and the input matrix MI to generate the computing element value E2 of the computing matrix MC. Similarly, the MAC circuit 231(N) receives the weights WN1˜WNM of the N-th readout row RON through the channel CH(N), and performs the MAC computation on the weights WN1˜WNM and the input matrix MI to generate the computing element value EN of the computing matrix MC.
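The channel-per-circuit arrangement can be modeled in Python as N accumulators advancing together, one weight per channel per cycle. The lockstep timing and the shared broadcast of each input element are assumptions of this sketch; the text above only states that each MAC circuit receives its readout row through its own channel.

```python
def mac_array(readout_rows, inputs):
    """Cycle-by-cycle model of MAC circuits 231(1)..231(N) working side by side.

    Assumption: at cycle k every circuit takes the k-th weight from its own
    channel CH(c) and the same shared input element IN_k.
    """
    registers = [0] * len(readout_rows)          # one accumulator per MAC circuit
    for k, x in enumerate(inputs):               # cycle k (input element IN_{k+1})
        for c, row in enumerate(readout_rows):   # channel CH(c+1)
            registers[c] += row[k] * x
    return registers                             # computing element values E1..EN


readout = [[1, 2],      # RO1 on CH(1)
           [3, 4],      # RO2 on CH(2)
           [5, 6]]      # RO3 on CH(3)
print(mac_array(readout, [1, 1]))   # [3, 7, 11]
```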
Taking the MAC circuit 231(1) as an example, the MAC circuit 231(1) includes a multiplier MU, a register RG, and an adder AD. The register RG stores the computing element value E1 at a first time. At this time, the computing element value E1 may be an initial value (e.g., “0”). The multiplier MU is coupled to the channel CH(1) and the input matrix MI. The multiplier MU receives the weight W11 and input data IN1 in the input matrix MI at the first time, and performs the multiplication computation on the weight W11 and the input data IN1 to generate a product value MV. The adder AD receives the computing element value E1 stored in the register RG and the product value MV from the multiplier MU at the first time. The adder AD performs an addition computation on the computing element value E1 and the product value MV to generate a new computing element value E1, and stores the new computing element value E1 in the register RG. At a second time, the multiplier MU receives the weight W12 and the input data IN2 in the input matrix MI, and performs a multiplication computation on the weight W12 and the input data IN2 to generate a new product value MV. The adder AD receives the new product value MV and the computing element value E1 stored in the register RG at the first time. The adder AD performs an addition computation on the computing element value E1 and the new product value MV to generate another new computing element value E1, and so on.
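The multiplier, adder and register loop described above can be traced step by step with a small Python model. The class and method names are illustrative assumptions; a hardware MAC circuit would use fixed-width arithmetic rather than Python integers.

```python
class MacCircuit:
    """Sketch of one MAC circuit: multiplier MU, adder AD and register RG."""

    def __init__(self, initial=0):
        self.rg = initial                # register RG, initial value e.g. 0

    def step(self, weight, in_value):
        mv = weight * in_value           # multiplier MU -> product value MV
        self.rg = self.rg + mv           # adder AD: RG + MV, result stored back in RG
        return self.rg


mac = MacCircuit()
trace = [mac.step(w, x) for w, x in zip([2, 3, 4], [1, 2, 3])]   # W11..W13, IN1..IN3
print(trace)        # [2, 8, 20]: running value of E1 after each time step
```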
In the embodiment, the circuit configuration of the MAC circuits 231(2)˜231(N) is similar to the circuit configuration of the MAC circuit 231(1), and will not be repeated here.
Please refer to
In step S120, the computing circuit 130 receives the weights W11˜WNM based on the weight readout order ORD, and performs a matrix computation on the weights W11˜WNM and the input matrix MI to generate a computing matrix MC.
In step S130, the control circuit 120 performs the reshape transformation on the computing matrix MC to generate an output matrix MO, and writes the output matrix MO to the storage unit 110. The implementation details of steps S110 to S130 have been described in the embodiments of
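Putting steps S110 to S130 together, the following Python sketch models the storage unit as a dictionary. Treating step S110 as the re-ordering step of the operation method summarized earlier, the readout rule inferred from the readout-row examples, and the function names themselves are assumptions of this sketch.

```python
def step_s110(weight_rows, t_rows, s_cols):
    """Re-order MW according to the T x S output shape (weight readout order ORD)."""
    order = [i * t_rows + j for j in range(t_rows) for i in range(s_cols)]
    return [weight_rows[r] for r in order]


def step_s120(readout_rows, inputs):
    """Matrix computation on the re-ordered weights and MI -> computing matrix MC."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in readout_rows]


def step_s130(mc, t_rows, s_cols, storage):
    """Reshape MC into the T x S output matrix MO and write it to storage."""
    mo = [mc[r * s_cols:(r + 1) * s_cols] for r in range(t_rows)]
    storage["MO"] = mo
    return mo


storage = {"MW": [[1, 1], [1, 0], [0, 1], [2, 2]]}   # storage unit modeled as a dict
mw_prime = step_s110(storage["MW"], t_rows=2, s_cols=2)
mc = step_s120(mw_prime, inputs=[5, 7])
step_s130(mc, 2, 2, storage)
print(storage["MO"])        # [[12, 7], [5, 24]]
```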
To sum up, the matrix computing device and the operation method re-order the arrangement order of the weights in the weight matrix according to the shape of the output matrix to determine the weight readout order of the weights. The computing circuit performs the matrix computation on the weights and the input matrix based on the weight readout order to generate the computing matrix. The weight readout order changes the element arrangement order of the computing matrix. The element arrangement order of the computing matrix helps achieve the transpose of a matrix when performing the reshape transformation. Therefore, the matrix computing device does not need to perform an additional transpose to achieve the transpose computation on the matrix. The cost of the matrix computing device of the disclosure is not increased.
Although the disclosure has been disclosed with reference to the embodiments above, they are not intended to limit the disclosure. Anyone with ordinary knowledge in the technical field can make some changes and modifications without departing from the spirit and scope of the disclosure. Therefore, the protection scope of the disclosure shall be determined by the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
111139781 | Oct 2022 | TW | national |
Number | Date | Country |
---|---|---|
20240134931 A1 | Apr 2024 | US |