The disclosure relates to computing technology, and more particularly to a bit-serial computing device and a test method for evaluating the same.
Bit-serial computing can be used in neural networks. For bit-serial computing, it is important to enhance output accuracy.
Therefore, an object of the disclosure is to provide a bit-serial computing device and a method for evaluating the same. The computing device can have improved output accuracy.
According to an aspect of the disclosure, the bit-serial computing device includes a computing circuit and a scaler. The computing circuit receives a feed-in multiplier vector and a feed-in multiplicand vector, and includes a number (N) of multiply-and-accumulate (MAC) slices, where N≥2. The feed-in multiplier vector contains a number (M) of multiplier inputs, where M≥2. The feed-in multiplicand vector contains a number (M) of multiplicand inputs, each of which contains a number (N) of multiplicand segments that have different significances. The significances respectively correspond to the MAC slices. Correspondence between the significances and the MAC slices is variable. Each of the MAC slices calculates an inner product of the feed-in multiplier vector and a vector that is constituted by the multiplicand segments of the multiplicand inputs of the feed-in multiplicand vector having the significance corresponding to the MAC slice. The scaler is coupled to the MAC slices to receive the inner products that are respectively calculated by the MAC slices, and further receives a control signal. With respect to each of the MAC slices, the scaler multiplies the inner product that is calculated by the MAC slice by a weighting ratio that represents the significance corresponding to the MAC slice based on the control signal, so as to obtain a scaled inner product that corresponds to the MAC slice.
According to another aspect of the disclosure, the test method is for evaluating the bit-serial computing device described above, and includes steps of: (A) generating at least one first test multiplier vector and at least one second test multiplier vector, where a first linear function of the at least one first test multiplier vector is equal to a second linear function of the at least one second test multiplier vector; (B) sequentially providing the first and second test multiplier vectors to the computing circuit as the feed-in multiplier vector, so that each of the MAC slices sequentially obtains at least one first inner product that corresponds to the at least one first test multiplier vector and at least one second inner product that corresponds to the at least one second test multiplier vector as the inner product calculated thereby; (C) with respect to each of the MAC slices, calculating an absolute deviation that corresponds to the MAC slice, and that equals an absolute value of the first linear function of the at least one first inner product obtained by the MAC slice minus the second linear function of the at least one second inner product obtained by the MAC slice; (D) repeating step (B) and step (C), and with respect to each of the MAC slices, accumulating the absolute deviation that corresponds to the MAC slice, so as to obtain an accumulated deviation that corresponds to the MAC slice; and (E) generating an evaluation output based on the accumulated deviations that respectively correspond to the MAC slices, where the evaluation output indicates a relative relationship of accuracies of the MAC slices, and the accuracy of one of the MAC slices is determined to be higher than the accuracy of another one of the MAC slices when the accumulated deviation that corresponds to said one of the MAC slices is smaller than the accumulated deviation that corresponds to said another one of the MAC slices.
According to yet another aspect of the disclosure, the bit-serial computing device includes a computing circuit, a test pattern generator and an evaluator. The computing circuit includes a MAC slice that calculates an inner product of a feed-in multiplier vector and another vector. The test pattern generator is coupled to the computing circuit, and generates at least one first test multiplier vector and at least one second test multiplier vector, where a first linear function of the at least one first test multiplier vector is equal to a second linear function of the at least one second test multiplier vector. The test pattern generator sequentially provides the first and second test multiplier vectors to the computing circuit as the feed-in multiplier vector, so that the MAC slice sequentially obtains at least one first inner product that corresponds to the at least one first test multiplier vector and at least one second inner product that corresponds to the at least one second test multiplier vector as the inner product calculated by the MAC slice. The evaluator is coupled to the MAC slice to receive the at least one first inner product and the at least one second inner product, calculates an absolute deviation that equals an absolute value of the first linear function of the at least one first inner product minus the second linear function of the at least one second inner product; and increases an accumulated deviation by the absolute deviation.
Other features and advantages of the disclosure will become apparent in the following detailed description of the embodiment(s) with reference to the accompanying drawings. It is noted that various features may not be drawn to scale.
Before the disclosure is described in greater detail, it should be noted that where considered appropriate, reference numerals or terminal portions of reference numerals have been repeated among the figures to indicate corresponding or analogous elements, which may optionally have similar characteristics.
Referring to
The first multiplexer 11 receives a normal multiplier vector, a test multiplier vector and a mode signal (MODE). The normal multiplier vector contains a number (M) of multiplier inputs (AN0-ANM−1), where M≥2. The test multiplier vector contains a number (M) of multiplier inputs (AT0-ATM−1). Each of the multiplier inputs (AN0-ANM−1, AT0-ATM−1) of the normal and test multiplier vectors is at least one bit wide. The first multiplexer 11 outputs the normal multiplier vector as a feed-in multiplier vector when the mode signal (MODE) indicates that the bit-serial computing device 1 of this embodiment operates in the normal mode, and outputs the test multiplier vector as the feed-in multiplier vector when the mode signal (MODE) indicates that the bit-serial computing device 1 of this embodiment operates in the test mode. Therefore, the feed-in multiplier vector contains a number (M) of multiplier inputs (A0-AM−1); and the multiplier input (Am) of the feed-in multiplier vector is equal to the multiplier input (ANm) of the normal multiplier vector when the mode signal (MODE) indicates that the bit-serial computing device 1 of this embodiment operates in the normal mode, and is equal to the multiplier input (ATm) of the test multiplier vector when the mode signal (MODE) indicates that the bit-serial computing device 1 of this embodiment operates in the test mode, where 0≤m≤M−1. It should be noted that, when the bit-serial computing device 1 of this embodiment is used in a neural network, the normal multiplier vector is one of an activation vector and a weight vector.
The second multiplexer 12 receives a normal multiplicand vector, a test multiplicand vector and the mode signal (MODE). The normal multiplicand vector contains a number (M) of multiplicand inputs (WN0-WNM−1). The test multiplicand vector contains a number (M) of multiplicand inputs (WT0-WTM−1). Each of the multiplicand inputs (WN0-WNM−1, WT0-WTM−1) of the normal and test multiplicand vectors is at least N bits wide, where N≥2. The second multiplexer 12 outputs the normal multiplicand vector as a feed-in multiplicand vector when the mode signal (MODE) indicates that the bit-serial computing device 1 of this embodiment operates in the normal mode, and outputs the test multiplicand vector as the feed-in multiplicand vector when the mode signal (MODE) indicates that the bit-serial computing device 1 of this embodiment operates in the test mode. Therefore, the feed-in multiplicand vector includes a number (M) of multiplicand inputs (W0-WM−1); and the multiplicand input (Wm) of the feed-in multiplicand vector is equal to the multiplicand input (WNm) of the normal multiplicand vector when the mode signal (MODE) indicates that the bit-serial computing device 1 of this embodiment operates in the normal mode, and is equal to the multiplicand input (WTm) of the test multiplicand vector when the mode signal (MODE) indicates that the bit-serial computing device 1 of this embodiment operates in the test mode, where 0≤m≤M−1. In addition, each of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector contains a number (N) of multiplicand segments (W0,0-W0,N−1, . . . , or WM−1,0-WM−1,N−1), each of which is at least one bit wide, and which have different significances. The multiplicand segments (W0,0-WM−1,0, . . . , or W0,N−1-WM−1,N−1) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector have the same significance.
The significance of the multiplicand segments (W0,n—WM−1,n) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector is greater than the significance of the multiplicand segments (W0,n−1—WM−1,n−1) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector, where 1≤n≤N−1. It should be noted that, when the bit-serial computing device 1 of this embodiment is used in the neural network, the normal multiplicand vector is the other one of the activation vector and the weight vector.
The computing circuit 14 is coupled to the first multiplexer 11 to receive the feed-in multiplier vector, and at least includes a number (N) of multiply-and-accumulate (MAC) slices (MAC0−MACN−1).
The first allocator 13 is coupled to the second multiplexer 12 to receive the feed-in multiplicand vector, is further coupled to the MAC slices (MAC0-MACN−1), and further receives a control signal (CTRL1). With respect to each of the significances, the first allocator 13 outputs the multiplicand segments (W0,0-WM−1,0, . . . , or W0,N−1-WM−1,N−1) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector that have the significance for receipt by a corresponding one of the MAC slices (MAC0-MACN−1) based on the control signal (CTRL1). The significances respectively correspond to the MAC slices (MAC0-MACN−1). Correspondence between the significances and the MAC slices (MAC0-MACN−1) is variable, and is indicated by the control signal (CTRL1). The first allocator 13 may be implemented using a number ($N^2$) of switches that are arranged in an N×N crossbar configuration.
Each of the MAC slices (MAC0-MACN−1) calculates an inner product of the feed-in multiplier vector and a vector that is constituted by the multiplicand segments (W0,0-WM−1,0, . . . , or W0,N−1-WM−1,N−1) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector received thereby, which is equal to $\sum_{m=0}^{M-1} A_m \cdot W_{m,n}$, where 0≤n≤N−1.
The scaler 15 is coupled to the MAC slices (MAC0-MACN−1) to receive the inner products that are respectively calculated by the MAC slices (MAC0-MACN−1), and further receives a control signal (CTRL2). With respect to each of the MAC slices (MAC0-MACN−1), the scaler 15 multiplies the inner product that is calculated by the MAC slice (MAC0/ . . . /MACN−1) by a weighting ratio (R0/ . . . /RN−1) that represents the significance corresponding to the MAC slice (MAC0/ . . . /MACN−1) based on the control signal (CTRL2), so as to obtain a scaled inner product that corresponds to the MAC slice (MAC0/ . . . /MACN−1) and that is equal to $R_n \cdot \sum_{m=0}^{M-1} A_m \cdot W_{m,n}$, where 0≤n≤N−1. In an example where each of the multiplicand segments (W0,0-W0,N−1, . . . , WM−1,0-WM−1,N−1) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector is one bit wide, the weighting ratio (Rn) may be $2^n$. In another example where each of the multiplicand segments (W0,0-W0,N−1, . . . , WM−1,0-WM−1,N−1) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector is two bits wide, the weighting ratio (Rn) may be $4^n$. In yet another example where each of the multiplicand segments (W0,0-W0,N−1, . . . , WM−1,0-WM−1,N−1) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector is three bits wide, the weighting ratio (Rn) may be $8^n$. However, the disclosure is not limited to these examples.
The adder 16 is coupled to the scaler 15 to receive the scaled inner products that respectively correspond to the MAC slices (MAC0-MACN−1), and adds the scaled inner products together to obtain an inner product of the feed-in multiplier vector and the feed-in multiplicand vector, which is equal to $\sum_{m=0}^{M-1} A_m \cdot \left(\sum_{n=0}^{N-1} R_n \cdot W_{m,n}\right)$.
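The data path formed by the MAC slices, the scaler 15 and the adder 16 can be illustrated with a short Python sketch. It is only a behavioral model under stated assumptions: the multiplicand inputs are treated as non-negative integers, the function names and the `slice_of_significance` list (a stand-in for the correspondence indicated by the control signals) are hypothetical, and ideal, noise-free slices are assumed.

```python
def split_segments(w, n_segments, seg_bits=1):
    """Split a non-negative integer into segments, least significant first."""
    mask = (1 << seg_bits) - 1
    return [(w >> (seg_bits * n)) & mask for n in range(n_segments)]


def bit_serial_inner_product(a, w, n_segments, seg_bits=1,
                             slice_of_significance=None):
    """Model of slices + scaler + adder (illustrative assumption only).

    a: multiplier inputs A_0..A_{M-1}
    w: multiplicand inputs W_0..W_{M-1}
    slice_of_significance[n]: index of the MAC slice that receives the
        segments of significance n (identity mapping by default).
    """
    if slice_of_significance is None:
        slice_of_significance = list(range(n_segments))
    segments = [split_segments(wm, n_segments, seg_bits) for wm in w]

    slice_output = [0] * n_segments   # inner product computed by each slice
    slice_ratio = [0] * n_segments    # weighting ratio applied to each slice
    for n, s in enumerate(slice_of_significance):
        slice_output[s] = sum(am * seg[n] for am, seg in zip(a, segments))
        slice_ratio[s] = (1 << seg_bits) ** n   # 2^n, 4^n, 8^n, ...

    # Scaler multiplies each slice output by its ratio; adder sums them.
    return sum(r * p for r, p in zip(slice_ratio, slice_output))
```

In this model the result does not depend on which slice handles which significance, as long as the scaler applies the matching weighting ratio; for instance, `bit_serial_inner_product([3, 1, 2], [5, 7, 2], 3)` and the same call with `slice_of_significance=[2, 0, 1]` both equal the ordinary inner product.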
The test pattern generator 17 is coupled to the first and second multiplexers 11, 12, and generates the test multiplier vector to be received by the first multiplexer 11 and the test multiplicand vector to be received by the second multiplexer 12.
The evaluator 18 is coupled to the MAC slices (MAC0-MACN−1) to receive the inner products that are respectively calculated by the MAC slices (MAC0-MACN−1), and generates an evaluation output that indicates a relative relationship of accuracies of the MAC slices (MAC0-MACN−1) based on the inner products.
The configurator 19 is coupled to the evaluator 18 to receive the evaluation output, is further coupled to the first allocator 13 and the scaler 15, and generates the control signal (CTRL1) to be received by the first allocator 13 and the control signal (CTRL2) to be received by the scaler 15 based on the evaluation output.
In this embodiment, initially, the configurator 19 generates the control signal (CTRL1) corresponding to that of the first allocator 13 which outputs the multiplicand segments (W0,n—WM−1,n) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector to the MAC slice (MACn), and generates the control signal (CTRL2) corresponding to that of the scaler 15 which multiplies the inner product calculated by the MAC slice (MACn) by the weighting ratio (Rn), where 0≤n≤N−1. Then, after the evaluator 18 generates the evaluation output, level one reordering is performed, so that output accuracy of the bit-serial computing device 1 can be enhanced. That is, the configurator 19 generates the control signal (CTRL1) corresponding to that of the first allocator 13 which outputs the multiplicand segments (W0,N−1—WM−1,N−1) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector having a largest one of the significances to the one of the MAC slices (MAC0−MACN—1) having the highest accuracy among all of the MAC slices (MAC0−MACN—1), and generates the control signal (CTRL2) corresponding to that of the scaler 15 which multiplies the inner product calculated by said one of the MAC slices (MAC0−MACN—1) by the weighting ratio (RN−1) representing the largest one of the significances. 
In an example where M=16, where N=8, and where the evaluation output indicates that the MAC slice (MAC1) has the highest accuracy among all of the MAC slices (MAC0−MAC7), the control signal (CTRL1) may correspond to that of the first allocator 13 which outputs the multiplicand segments (W0,7—W15,7) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC1), outputs the multiplicand segments (W0,1—W15,1) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC7), and outputs the multiplicand segments (W0,n—W15,n) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MACn), where n=0, 2, 3, 4, 5 or 6.
In another embodiment, after the evaluator 18 generates the evaluation output, level two reordering is performed, so that the output accuracy of the bit-serial computing device 1 can be further enhanced as compared to the previous embodiment where the level one reordering is performed. That is, the configurator 19 generates the control signal (CTRL1) further corresponding to that of the first allocator 13 which outputs the multiplicand segments (W0,N−2-WM−1,N−2) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector having a second largest one of the significances to another one of the MAC slices (MAC0-MACN−1) having the second highest accuracy among all of the MAC slices (MAC0-MACN−1), and generates the control signal (CTRL2) further corresponding to that of the scaler 15 which multiplies the inner product calculated by said another one of the MAC slices (MAC0-MACN−1) by the weighting ratio (RN−2) representing the second largest one of the significances. In an example where M=16, where N=8, and where the evaluation output indicates that the MAC slice (MAC1) has the highest accuracy among all of the MAC slices (MAC0-MAC7) and that the MAC slice (MAC3) has the second highest accuracy among all of the MAC slices (MAC0-MAC7), the control signal (CTRL1) may correspond to that of the first allocator 13 which outputs the multiplicand segments (W0,7-W15,7) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC1), outputs the multiplicand segments (W0,6-W15,6) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC3), outputs the multiplicand segments (W0,3-W15,3) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC6), outputs the multiplicand segments (W0,1-W15,1) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC7), and outputs the multiplicand segments (W0,n-W15,n) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MACn), where n=0, 2, 4 or 5.
In yet another embodiment, after the evaluator 18 generates the evaluation output, level three reordering is performed, so that the output accuracy of the bit-serial computing device 1 can be further enhanced as compared to the previous embodiment where the level two reordering is performed. That is, the configurator 19 generates the control signal (CTRL1) further corresponding to that of the first allocator 13 which outputs the multiplicand segments (W0,N−3-WM−1,N−3) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector having a third largest one of the significances to yet another one of the MAC slices (MAC0-MACN−1) having the third highest accuracy among all of the MAC slices (MAC0-MACN−1), and generates the control signal (CTRL2) further corresponding to that of the scaler 15 which multiplies the inner product calculated by said yet another one of the MAC slices (MAC0-MACN−1) by the weighting ratio (RN−3) representing the third largest one of the significances. In an example where M=16, where N=8, and where the evaluation output indicates that the MAC slice (MAC1) has the highest accuracy among all of the MAC slices (MAC0-MAC7), that the MAC slice (MAC3) has the second highest accuracy among all of the MAC slices (MAC0-MAC7), and that the MAC slice (MAC4) has the third highest accuracy among all of the MAC slices (MAC0-MAC7), the control signal (CTRL1) may correspond to that of the first allocator 13 which outputs the multiplicand segments (W0,7-W15,7) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC1), outputs the multiplicand segments (W0,6-W15,6) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC3), outputs the multiplicand segments (W0,5-W15,5) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC4), outputs the multiplicand segments (W0,4-W15,4) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC5), outputs the multiplicand segments (W0,3-W15,3) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC6), outputs the multiplicand segments (W0,1-W15,1) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MAC7), and outputs the multiplicand segments (W0,n-W15,n) of the multiplicand inputs (W0-W15) of the feed-in multiplicand vector to the MAC slice (MACn), where n=0 or 2.
It should be noted that, in other embodiments, even higher level reordering may be performed, so that the output accuracy of the bit-serial computing device 1 can be further enhanced as compared to the embodiment where the level three reordering is performed.
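The level one, level two and level three reordering examples above are consistent with a simple swap-based procedure, sketched below in Python. The swap strategy, the function name `reorder` and the list representation of the correspondence are assumptions made for illustration; the disclosure itself only specifies which significance ends up at which MAC slice.

```python
def reorder(accuracy_rank, n_slices, level):
    """Return a list in which entry n is the index of the MAC slice that
    receives the multiplicand segments of significance n.

    accuracy_rank: slice indices ordered from most accurate downward
        (needs at least `level` entries).
    level: number of top significances to place on the most accurate slices.
    """
    # Initial configuration: significance n is handled by MAC_n.
    slice_of_significance = list(range(n_slices))
    for k in range(level):
        target_significance = n_slices - 1 - k   # k-th largest significance
        target_slice = accuracy_rank[k]          # k-th most accurate slice
        # Significance currently handled by the target slice.
        current = slice_of_significance.index(target_slice)
        # Swap so the target slice takes over the target significance.
        (slice_of_significance[current],
         slice_of_significance[target_significance]) = (
            slice_of_significance[target_significance],
            slice_of_significance[current])
    return slice_of_significance
```

With N=8 and MAC1, MAC3 and MAC4 ranked first to third, `reorder([1, 3, 4], 8, 3)` reproduces the level three example: significances 7, 6, 5, 4, 3 and 1 go to MAC1, MAC3, MAC4, MAC5, MAC6 and MAC7 respectively, while significances 0 and 2 stay on MAC0 and MAC2.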
In still another embodiment, after the evaluator 18 generates the evaluation output, a predetermined reordering is performed, so that the output accuracy of the bit-serial computing device 1 can be enhanced. That is, the configurator 19 generates the control signal (CTRL1) corresponding to that of the first allocator 13 which outputs the multiplicand segments (W0,0−WM−1,0) of the multiplicand inputs (W0−WM−1) of the feed-in multiplicand vector having a smallest one of the significances to one of the MAC slices (MAC0−MACN—1) having the lowest accuracy among all of the MAC slices (MAC0−MACN—1), and generates the control signal (CTRL2) corresponding to that of the scaler 15 which multiplies the inner product calculated by said one of the MAC slices (MAC0−MACN—1) by the weighting ratio (R0) representing the smallest one of the significances. In an example where M=16, where N=8, and where the evaluation output indicates that the MAC slice (MAC2) has the lowest accuracy among all of the MAC slices (MAC0−MAC7), the control signal (CTRL1) may correspond to that of the first allocator 13 which outputs the multiplicand segments (W0,0−W15,0) of the multiplicand inputs (W0−W15) of the feed-in multiplicand vector to the MAC slice (MAC2), outputs the multiplicand segments (W0,2−W15,2) of the multiplicand inputs (W0−W15) of the feed-in multiplicand vector to the MAC slice (MAC0), and outputs the multiplicand segments (W0,n−W15,n) of the multiplicand inputs (W0−W15) of the feed-in multiplicand vector to the MAC slice (MACn), where n=1, 3, 4, 5, 6 or 7.
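The reordering of this embodiment, in which the least accurate MAC slice receives the least significant segments, can be sketched in the same way. The function name and the swap with the slice that previously handled significance 0 are assumptions chosen to match the example above.

```python
def assign_lowest(least_accurate_slice, n_slices):
    """Return a list in which entry n is the MAC slice handling significance n,
    with the least accurate slice moved to the smallest significance."""
    slice_of_significance = list(range(n_slices))   # initially MAC_n <- n
    current = slice_of_significance.index(least_accurate_slice)
    # Swap the least accurate slice with the slice handling significance 0.
    slice_of_significance[0], slice_of_significance[current] = (
        slice_of_significance[current], slice_of_significance[0])
    return slice_of_significance
```

For N=8 with MAC2 as the least accurate slice, the mapping sends significance 0 to MAC2 and significance 2 to MAC0, and leaves the remaining significances on their original slices.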
In step 21, the test pattern generator 17 generates at least one first test multiplier vector, at least one second test multiplier vector and a test multiplicand vector, where a first linear function of the at least one first test multiplier vector (e.g., a1·x1+a2·x2+ . . . +aI·xI, where a1, a2, . . . , and aI are coefficients, x1, x2, . . . , and xI are the first test multiplier vectors, and I≥1) is equal to a second linear function of the at least one second test multiplier vector (e.g., b1·y1+b2·y2+ . . . +bJ·yJ, where b1, b2, . . . , and bJ are coefficients, y1, y2, . . . , and yJ are the second test multiplier vectors, and J≥1). When a plurality of first test multiplier vectors are generated, the first test multiplier vectors may be different from each other, or at least two of the first test multiplier vectors may be identical. Similarly, when a plurality of second test multiplier vectors are generated, the second test multiplier vectors may be different from each other, or at least two of the second test multiplier vectors may be identical. In a first example, a first test multiplier vector (x1) and two second test multiplier vectors (y1, y2) are generated, and x1=y1+y2. In a second example, two first test multiplier vectors (x1, x2) and a second test multiplier vector (y1) are generated, and 2·x1+x2=y1. In a third example, three first test multiplier vectors (x1, x2, x3) and a second test multiplier vector (y1) are generated, x1=x3, and x1+x2+x3=y1. However, the disclosure is not limited to these examples.
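One possible way to generate test patterns for the first example (x1=y1+y2) is sketched below in Python. The random generation strategy, the function name `make_test_vectors` and the `max_value` bound on each multiplier input are assumptions for illustration; the disclosure does not prescribe how the vectors are chosen beyond the linear relationship.

```python
import random


def make_test_vectors(m, max_value, rng):
    """Return (x1, y1, y2) with x1 = y1 + y2 element-wise.

    Every element of all three vectors lies in [0, max_value], so each
    vector is a valid multiplier input for the assumed input range.
    """
    x1 = [rng.randint(0, max_value) for _ in range(m)]
    y1 = [rng.randint(0, xi) for xi in x1]        # y1 never exceeds x1
    y2 = [xi - yi for xi, yi in zip(x1, y1)]      # remainder, so y1 + y2 = x1
    return x1, y1, y2
```

Drawing y1 below x1 and taking y2 as the remainder keeps all three vectors non-negative and within range, which matters when the multiplier inputs are unsigned.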
In step 22, the test pattern generator 17 sequentially provides the first and second test multiplier vectors to the first multiplexer 11, and provides the test multiplicand vector to the second multiplexer 12. As a consequence, the first and second test multiplier vectors sequentially pass through the first multiplexer 11 to serve as the feed-in multiplier vector that is to be received by the computing circuit 14, the test multiplicand vector passes through the second multiplexer 12 to serve as the feed-in multiplicand vector that is to be received by the first allocator 13, and each of the MAC slices (MAC0−MACN—1) sequentially obtains at least one first inner product that corresponds to the at least one first test multiplier vector and at least one second inner product that corresponds to the at least one second test multiplier vector as the inner product calculated thereby. In the aforesaid first example, the MAC slice (MACn) sequentially obtains a first inner product (<x1,wn>) and two second inner products (<y1,wn>, <y2,wn>), where wn denotes the vector that is constituted by the multiplicand segments (W0,n—WM−1,n) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector, and 0≤n≤N−1. In the aforesaid second example, the MAC slice (MACn) sequentially obtains two first inner products (<x1,wn>, <x2,wn>) and a second inner product (<y1,wn>). In the aforesaid third example, the MAC slice (MACn) sequentially obtains three first inner products (<x1,wn>, <x2,wn>, <x3,wn>) and a second inner product (<y1,wn>).
In step 23, with respect to each of the MAC slices (MAC0−MACN—1), the evaluator 18 calculates an absolute deviation that corresponds to the MAC slice (MAC0/ . . . /MACN—1), and that equals an absolute value of the first linear function of the at least one first inner product obtained by the MAC slice (MAC0/ . . . /MACN—1) minus the second linear function of the at least one second inner product obtained by the MAC slice (MAC0/ . . . /MACN—1). In the aforesaid first example, the absolute deviation corresponding to the MAC slice (MACn) is equal to |<x1,wn>−(<y1,wn>+<y2,wn>)|. In the aforesaid second example, the absolute deviation corresponding to the MAC slice (MACn) is equal to |(2·<x1,wn>+<x2,wn>)−<y1,wn>|. In the aforesaid third example, the absolute deviation corresponding to the MAC slice (MACn) is equal to |(<x1,wn>+<x2,wn>+<x3,wn>)−<y1,wn>|.
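The reason the absolute deviation serves as an accuracy measure follows from the linearity of the inner product. For an ideal, error-free MAC slice, and using the notation of steps 21 to 23, the following holds (a short derivation added for illustration):

```latex
\sum_{i=1}^{I} a_i \langle x_i, w_n \rangle - \sum_{j=1}^{J} b_j \langle y_j, w_n \rangle
  = \Big\langle \sum_{i=1}^{I} a_i x_i - \sum_{j=1}^{J} b_j y_j ,\; w_n \Big\rangle
  = \langle 0, w_n \rangle
  = 0
```

Any non-zero absolute deviation obtained for the MAC slice (MACn) therefore reflects error introduced by that slice rather than a property of the test vectors.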
In step 24, with respect to each of the MAC slices (MAC0-MACN−1), the evaluator 18 increases an accumulated deviation corresponding to the MAC slice (MAC0/ . . . /MACN−1) by the absolute deviation corresponding to the MAC slice (MAC0/ . . . /MACN−1). It should be noted that the accumulated deviation corresponding to the MAC slice (MAC0/ . . . /MACN−1) is set to zero before the test method is performed.
In step 25, the evaluator 18 determines whether a combination of steps 21-24 has been executed for a predetermined number of times (e.g., one-hundred times or more). If affirmative, the flow proceeds to step 26. Otherwise, the flow goes back to step 21.
By virtue of steps 24 and 25, steps 21-23 are repeated, and with respect to each of the MAC slices (MAC0-MACN−1), the absolute deviation corresponding to the MAC slice (MAC0/ . . . /MACN−1) is accumulated to obtain the accumulated deviation corresponding to the MAC slice (MAC0/ . . . /MACN−1).
In step 26, the evaluator 18 generates the evaluation output based on the accumulated deviations that respectively correspond to the MAC slices (MAC0−MACN−1), where the accuracy of one of the MAC slices (MAC0−MACN−1) is determined to be higher than the accuracy of another one of the MAC slices (MAC0-MACN−1) when the accumulated deviation that corresponds to said one of the MAC slices (MAC0−MACN−1) is smaller than the accumulated deviation that corresponds to said another one of the MAC slices (MAC0−MACN−1).
It should be noted that, in a first modification of this embodiment, in step 26, the evaluator 18 may calculate, with respect to each of the MAC slices (MAC0−MACN−1), an average deviation corresponding to the MAC slice (MAC0/ . . . /MACN—1) based on the accumulated deviation corresponding to the MAC slice (MAC0/ . . . /MACN—1), and may generate the evaluation output based on the average deviations that respectively correspond to the MAC slices (MAC0−MACN−1). In a second modification of this embodiment, the generation of the test multiplicand vector in step 21 and the providing of the test multiplicand vector to the second multiplexer 12 in step 22 may be omitted, and the multiplicand vector that has been stored in the MAC slices (MAC0−MACN—1) may be used by the MAC slices (MAC0−MACN—1) to calculate the inner products. In a third modification of this embodiment, step 21 may be executed once, instead of repeatedly. That is, if the determination in step 25 is negative, the flow goes back to step 22, instead of step 21.
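The flow of steps 21 to 26 can be sketched as a small Python simulation for the first example (x1=y1+y2). Everything here is a hedged behavioral model: each MAC slice is represented by a hypothetical callable `mac(a, w)`, the test multiplicand segments are supplied directly as `w_segments`, and the constant-offset error used in the usage note is chosen only because it makes the outcome easy to check by hand.

```python
import random


def inner(a, w):
    """Ideal inner product of two equal-length vectors."""
    return sum(ai * wi for ai, wi in zip(a, w))


def evaluate_slices(slices, w_segments, max_value, iterations, rng):
    """Return (accumulated deviations, slice indices, most accurate first).

    slices[n]: callable modelling MAC_n, mapping (a, w) to an inner product.
    w_segments[n]: multiplicand segment vector w_n held by MAC_n.
    """
    m = len(w_segments[0])
    accumulated = [0] * len(slices)          # zero before the test starts
    for _ in range(iterations):              # step 25: repeat a fixed count
        # Step 21: test multiplier vectors with x1 = y1 + y2.
        x1 = [rng.randint(0, max_value) for _ in range(m)]
        y1 = [rng.randint(0, xi) for xi in x1]
        y2 = [xi - yi for xi, yi in zip(x1, y1)]
        for n, (mac, w) in enumerate(zip(slices, w_segments)):
            # Step 22: the slice processes each test vector in turn.
            p_x1, p_y1, p_y2 = mac(x1, w), mac(y1, w), mac(y2, w)
            # Step 23: absolute deviation for this slice.
            deviation = abs(p_x1 - (p_y1 + p_y2))
            # Step 24: accumulate.
            accumulated[n] += deviation
    # Step 26: a smaller accumulated deviation means a higher accuracy.
    ranking = sorted(range(len(slices)), key=lambda n: accumulated[n])
    return accumulated, ranking
```

As a usage note, if one slice is ideal (`inner`), a second adds a constant offset of 1 to every result, and a third adds 2, each iteration contributes deviations of 0, 1 and 2 respectively, so the ideal slice is ranked first. A more realistic model would replace the offsets with random noise.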
Optionally, in the second exemplary implementation of the computing circuit 14, the ADC 147 of each of the MAC slices (MAC0-MACN−1) is further coupled to the configurator 19, and converts the combination of the currents into the inner product based on at least one reference voltage. Based on the evaluation output, the configurator 19 adjusts the at least one reference voltage used by the ADC 147 of each of some of the MAC slices (MAC0-MACN−1) to downwardly shift an output range of each of said some of the MAC slices (MAC0-MACN−1) by a predetermined value (i.e., an upper limit and a lower limit of the inner product obtained by each of said some of the MAC slices (MAC0-MACN−1) are each decreased by the predetermined value), so as to mitigate impact of noise on the inner product calculated by each of said some of the MAC slices (MAC0-MACN−1). The predetermined value is, for example, one or two. More specifically, by doing so, each of said some of the MAC slices (MAC0-MACN−1) preserves the negative noises that occur during inner product calculation, instead of cutting off the negative noises at the lower limit as would happen without the downward shift. Take an inner product of non-negative vectors as an example. Since the vectors used by the MAC slice (MAC0/ . . . /MACN−1) are non-negative, the lower limit of the inner product obtained by the MAC slice (MAC0/ . . . /MACN−1) is zero. However, because of the existence of negative noises, the ADC 147 of the MAC slice (MAC0/ . . . /MACN−1) may sometimes receive a negatively deviated current that falls within an input current range corresponding to an output code of minus one. A naive design would cut off the output code of the ADC at zero. Instead, the ADC 147 of the MAC slice (MAC0/ . . . /MACN−1) of the disclosure can output minus one. By doing so, the preserved negative noises can cancel out positive noises that occur at other inner products.
In an example, the configurator 19 adjusts the at least one reference voltage used by the ADC 147 of one of the MAC slices (MAC0−MACN—1) having the lowest accuracy among all of the MAC slices (MAC0−MACN—1) in such a way that, for each output code of the ADC 147 of said one of the MAC slices (MAC0−MACN—1), an input current range of the ADC 147 of said one of the MAC slices (MAC0−MACN—1) corresponding to the output code after the adjustment, is identical to an input current range of the ADC 147 of said one of the MAC slices (MAC0−MACN—1) corresponding to the output code minus the predetermined value before the adjustment. In a scenario where the inner product calculated by each of the MAC slices (MAC0−MACN—1) is eight bits wide, where the predetermined value is one, and where the multiplier inputs (A0-AM−1) and the multiplicand segments (W0,0—WM−1,0, . . . , or W0,N−1—WM−1,N−1) of the multiplicand inputs (W0-WM−1) received by each of the MAC slices (MAC0−MACN—1) are all non-negative integers, the output range of the MAC slice (MAC0/ . . . /MACN—1) having the lowest accuracy is originally [0, 255], and will become [−1, 254] after being downwardly shifted. In another example, the configurator 19 adjusts the reference voltages used by the ADCs 147 of two of the MAC slices (MAC0−MACN—1) having the lowest and second lowest accuracies among all of the MAC slices (MAC0−MACN—1). It should be noted that, in other examples, the configurator 19 may adjust the reference voltages used by the ADCs 147 of three or more of the MAC slices (MAC0−MACN—1) having the three or more lowest accuracies among all of the MAC slices (MAC0−MACN—1). Alternatively, the ADCs 147 of the MAC slices (MAC0−MACN−1) are not coupled to the configurator 19, and the reference voltages used by the ADCs 147 of the MAC slices (MAC0−MACN−1) are properly selected in a design phase of the bit-serial computing device 1 such that the output range of each of the MAC slices (MAC0−MACN—1) is [−1, 254].
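The effect of the downward shift can be illustrated with a simple numerical model in Python. This is only a sketch under assumptions: the ADC is modelled as rounding to the nearest integer code followed by clipping to its output range, the reference voltages themselves are not modelled, and the noise values are made up for the illustration.

```python
import math


def adc_code(analog_value, lower_limit, upper_limit):
    """Round to the nearest integer code, then clip to the output range."""
    code = math.floor(analog_value + 0.5)
    return max(lower_limit, min(upper_limit, code))


def summed_codes(analog_values, lower_limit, upper_limit):
    """Sum of the ADC codes obtained for a sequence of analog readings."""
    return sum(adc_code(v, lower_limit, upper_limit) for v in analog_values)


# Two readings whose ideal value is 0, disturbed by noise of -0.8 and +0.8.
readings = [-0.8, 0.8]

# Original range [0, 255]: the negative reading is clipped to 0.
unshifted = summed_codes(readings, 0, 255)

# Shifted range [-1, 254]: the negative reading is preserved as -1.
shifted = summed_codes(readings, -1, 254)
```

In this model the unshifted range yields a sum of 1 (a positive bias), whereas the shifted range yields 0 because the preserved code of minus one cancels the positively deviated code. The cost is that the largest representable code drops from 255 to 254.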
Referring to
It should be noted that, in other implementations of the computing circuit 14, each of the DACs 140 may convert the multiplier input (A0/ . . . /AM−1) of the feed-in multiplier vector received thereby into an analog current or a time interval, instead of the analog voltage; and with respect to each of the MAC slices (MAC0-MACN−1), the memory cells 146 may not be resistive, and may convert the outputs of the DACs 140 respectively into a plurality of voltages or a plurality of time intervals, instead of the currents, and the ADC 147 may convert a combination of the outputs of the memory cells 146 into the inner product calculated by the MAC slice (MAC0/ . . . /MACN−1). For instance, the memory cells 146 can be based on SRAM cells.
In view of the above, in this embodiment, by virtue of causing the MAC slice (MAC0/ . . . /MACN−1) that has the highest accuracy to calculate the inner product of the feed-in multiplier vector and the vector that is constituted by the multiplicand segments (W0,N−1-WM−1,N−1) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector having the largest one of the significances, or by virtue of causing the MAC slice (MAC0/ . . . /MACN−1) that has the lowest accuracy to calculate the inner product of the feed-in multiplier vector and the vector that is constituted by the multiplicand segments (W0,0-WM−1,0) of the multiplicand inputs (W0-WM−1) of the feed-in multiplicand vector having the smallest one of the significances, the output accuracy of the bit-serial computing device can be enhanced. Moreover, in the second exemplary implementation of the computing circuit 14, by virtue of downwardly shifting the output range of at least one of the MAC slices (MAC0-MACN−1), the impact of the noise on the inner product calculated by the at least one of the MAC slices (MAC0-MACN−1) can be mitigated. In addition, by virtue of the test method calculating and accumulating the absolute deviation for each of the MAC slices (MAC0-MACN−1), the relative relationship of the accuracies of the MAC slices (MAC0-MACN−1) can be determined.
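As a non-limiting illustration of how the accumulated deviations may drive the correspondence between the significances and the MAC slices, the following Python sketch ranks the slices by accumulated deviation and then combines their inner products with weighting ratios. The function names assign_significances and scaled_sum, the sample deviation values, and the assumption that the weighting ratio for significance k is two raised to the power of k times the segment width (bits_per_segment) are hypothetical and introduced only for this sketch.

```python
def assign_significances(accumulated_deviation):
    """Give significance 0 to the least accurate slice and N-1 to the most accurate one."""
    n = len(accumulated_deviation)
    # Slice indices ordered from the largest deviation (lowest accuracy) to the smallest.
    order = sorted(range(n), key=lambda s: accumulated_deviation[s], reverse=True)
    significance = [0] * n
    for rank, slice_index in enumerate(order):
        significance[slice_index] = rank
    return significance


def scaled_sum(inner_products, significance, bits_per_segment=2):
    """Multiply each inner product by its assumed weighting ratio and add the results."""
    return sum(p * (1 << (bits_per_segment * k))
               for p, k in zip(inner_products, significance))


# Hypothetical accumulated deviations for two slices: slice 0 is the more accurate one.
significance = assign_significances([3.5, 12.0])

# Multipliers [1, 2] and multiplicands [6, 9], each split into two 2-bit segments:
# high segments [1, 2] give inner product 5, low segments [2, 1] give inner product 4.
result = scaled_sum([5, 4], significance)

print(significance, result)  # prints "[1, 0] 24"
```

In this assumed example, the more accurate slice 0 handles the high segments, and the scaled sum of 24 equals the direct inner product 1×6 + 2×9.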
In the description above, for the purposes of explanation, numerous specific details have been set forth in order to provide a thorough understanding of the embodiment(s). It will be apparent, however, to one skilled in the art, that one or more other embodiments may be practiced without some of these specific details. It should also be appreciated that reference throughout this specification to “one embodiment,” “an embodiment,” an embodiment with an indication of an ordinal number and so forth means that a particular feature, structure, or characteristic may be included in the practice of the disclosure. It should be further appreciated that in the description, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of various inventive aspects; such does not mean that every one of these features needs to be practiced with the presence of all the other features. In other words, in any described embodiment, when implementation of one or more features or specific details does not affect implementation of another one or more features or specific details, said one or more features may be singled out and practiced alone without said another one or more features or specific details. It should be further noted that one or more features or specific details from one embodiment may be practiced together with one or more features or specific details from another embodiment, where appropriate, in the practice of the disclosure.
While the disclosure has been described in connection with what is(are) considered the exemplary embodiment(s), it is understood that this disclosure is not limited to the disclosed embodiment(s) but is intended to cover various arrangements included within the spirit and scope of the broadest interpretation so as to encompass all such modifications and equivalent arrangements.
This application claims the benefit of U.S. Provisional Patent Application No. 63/302134, filed on Jan. 24, 2022, which is incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
63302134 | Jan 2022 | US