Aspects of the present disclosure relate generally to analog-to-digital conversion, and more particularly to converting a sum of bits with an analog-to-digital converter.
An artificial intelligence (AI) accelerator or another type of processor may include multiply and accumulate circuits for performing multiply and accumulate (MAC) operations. A multiply and accumulate circuit may include a set of multipliers for performing multiple multiplications (e.g., bit-wise multiplications) in parallel, and an analog-to-digital converter for converting a sum of the resulting products into a digital signal.
The following presents a simplified summary of one or more implementations in order to provide a basic understanding of such implementations. This summary is not an extensive overview of all contemplated implementations and is intended to neither identify key or critical elements of all implementations nor delineate the scope of any or all implementations. Its sole purpose is to present some concepts of one or more implementations in a simplified form as a prelude to the more detailed description that is presented later.
A first aspect relates to a system. The system includes first multipliers configured to perform multiplications on a first set of bits and a second set of bits to generate first products, and second multipliers configured to perform multiplications on a third set of bits and a fourth set of bits to generate second products. The system also includes a parity compare circuit coupled to the first multipliers and the second multipliers, wherein the parity compare circuit is configured to generate a parity compare signal indicating whether a number of ones in the first products and a number of ones in the second products have a same parity or different parities. The system also includes a conversion circuit configured to change a bit value of one of the second products if the parity compare signal indicates the number of ones in the first products and the number of ones in the second products have the different parities. The system further includes a first summer configured to sum the first products to generate a first sum, a second summer configured to sum the second products to generate a second sum, a switching circuit coupled to the first summer and the second summer, an analog-to-digital converter (ADC) coupled to the switching circuit, and a shift and add circuit coupled to the ADC.
A second aspect relates to a method for multiplication and accumulation. The method includes performing multiplications on a first set of bits and a second set of bits to generate first products, and performing multiplications on a third set of bits and a fourth set of bits to generate second products. The method also includes summing the first products to generate a first sum, changing a bit value of one of the second products, and summing the second products to generate a second sum. The method further includes averaging the first sum and the second sum to obtain an average of the first sum and the second sum, converting the average of the first sum and the second sum into a digital signal, and shifting and adding a one to the digital signal.
A third aspect relates to a machine learning accelerator. The machine learning accelerator includes a memory, and a multiply and accumulate array coupled to the memory, wherein the multiply and accumulate array includes multiply and accumulate circuits. Each of the multiply and accumulate circuits includes respective first multipliers configured to perform multiplications on a respective first set of bits and a respective second set of bits to generate respective first products, and respective second multipliers configured to perform multiplications on a respective third set of bits and a respective fourth set of bits to generate respective second products. Each of the multiply and accumulate circuits also includes a respective parity compare circuit coupled to the respective first multipliers and the respective second multipliers, wherein the respective parity compare circuit is configured to generate a respective parity compare signal indicating whether a number of ones in the respective first products and a number of ones in the respective second products have a same parity or different parities. Each of the multiply and accumulate circuits also includes a respective conversion circuit configured to change a bit value of one of the respective second products if the respective parity compare signal indicates the number of ones in the respective first products and the number of ones in the respective second products have the different parities. 
Each of the multiply and accumulate circuits further includes a respective first summer configured to sum the respective first products to generate a respective first sum, a respective second summer configured to sum the respective second products to generate a respective second sum, a respective switching circuit coupled to the respective first summer and the respective second summer, a respective analog-to-digital converter (ADC) coupled to the respective switching circuit, and a respective shift and add circuit coupled to the respective ADC.
The detailed description set forth below, in connection with the appended drawings, is intended as a description of various configurations and is not intended to represent the only configurations in which the concepts described herein may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the various concepts. However, it will be apparent to those skilled in the art that these concepts may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form in order to avoid obscuring such concepts.
The multiply and accumulate circuit 110 includes 2^n bit-wise multipliers 120-1 to 120-256, a summer 150, and an n-bit analog-to-digital converter (ADC) 170. In the example in
Each of the multipliers 120-1 to 120-256 has a respective first input 122-1 to 122-256, a respective second input 124-1 to 124-256, and a respective output 126-1 to 126-256. The first input 122-1 to 122-256 of each of the multipliers 120-1 to 120-256 is configured to receive a respective one of the bits of the first set of 2^n bits, and the second input 124-1 to 124-256 of each of the multipliers 120-1 to 120-256 is configured to receive a respective one of the bits of the second set of 2^n bits. Each of the multipliers 120-1 to 120-256 is configured to perform bit-wise multiplication on the respective bits to generate a respective one-bit product. As a result, the multipliers 120-1 to 120-256 output 2^n one-bit products at the outputs 126-1 to 126-256.
In certain aspects, each of the multipliers 120-1 to 120-256 is configured to output a voltage approximately equal to zero volts (i.e., ground potential) to represent a product of zero (i.e., logic zero), and output a voltage approximately equal to Vdd to represent a product of one (i.e., logic one), where Vdd is a supply voltage.
The summer 150 is configured to sum the 2^n one-bit products from the multipliers 120-1 to 120-256 in the analog voltage domain, as discussed further below. In the example in
In operation, the summer 150 is configured to generate a voltage at the summing node 160 approximately equal to the following:
$$\text{voltage\_sum} = \frac{\text{sum}}{2^{n}} \cdot V_{dd}$$
where sum is the sum of the 2^n one-bit products, Vdd is the supply voltage, and voltage_sum is the voltage at the summing node 160. For example, if half the products have a bit value of one, then the voltage at the summing node 160 is approximately equal to half Vdd, which represents a sum of 128 for the example where n is eight.
The ADC 170 has an input 172 and an output 174, in which the input 172 is coupled to the summing node 160 of the summer 150. The ADC 170 may be implemented with a successive approximation register (SAR) ADC or another type of ADC. The ADC 170 is configured to convert the voltage at the summing node 160 into an n-bit digital signal indicating the sum of the products of the multipliers 120-1 to 120-256. For example, if the products include 128 ones (i.e., half the products are one), then the voltage at the summing node 160 is equal to half Vdd and the n-bit digital signal output by the ADC 170 indicates a sum of 128.
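The relationship between the products, the summing-node voltage, and the ADC output can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the function names, the normalized supply voltage of 1.0, and the ideal (noise-free, non-saturating) converter are hypothetical and are not taken from the disclosure.

```python
def summing_node_voltage(products, vdd=1.0):
    """Ideal summing-node voltage: (sum / number of products) * Vdd."""
    return sum(products) / len(products) * vdd


def ideal_adc(voltage, bits, vdd=1.0):
    """Ideal converter with a step of Vdd / 2**bits (saturation not modelled)."""
    return round(voltage / vdd * 2 ** bits)


# Example from the text: n = 8, so 2^n = 256 products, half of them ones.
products = [1] * 128 + [0] * 128
voltage = summing_node_voltage(products)
print(voltage, ideal_adc(voltage, bits=8))  # 0.5 128
```

A real converter would add noise, offset, and a full-scale limit; none of these are modelled here.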
It is desirable to reduce the power consumption of the multiply and accumulate circuit 110. For example, reducing the power consumption of the multiply and accumulate circuit 110 can lead to a significant reduction in the power consumption of a system including many instances of the multiply and accumulate circuit 110.
One option to reduce power consumption is to implement the ADC 170 with an (n−1)-bit ADC instead of an n-bit ADC. This reduces the power consumption of the ADC 170, which reduces the overall power consumption of the multiply and accumulate circuit 110. However, implementing the ADC 170 with an (n−1)-bit ADC in the exemplary architecture shown in
Multiple instances (i.e., copies) of the multiply and accumulate circuit 210 may be used to perform multiply and accumulate (MAC) operations in an artificial intelligence (AI) accelerator. For example, the AI accelerator may include a large array of MAC circuits to perform a large number of MAC operations (e.g., to run AI models). However, it is to be appreciated that the multiply and accumulate circuit 210 is not limited to AI accelerators.
In the example in
In the example in
Each of the multipliers 220-1 to 220-128 has a respective first input 222-1 to 222-128, a respective second input 224-1 to 224-128, and a respective output 226-1 to 226-128. The first input 222-1 to 222-128 of each of the multipliers 220-1 to 220-128 is configured to receive a respective one of the bits of the first set of bits, and the second input 224-1 to 224-128 of each of the multipliers 220-1 to 220-128 is configured to receive a respective one of the bits of the second set of bits. Each of the multipliers 220-1 to 220-128 is configured to perform multiplication on the respective bits to generate a respective one-bit product. The multipliers 220-1 to 220-128 output the first set of 2^(n−1) products (i.e., first products) at the outputs 226-1 to 226-128. In this example, each of the multipliers 220-1 to 220-128 may output a voltage approximately equal to Vdd to represent a product with a bit value of one, and output a voltage approximately equal to zero volts (i.e., ground potential) to represent a product with a bit value of zero.
Each of the multipliers 230-1 to 230-128 has a respective first input 232-1 to 232-128, a respective second input 234-1 to 234-128, and a respective output 236-1 to 236-128. The first input 232-1 to 232-128 of each of the multipliers 230-1 to 230-128 is configured to receive a respective one of the bits of the third set of bits, and the second input 234-1 to 234-128 of each of the multipliers 230-1 to 230-128 is configured to receive a respective one of the bits of the fourth set of bits. Each of the multipliers 230-1 to 230-128 is configured to perform multiplication on the respective bits to generate a respective one-bit product. The multipliers 230-1 to 230-128 output the second set of 2^(n−1) products (i.e., second products) at the outputs 236-1 to 236-128. In this example, each of the multipliers 230-1 to 230-128 may output a voltage approximately equal to Vdd to represent a product with a bit value of one, and output a voltage approximately equal to zero volts (i.e., ground potential) to represent a product with a bit value of zero.
The first summer 250 is configured to sum the first set of 2^(n−1) products (i.e., first products) from the multipliers 220-1 to 220-128 to produce a first sum in the analog voltage domain, and the second summer 256 is configured to sum the second set of 2^(n−1) products (i.e., second products) from the multipliers 230-1 to 230-128 to produce a second sum in the analog voltage domain, as discussed further below.
In the example in
The voltage at the first summing node 262 is approximately equal to the following:
$$\text{voltage\_sum1} = \frac{\text{sum1}}{2^{n-1}} \cdot V_{dd}$$
where sum1 is the first sum, Vdd is the supply voltage, and voltage_sum1 is the voltage at the first summing node 262 representing the first sum in the analog voltage domain. In this example, the voltage at the first summing node 262 has 2^(n−1) possible voltage levels representing 2^(n−1) possible values for the first sum.
In the example in
The voltage at the second summing node 264 is approximately equal to the following:
$$\text{voltage\_sum2} = \frac{\text{sum2}}{2^{n-1}} \cdot V_{dd}$$
where sum2 is the second sum, Vdd is the supply voltage, and voltage_sum2 is the voltage at the second summing node 264 representing the second sum in the analog voltage domain. In this example, the voltage at the second summing node 264 has 2^(n−1) possible voltage levels representing 2^(n−1) possible values for the second sum.
The switching circuit 280 has a first terminal 282 coupled to the first summing node 262, a second terminal 284 coupled to the second summing node 264, and a third terminal 286. The switching circuit 280 may be implemented with multiple switches, as discussed further below. The ADC 270 has an input 272 coupled to the third terminal 286 of the switching circuit 280, and an output 274. The shift circuit 290 has an input 292 coupled to the output 274 of the ADC 270, and an output 294. As discussed further below, the shift circuit 290 is configured to multiply the digital signal from the ADC 270 by two (e.g., by shifting the digital signal from the ADC 270 to the left by one bit position). An exemplary implementation of the ADC 270 is discussed below.
During operation, after the first sum appears on the first summing node 262 and the second sum appears on the second summing node 264, the switching circuit 280 couples the first summing node 262 to the second summing node 264. This causes an average of the first sum and the second sum to appear at the third terminal 286. The average of the first sum and the second sum is equal to the sum of the 2^n products (i.e., the sum of the first set of 2^(n−1) products and the second set of 2^(n−1) products) divided by two. Note that the average of the first sum and the second sum is represented in the analog voltage domain at the third terminal 286 as:
$$\text{voltage\_average} = \frac{\text{voltage\_sum1} + \text{voltage\_sum2}}{2} = \frac{\text{sum1} + \text{sum2}}{2 \cdot 2^{n-1}} \cdot V_{dd}$$
where voltage_average is the voltage at the third terminal 286 representing the average of the first sum and the second sum in the analog voltage domain.
The ADC 270 converts the average of the first sum and the second sum into a digital signal indicating the average of the first sum and the second sum. The shift circuit 290 then multiplies the digital signal from the ADC 270 by two to generate the n-bit digital signal discussed above. Since the digital signal from the ADC 270 indicates the average of the first sum and the second sum (which is equal to the sum of the 2^n products divided by two), multiplying the digital signal from the ADC 270 by two causes the n-bit digital signal to indicate the sum of the 2^n products (i.e., the sum of the two sets of 2^(n−1) products).
In certain aspects, the multiply and accumulate circuit 210 generates the first set of 2^(n−1) products and the first sum during a first cycle of a clock signal, and generates the second set of 2^(n−1) products and the second sum during a second cycle of the clock signal. In these aspects, the first sum and the second sum are accumulated over two cycles of the clock signal, and the n-bit digital signal indicating the sum of the 2^n products (i.e., the sum of the two sets of 2^(n−1) products) is generated every other cycle of the clock signal, which reduces power consumption. The first cycle and the second cycle are discussed further below according to certain aspects.
As discussed above, the exemplary multiply and accumulate circuit 210 is able to accurately convert the sum of the 2^n products (i.e., the sum of the two sets of 2^(n−1) products) into the n-bit digital signal when the sum is even (i.e., the sum is a sum of an even number of ones). This is because, when the sum of the 2^n products is even, the average of the first sum and the second sum is an integer, which can be accurately converted into a digital signal with a resolution of (n−1) bits. For example, if the first sum is 2 and the second sum is 4, then the sum of the 2^n products is 6 and the average of the first sum and the second sum is (2+4)/2=3, which is an integer. In this example, the ADC 270 accurately converts the average into a digital signal of 0000011 at the output 274 of the ADC 270, which indicates the correct average of 3. The shift circuit 290 multiplies the digital signal from the ADC 270 by two by shifting the digital signal to the left by one bit position. The shift produces a digital signal of 00000110, which indicates the correct sum of 6.
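The even-sum example above can be reproduced with a short behavioral sketch. The constants, function names, normalized supply voltage, and ideal converter are assumptions made for illustration only; they are not part of the disclosure.

```python
N = 8                 # assumed output resolution (the n = 8 example)
HALF = 2 ** (N - 1)   # products per half, i.e. 128
VDD = 1.0             # assumed, normalized supply voltage


def node_voltage(bit_sum):
    """Ideal summing-node voltage for one half: (sum / 2^(n-1)) * Vdd."""
    return bit_sum / HALF * VDD


def average_and_convert(first_sum, second_sum):
    """Average the two node voltages, then apply an ideal (n-1)-bit conversion."""
    v_average = (node_voltage(first_sum) + node_voltage(second_sum)) / 2
    return round(v_average / VDD * HALF)


def shift_left(code):
    """Multiply the (n-1)-bit code by two, as the shift circuit does."""
    return code << 1


code = average_and_convert(2, 4)
print(format(code, "07b"), format(shift_left(code), "08b"))  # 0000011 00000110
```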
The sum of the 2^n products is even when the first sum and the second sum have the same parity (i.e., the first sum and the second sum are both even or the first sum and the second sum are both odd). Thus, the exemplary multiply and accumulate circuit 210 is able to accurately convert the sum of the 2^n products into the n-bit digital signal when the first sum and the second sum have the same parity.
However, the exemplary multiply and accumulate circuit 210 is not able to accurately convert the sum of the 2^n products in cases where the sum is odd. This is because, when the sum of the 2^n products is odd, the average of the first sum and the second sum is a non-integer with a fractional part of 0.5, which cannot be accurately converted with a resolution of (n−1) bits. For example, if the first sum is 1 and the second sum is 2, then the sum of the 2^n products is 3 and the average of the first sum and the second sum is (1+2)/2=1.5, which is a non-integer. In this example, the ADC 270 will incorrectly convert the average into 1 or 2, which will result in an incorrect sum of 2 or 4 at the output 294 of the shift circuit 290.
The sum of the 2^n products is odd when the first sum and the second sum have different parities (i.e., the first sum is odd and the second sum is even, or the first sum is even and the second sum is odd). Thus, the exemplary multiply and accumulate circuit 210 is not able to accurately convert the sum of the 2^n products into the n-bit digital signal when the first sum and the second sum have different parities.
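The claim that the conversion fails exactly when the two sums have different parities can be checked exhaustively with an integer model. This is a sketch under assumptions: the function names are hypothetical, and the `round_up` flag stands in for whichever neighbouring integer a real (n−1)-bit converter would resolve a half-step average to.

```python
HALF = 128  # assumed 2^(n-1) products per half, for n = 8


def convert_without_odd_handling(first_sum, second_sum, round_up):
    """Average, quantize to an integer code, then double (no parity fix)."""
    total = first_sum + second_sum
    # An x.5 average must be resolved to one of its two neighbouring integers.
    code = total // 2 + (1 if (round_up and total % 2) else 0)
    return code << 1


def failing_inputs(round_up):
    """All (first_sum, second_sum) pairs whose converted result is wrong."""
    return [
        (s1, s2)
        for s1 in range(HALF + 1)
        for s2 in range(HALF + 1)
        if convert_without_odd_handling(s1, s2, round_up) != s1 + s2
    ]


# The example from the text: sums of 1 and 2 come out as 2 or 4, never 3.
print(convert_without_odd_handling(1, 2, False),
      convert_without_odd_handling(1, 2, True))  # 2 4
```

In this model every failing pair has an odd total, regardless of the rounding direction.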
To address the above, aspects of the present disclosure provide the multiply and accumulate circuit 210 with odd exception handling capability that allows the multiply and accumulate circuit 210 to accurately convert the sum of the 2^n products into the n-bit digital signal when the sum is odd (i.e., the first sum and the second sum have different parities), as discussed further below.
The parity compare circuit 310 is coupled to the outputs 226-1 to 226-128 of the multipliers 220-1 to 220-128 and the outputs 236-1 to 236-128 of the multipliers 230-1 to 230-128. The parity compare circuit 310 is configured to detect a parity of the number of ones in the first set of 2^(n−1) products (i.e., number of ones in the first products), detect a parity of the number of ones in the second set of 2^(n−1) products (i.e., number of ones in the second products), and output a parity compare signal at the output 312 indicating whether the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have the same parity or different parities. For example, the parity compare circuit 310 may output a one if the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities, and output a zero if the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have the same parity, or vice versa.
In certain aspects, the parity compare circuit 310 determines the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have the same parity if the number of ones in the first set of 2^(n−1) products is even and the number of ones in the second set of 2^(n−1) products is even, or the number of ones in the first set of 2^(n−1) products is odd and the number of ones in the second set of 2^(n−1) products is odd. The parity compare circuit 310 determines the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities if the number of ones in the first set of 2^(n−1) products is even and the number of ones in the second set of 2^(n−1) products is odd, or the number of ones in the first set of 2^(n−1) products is odd and the number of ones in the second set of 2^(n−1) products is even.
The parity compare signal indicates whether odd exception handling is needed. As discussed further below, odd exception handling is needed when the parity compare signal indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities.
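One way to model the parity comparison in software is to XOR-reduce each set of products and then XOR the two parities. This is a sketch only: the text above does not say how the parity compare circuit 310 is built, so the XOR reduction, the function names, and the polarity (one meaning different parities) are assumptions.

```python
from functools import reduce
from operator import xor


def parity(bits):
    """Return 1 if the number of ones in `bits` is odd, else 0."""
    return reduce(xor, bits, 0)


def parity_compare(first_products, second_products):
    """Return 1 for different parities, 0 for the same parity (assumed polarity)."""
    return parity(first_products) ^ parity(second_products)


print(parity_compare([1, 0, 0], [1, 1, 0]))  # 1: one 'one' versus two 'ones'
print(parity_compare([1, 1, 0], [0, 0, 0]))  # 0: both counts are even
```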
The conversion circuit 320 is coupled between the outputs 236-1 to 236-128 of the multipliers 230-1 to 230-128 and the second summer 256. The conversion circuit 320 also has an input 322 coupled to the output 312 of the parity compare circuit 310 to receive the parity compare signal from the parity compare circuit 310.
If the parity compare signal indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have the same parity, then the conversion circuit 320 passes the second set of 2^(n−1) products at the outputs 236-1 to 236-128 of the multipliers 230-1 to 230-128 to the second summer 256 unchanged. In this case, the sum of the 2^n products (i.e., sum of the two sets of 2^(n−1) products) is even and the odd exception handling is not needed.
If the parity compare signal indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities, then the conversion circuit 320 changes the bit value of one of the products in the second set of 2^(n−1) products (i.e., one of the second products) from one to zero (e.g., changes the voltage of the product from Vdd to zero volts) before passing the second set of 2^(n−1) products to the second summer 256. The bit value change causes the first sum and the second sum to have the same parity, and thus, the sum of the 2^n products to be even. This causes the average of the first sum and the second sum to be an integer, which allows the ADC 270 to accurately convert the average of the first sum and the second sum at the input 272 into a digital signal at the output 274.
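The behavior of the conversion circuit 320 can be sketched as a function on a list of one-bit products. Several details here are assumptions: the text does not say which product is changed, so this sketch clears the first product with a bit value of one, and it raises an error when the second products contain no ones, a case the text above does not address. The function name is hypothetical.

```python
def convert_second_products(second_products, parity_differs):
    """Pass the products through, clearing a single one when parities differ."""
    converted = list(second_products)
    if not parity_differs:
        return converted  # same parity: pass through unchanged
    if 1 not in converted:
        # Assumption: this case is not described in the text, so the sketch
        # refuses to guess what the hardware would do.
        raise ValueError("no product with a bit value of one to clear")
    converted[converted.index(1)] = 0  # arbitrary choice: first one found
    return converted


print(convert_second_products([0, 1, 1], parity_differs=True))   # [0, 0, 1]
print(convert_second_products([0, 1, 1], parity_differs=False))  # [0, 1, 1]
```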
The shift and add circuit 330 has an input 332 coupled to the output 274 of the ADC 270 and an output 334. In the example in
If the parity compare signal indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities, then the shift and add circuit 330 shifts the digital signal from the ADC 270 by one bit position to multiply the digital signal by two, and adds one to the shifted digital signal. In this case, the one is added to the shifted digital signal to undo the bit value change of one of the products in the second set of 2^(n−1) products from one to zero by the conversion circuit 320.
If the parity compare signal indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have the same parity, then the shift and add circuit 330 shifts the digital signal from the ADC 270 by one bit position to multiply the digital signal by two without adding one to the shifted digital signal.
The parity compare circuit 310, the conversion circuit 320, and the shift and add circuit 330 allow the exemplary multiply and accumulate circuit 210 to accurately convert the sum of the 2^n products (i.e., sum of the two sets of 2^(n−1) products) in cases where the sum is odd. For example, if the first sum is 1 and the second sum is 2, then the parity compare signal indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities. In response, the conversion circuit 320 converts the bit value of one of the products in the second set of 2^(n−1) products from one to zero, which changes the second sum from 2 to 1. This also changes the sum of the 2^n products (i.e., sum of the two sets of 2^(n−1) products) from 3 to 2, which is even. After the change, the average of the first sum and the second sum is (1+1)/2=1, which is an integer. In this example, the ADC 270 correctly converts the average into a digital signal indicating 1. The shift and add circuit 330 doubles the digital signal and adds a one, resulting in a digital signal at the output 334 that indicates the correct sum of 3.
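The complete odd exception handling path can be summarized in an integer-only behavioral sketch. The function name is hypothetical, the analog averaging and the ADC are treated as ideal, and the case in which the second products contain no ones is rejected because the text above does not describe it.

```python
def mac_with_odd_handling(first_products, second_products):
    """Return the sum of all products using only a half-resolution average."""
    first_sum = sum(first_products)
    second_sum = sum(second_products)
    parity_differs = (first_sum ^ second_sum) & 1  # parity compare signal
    if parity_differs:
        if second_sum == 0:
            # Assumption: no one is available to clear; not covered in the text.
            raise ValueError("second products contain no ones")
        second_sum -= 1  # conversion circuit: one product changed from 1 to 0
    average = (first_sum + second_sum) // 2  # exact, the adjusted total is even
    return (average << 1) + parity_differs   # shift, then add one if needed


first = [1] + [0] * 127       # first sum of 1
second = [1, 1] + [0] * 126   # second sum of 2
print(mac_with_odd_handling(first, second))  # 3
```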
At block 410, the parity compare circuit 310 compares the parity of the number of ones in the first set of 2^(n−1) products with the parity of the number of ones in the second set of 2^(n−1) products.
If the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have the same parity at block 415 (which indicates that the sum of the 2^n products is even), then the method proceeds to block 420.
At block 420, the first sum and the second sum are averaged in the analog voltage domain. For example, the switching circuit 280 may average the first sum and the second sum by coupling the first summing node 262 to the second summing node 264.
At block 425, the ADC 270 converts the average of the first sum and the second sum into a digital signal at the output 274 with a bit resolution of (n−1) bits.
At block 430, the digital signal from the ADC 270 is doubled (i.e., multiplied by two). For example, the shift and add circuit 330 may double the digital signal by shifting the digital signal to the left by one bit position. The digital signal after the doubling provides a digital representation of the sum of the 2^n products (i.e., sum of the two sets of 2^(n−1) products).
If the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities at block 415 (which indicates that the sum of the 2^n products is odd), then the method proceeds to block 435.
At block 435, the conversion circuit 320 converts the bit value of one of the products in the second set of 2^(n−1) products from one to zero. This change causes the first sum and the second sum to have the same parity, and thus, the sum of the 2^n products to be even.
At block 440, the first sum and the second sum are averaged in the analog voltage domain. For example, the switching circuit 280 may average the first sum and the second sum by coupling the first summing node 262 to the second summing node 264.
At block 445, the ADC 270 converts the average of the first sum and the second sum into a digital signal at the output 274 with a bit resolution of (n−1) bits.
At block 450, the digital signal from the ADC 270 is doubled (i.e., multiplied by two). For example, the shift and add circuit 330 may double the digital signal by shifting the digital signal to the left by one bit position.
At block 455, the shift and add circuit 330 adds a one to the digital signal to undo the bit value change in block 435. In some implementations, the shift and add circuit 330 may perform the shifting and the addition of one concurrently. The digital signal after the doubling and the addition of one provides a digital representation of the sum of the 2^n products (i.e., sum of the two sets of 2^(n−1) products).
In the example in
In this example, the input 332 of the shift and add circuit 330 includes (n−1) parallel inputs 332-1 to 332-7 coupled to the (n−1) outputs 274-1 to 274-7 of the ADC 270, respectively. The output 334 of the shift and add circuit 330 includes n parallel outputs 334-1 to 334-8. The digital signal that is output by the shift and add circuit 330 includes n bits, which are labeled out<0> to out<7> in
In this example, the shift and add circuit 330 shifts the digital signal from the ADC 270 by one position by mapping the bits d<0> to d<6> of the digital signal to higher order bits out<1> to out<7>, respectively, at the outputs 334-2 to 334-8 of the shift and add circuit 330. For example, the LSB bit d<0> in the digital signal is mapped to the bit out<1> at the output 334-2, which is one order higher than the LSB bit. In other words, each of the bits d<0> to d<6> is mapped to a respective one of the bits out<1> to out<7> that is one order higher. In the example in
In this example, the shift and add circuit 330 includes a multiplexer 510 having a first input 512, a second input 514, a select input 518, and an output 516. The first input 512 is held at the bit value of zero (e.g., ground) and the second input 514 is held at the bit value of one (e.g., a voltage approximately equal to Vdd). The output 516 of the multiplexer 510 is coupled to the output 334-1 of the shift and add circuit 330, which corresponds to the LSB out<0>. Thus, the output 516 of the multiplexer 510 provides the LSB out<0> of the output digital signal out<0> to out<7>. The select input 518 receives the parity compare signal from the parity compare circuit 310 (shown in
The multiplexer 510 is configured to select one of the inputs 512 and 514 based on the parity compare signal. If the parity compare signal indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have the same parity, then the multiplexer 510 selects the first input 512 and outputs the bit value of zero for the LSB out<0>. In this case, the output digital signal out<0> to out<7> is double the digital signal d<0> to d<6>. Thus, in this case, the shift and add circuit 330 multiplies the digital signal from the ADC 270 by two.
If the parity compare signal indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities, then the multiplexer 510 selects the second input 514 and outputs the bit value of one for the LSB out<0>. In this case, the output digital signal out<0> to out<7> is double the digital signal d<0> to d<6> plus one. Thus, in this case, the shift and add circuit 330 multiplies the digital signal from the ADC 270 by two and adds one to the digital signal.
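The bit mapping performed by the shift and add circuit 330 can be written out explicitly. In this sketch the list index stands for the bit position (index 0 is d<0> or out<0>), the helper names are hypothetical, and a parity compare signal of one is assumed to mean different parities.

```python
def shift_and_add(d_bits, parity_compare):
    """Map d<0>..d<6> to out<1>..out<7> and drive out<0> from the multiplexer."""
    # Multiplexer 510: select the constant one (input 514) when parities
    # differ, otherwise the constant zero (input 512).
    lsb = 1 if parity_compare else 0
    return [lsb] + list(d_bits)


def bits_to_int(bits):
    """Interpret a list of bits with index 0 as the least significant bit."""
    return sum(bit << position for position, bit in enumerate(bits))


d = [1, 1, 0, 0, 0, 0, 0]  # d<0>..d<6>, the value 3
print(bits_to_int(shift_and_add(d, 0)))  # 6: doubled
print(bits_to_int(shift_and_add(d, 1)))  # 7: doubled plus one
```

Because the shift leaves out<0> free, the addition of one needs no adder: selecting the constant one for the LSB is sufficient.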
In the example shown in
The timing circuit 695 is configured to receive the clock signal clk, and generate a first cycle clock signal clk_1 and a second cycle clock signal clk_2 based on the clock signal clk.
In this example, the first set of bits and the second set of bits are received via the inputs 610-1 to 610-128 and the inputs 650-1 to 650-128, respectively, during the first cycle of the clock signal. The third set of bits and the fourth set of bits are received via the inputs 610-1 to 610-128 and the inputs 650-1 to 650-128, respectively, during the second cycle of the clock signal.
Each of the latches 620-1 to 620-128 has a respective input (labeled “in”) coupled to a respective one of the inputs 610-1 to 610-128, and a respective output (labeled “out”) coupled to the first input 222-1 to 222-128 of a respective one of the multipliers 220-1 to 220-128. The latches 620-1 to 620-128 are configured to receive the first cycle clock signal clk_1 and latch the first set of bits at the respective inputs on a rising edge of the first cycle clock signal clk_1.
Each of the latches 630-1 to 630-128 has a respective input (labeled “in”) coupled to a respective one of the inputs 610-1 to 610-128, and a respective output (labeled “out”) coupled to the first input 232-1 to 232-128 of a respective one of the multipliers 230-1 to 230-128. The latches 630-1 to 630-128 are configured to receive the second cycle clock signal clk_2 and latch the third set of bits at the respective inputs on a rising edge of the second cycle clock signal clk_2.
Each of the latches 660-1 to 660-128 has a respective input (labeled “in”) coupled to a respective one of the inputs 650-1 to 650-128, and a respective output (labeled “out”) coupled to the second input 224-1 to 224-128 of a respective one of the multipliers 220-1 to 220-128. The latches 660-1 to 660-128 are configured to receive the first cycle clock signal clk_1 and latch the second set of bits at the respective inputs on a rising edge of the first cycle clock signal clk_1.
Each of the latches 670-1 to 670-128 has a respective input (labeled “in”) coupled to a respective one of the inputs 650-1 to 650-128, and a respective output (labeled “out”) coupled to the second input 234-1 to 234-128 of a respective one of the multipliers 230-1 to 230-128. The latches 670-1 to 670-128 are configured to receive the second cycle clock signal clk_2 and latch the fourth set of bits at the respective inputs on a rising edge of the second cycle clock signal clk_2.
Exemplary operations of the multiply and accumulate circuit 210 shown in
During the first cycle of the clock signal clk (labeled “Cycle1” in
During the second cycle of the clock signal clk (labeled “Cycle2” in
During the second cycle (labeled “Cycle2”), the first switch 680 and the second switch 690 are both initially turned off (i.e., open) during a first portion (labeled “t1”) of the second cycle. During the first portion of the second cycle, the second summer 256 generates the second sum on the second summing node 264 in the analog voltage domain, as discussed above. After the first portion of the second cycle has elapsed, the first switch 680 is turned on (i.e., closed) during a second portion (labeled “t2”) of the second cycle. This causes the first switch 680 to couple the first summing node 262 to the second summing node 264, which produces the average of the first sum and the second sum in the analog voltage domain. During a third portion (labeled “t3”) of the second cycle, the second switch 690 is turned on (i.e., closed). This causes the second switch 690 to couple the average of the first sum and the second sum to the input 272 of the ADC 270 (shown in
The multiply and accumulate circuit 210 may repeat the above operations over a third cycle (labeled “Cycle3”) and a fourth cycle (labeled “Cycle4”) of the clock signal clk to multiply and accumulate new sets of bits. The multiply and accumulate circuit 210 may include reset circuitry (not shown) for resetting the capacitors 255-1 to 255-128 and 260-1 to 260-128 (e.g., discharging the capacitors 255-1 to 255-128 and 260-1 to 260-128) for the third cycle and the fourth cycle.
During the third cycle, the ADC 270 may convert the average of the first sum and the second sum that was sampled during the second cycle into a digital signal at the output 274 of the ADC 270. The shift and add circuit 330 may then convert the digital signal from the ADC 270 into the final n-bit digital signal at the output 334, as discussed above. Note that, during the first cycle (labeled “Cycle1”), the ADC 270 may convert the average of the first sum and the second sum that was sampled during a previous cycle (i.e., a cycle preceding the first cycle) into a digital signal.
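The averaging performed when the first switch 680 couples the two summing nodes can be illustrated with a small numerical model. The following is a minimal sketch only, assuming ideal and equal unit capacitors, a supply value `vdd`, and a 4-wide example instead of the 128-wide circuit; the function names `summing_node_voltage` and `share_charge` are hypothetical and do not appear in the disclosure.

```python
def summing_node_voltage(products, vdd=1.0):
    """Idealized summing-node voltage for equal unit capacitors (assumption).

    Each product of one contributes vdd and each product of zero contributes
    ground, so the node is modeled as settling at the mean of the driven values.
    """
    return vdd * sum(products) / len(products)


def share_charge(v_first, v_second, c_first=1.0, c_second=1.0):
    """Voltage after an ideal switch couples two charged nodes together."""
    return (c_first * v_first + c_second * v_second) / (c_first + c_second)


first_products = [1, 0, 1, 1]    # hypothetical 4-wide example
second_products = [0, 0, 1, 0]

v1 = summing_node_voltage(first_products)    # 3/4 of vdd
v2 = summing_node_voltage(second_products)   # 1/4 of vdd
v_avg = share_charge(v1, v2)                 # midpoint of v1 and v2
print(v1, v2, v_avg)
```

With equal total capacitance on both summing nodes, charge sharing yields the arithmetic mean of the two node voltages, which is the quantity sampled for the ADC 270 in this model.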
In this example, the first parity detector 810 has inputs 812-1 to 812-128 coupled to respective outputs 226-1 to 226-128 of the multipliers 220-1 to 220-128, and an output 814. The first parity detector 810 is configured to detect the parity of the number of ones in the first set of 2^(n−1) products and output a signal at the output 814 indicating the detected parity of the number of ones in the first set of 2^(n−1) products. For example, the first parity detector 810 may output a one when the number of ones in the first set of 2^(n−1) products is odd and output a zero when the number of ones in the first set of 2^(n−1) products is even, or vice versa.
The second parity detector 820 has inputs 822-1 to 822-128 coupled to respective outputs 236-1 to 236-128 of the multipliers 230-1 to 230-128, and an output 824. The second parity detector 820 is configured to detect the parity of the number of ones in the second set of 2^(n−1) products and output a signal at the output 824 indicating the detected parity of the number of ones in the second set of 2^(n−1) products. For example, the second parity detector 820 may output a one when the number of ones in the second set of 2^(n−1) products is odd and output a zero when the number of ones in the second set of 2^(n−1) products is even, or vice versa.
The latch 830 has an input (labeled “in”) coupled to the output 814 of the first parity detector 810, and an output (labeled “out”). The latch 830 is configured to receive the second cycle clock signal clk_2, latch the signal from the first parity detector 810 on a rising edge of the second cycle clock signal clk_2, and output the latched signal at the output of the latch 830.
The XOR gate 840 has a first input 842 coupled to the output of the latch 830, a second input 844 coupled to the output 824 of the second parity detector 820, and an output 846 coupled to the output 312 of the parity compare circuit 310. In this example, the first parity detector 810 and the second parity detector 820 output the same logic value (i.e., one or zero) when the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have the same parity. In this case, the XOR gate 840 outputs a zero at the output 312. The first parity detector 810 and the second parity detector 820 output different logic values when the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities. In this case, the XOR gate 840 outputs a one at the output 312. Thus, in this example, the XOR gate 840 outputs a zero when the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have the same parity, and outputs a one when the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities. However, it is to be appreciated that the present disclosure is not limited to this example.
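The behavior of the parity compare circuit 310 described above can be modeled in a few lines. This is a behavioral sketch only, assuming the convention in which each parity detector outputs a one for an odd number of ones; the function names `parity` and `parity_compare` are hypothetical, and the latch 830 is represented simply by holding the first parity in a variable.

```python
from functools import reduce
from operator import xor


def parity(products):
    """Return 1 if the number of ones in the one-bit products is odd, else 0."""
    return reduce(xor, products, 0)


def parity_compare(first_products, second_products):
    """Return 0 for the same parity and 1 for different parities."""
    latched_first = parity(first_products)           # value held by the latch
    return latched_first ^ parity(second_products)   # XOR of the two parities


print(parity_compare([1, 1, 0, 0], [1, 0, 1, 0]))  # both even: same parity
print(parity_compare([1, 1, 1, 0], [1, 0, 1, 0]))  # odd vs. even: different
```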
When the parity compare signal indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have the same parity, each of the select circuits 920-1 to 920-128 is configured to couple the output 236-1 to 236-128 of the respective one of the multipliers 230-1 to 230-128 to the respective one of the capacitors 260-1 to 260-128 of the second summer 256. In this case, the select circuits 920-1 to 920-128 pass the second set of 2^(n−1) products to the capacitors 260-1 to 260-128 of the second summer 256 unchanged.
When the parity compare signal indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities, the select circuits 920-1 to 920-128 change the bit value of one of the products in the second set of 2^(n−1) products from one to zero based on the outputs of the lagging one detector 910 and pass the second set of 2^(n−1) products after the bit value change to the capacitors 260-1 to 260-128 of the second summer 256, as discussed further below.
The lagging one detector 910 is configured to detect one of the products in the second set of 2^(n−1) products having a bit value of one, and output a one at the respective one of the outputs 914-1 to 914-128. This causes the respective one of the select circuits 920-1 to 920-128 to change the bit value of the detected product from one to zero. The lagging one detector 910 outputs a zero at each of the remaining outputs 914-1 to 914-128, which causes each of the respective select circuits 920-1 to 920-128 to pass the respective product unchanged.
In the example in
Each of the multiplexers 940-1 to 940-128 is configured to select the respective first input (labeled “0”) when the parity compare signal from the parity compare circuit 310 indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have the same parity. This causes the multiplexers 940-1 to 940-128 to pass the second set of 2^(n−1) products to the second summer 256 unchanged.
Each of the multiplexers 940-1 to 940-128 is configured to select the respective second input (labeled “1”) when the parity compare signal from the parity compare circuit 310 indicates the number of ones in the first set of 2^(n−1) products and the number of ones in the second set of 2^(n−1) products have different parities. In this case, the lagging one detector 910 outputs a one at one of the outputs 914-1 to 914-128, and the one of the NOR gates 930-1 to 930-128 coupled to that output outputs a zero to the second input (labeled “1”) of the respective one of the multiplexers 940-1 to 940-128. This causes the respective multiplexer to output a zero, which effectively changes the bit value of the corresponding product from one to zero. The remaining select circuits 920-1 to 920-128 pass the remaining products of the second set of 2^(n−1) products to the second summer 256 unchanged.
In this example, the lagging one detector 910 is configured to detect one of the products in the second set of 2^(n−1) products having a bit value of one, output a one at the respective one of the outputs 914-1 to 914-128, and output a zero at each of the remaining outputs 914-1 to 914-128. For example, if the first product received at the input 912-1 is a one, then the lagging one detector 910 outputs a one at the corresponding output 914-1 and outputs a zero at each of the remaining outputs 914-2 to 914-128. If the first product received at the input 912-1 is zero and the second product received at the input 912-2 is one, then the lagging one detector 910 outputs a one at the corresponding output 914-2 and outputs a zero at each of the remaining outputs 914-1 and 914-3 to 914-128. If the first product received at the input 912-1 is zero, the second product received at the input 912-2 is zero, and the third product received at the input 912-3 is one, then the lagging one detector 910 outputs a one at the corresponding output 914-3 and outputs a zero at each of the remaining outputs 914-1, 914-2, and 914-4 to 914-128, and so forth.
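The combined behavior of the lagging one detector 910 and the select circuits 920-1 to 920-128 can be sketched behaviorally as follows. This is an illustrative model under stated assumptions rather than a gate-level description: the NOR gates and multiplexers are collapsed into a single conditional, the example is 4 wide instead of 128 wide, and the function names `lagging_one_detector` and `convert` are hypothetical.

```python
def lagging_one_detector(products):
    """Return a one-hot list marking the first product (in input order) that is one.

    All outputs are zero if no product has a bit value of one.
    """
    outputs = [0] * len(products)
    for index, bit in enumerate(products):
        if bit == 1:
            outputs[index] = 1
            break
    return outputs


def convert(products, parity_compare_signal):
    """Pass products unchanged on same parity; clear the detected one otherwise."""
    if parity_compare_signal == 0:
        return list(products)           # multiplexers select the "0" inputs
    flags = lagging_one_detector(products)
    # Where the detector outputs a one, the product is forced to zero;
    # everywhere else the product passes through unchanged.
    return [0 if flag else bit for bit, flag in zip(products, flags)]


print(lagging_one_detector([0, 0, 1, 1]))
print(convert([0, 0, 1, 1], parity_compare_signal=1))
print(convert([0, 0, 1, 1], parity_compare_signal=0))
```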
In the example in
The second sum circuit 1130 has a first input 1132 coupled to the output 1124 of the first sum circuit 1120, a second input 1134, and an output 1136. The register 1140 has an input 1142 coupled to the output 1136 of the second sum circuit 1130, and an output 1144 coupled to the second input 1134 of the second sum circuit 1130. The output 1144 of the register 1140 provides the output of the multiply and accumulate system 1110.
In this example, the second sum circuit 1130 is configured to sum the sum from the output 1124 of the first sum circuit 1120 with the output of the register 1140, and output the resulting sum to the register 1140. The register 1140 stores the sum output from the second sum circuit 1130 and outputs the sum from the second sum circuit 1130 at the output 1144, which is fed back to second input 1134 of the second sum circuit 1130. As a result, the second sum circuit 1130 and the register 1140 form an accumulator that accumulates the sums output from the output 1124 of the first sum circuit 1120. In this regard, the register 1140 may also be referred to as an accumulation register since the register 1140 stores the accumulation of the sums from the first sum circuit 1120.
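The feedback loop formed by the second sum circuit 1130 and the register 1140 can be modeled as a simple accumulator. The sketch below is a minimal behavioral model, assuming the register starts at zero and updates once per call; the class name `Accumulator` and the method name `clock` are hypothetical.

```python
class Accumulator:
    """Behavioral model of a sum circuit feeding an accumulation register."""

    def __init__(self):
        self.register = 0  # assumed initial register contents

    def clock(self, first_sum_output):
        """Add the incoming sum to the stored value and store the result."""
        self.register = self.register + first_sum_output
        return self.register  # value presented at the register output


acc = Accumulator()
for value in [3, 5, 2]:        # hypothetical sums from the first sum circuit
    print(acc.clock(value))
```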
It is to be appreciated that the multiply and accumulate circuit 210 is not limited to the exemplary multiply and accumulate system 1110 shown in
The memory 1215 may be used to store weights, activation values, and the results of processing by the machine learning accelerator 1210 (e.g., accumulations of products of weights and activation values). The first registers 1220 are coupled between the memory 1215 and the MAC array 1230, and the second registers 1225 are coupled between the memory 1215 and the MAC array 1230. The first registers 1220 are configured to receive activation values from the memory 1215 and input the activation values to the MAC array 1230, and the second registers 1225 are configured to receive weights from the memory 1215 and input the weights to the MAC array 1230.
The MAC array 1230 may include an array (e.g., 32×64 array) of multiple instances of the multiply and accumulate system 1110 illustrated in
The scale bias and non-linear circuit 1240 is coupled between the MAC array 1230 and the memory 1215. The scale bias and non-linear circuit 1240 may be configured to add a constant to the product of activation values and weights to offset the corresponding result by a minimum threshold. This helps ensure that the result can be output to the next AI model layer for cases where values below the threshold do not produce an output to the next AI model layer. Scaling may be needed in certain aspects to fit the data within a specific scale for improving the accuracy of results. The scale bias and non-linear circuit 1240 may also be configured to perform non-linear functions to enable a neural network to learn more complex relationships between the inputs and outputs and improve the accuracy and effectiveness of the neural network.
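One possible form of the scale, bias, and non-linear processing is sketched below. The disclosure does not specify a particular scale factor, constant, or non-linear function, so the default `scale` and `bias` values and the use of a rectifier (clamping negative values to zero) are assumptions made purely for illustration, as is the function name `scale_bias_nonlinear`.

```python
def scale_bias_nonlinear(accumulations, scale=0.5, bias=1.0):
    """Scale each accumulation, add a constant, then apply a rectifier.

    The rectifier is an assumed example of a non-linear function.
    """
    return [max(0.0, scale * value + bias) for value in accumulations]


print(scale_bias_nonlinear([-4, 0, 6]))
```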
The machine learning accelerator 1210 may be used for computer vision inferencing in some implementations. For example, the memory 1215 may be coupled to an imaging device 1250 configured to capture an image for processing by the machine learning accelerator 1210. In this example, the memory 1215 stores the image, which may include a set of image values (e.g., pixel values). The memory 1215 outputs the image values to the first registers 1220, which input the image values to the MAC array 1230. In this example, the image values provide the activation values discussed above.
In this example, the weights discussed above may be the weights of a filter stored in the memory 1215. The memory 1215 outputs the weights of the filter to the second registers 1225, which input the weights to the MAC array 1230. The MAC array 1230 performs matrix multiplication on the image values and the weights of the filter to perform image inferencing. The image inferencing may be used, for example, to categorize one or more objects in the image. It is to be appreciated that the machine learning accelerator 1210 is not limited to image inferencing and may be used for other types of inferencing.
At block 1310, multiplications are performed on a first set of bits and a second set of bits to generate first products. For example, the multiplications on the first set of bits and the second set of bits may be performed by the multipliers 220-1 to 220-128. In certain aspects, the multiplications may be bit-wise multiplications and the products may be one-bit products. The first products may correspond to the first set of 2^(n−1) products.
At block 1320, multiplications are performed on a third set of bits and a fourth set of bits to generate second products. For example, the multiplications on the third set of bits and the fourth set of bits may be performed by the multipliers 230-1 to 230-128. In certain aspects, the multiplications may be bit-wise multiplications and the products may be one-bit products. The second products may correspond to the second set of 2^(n−1) products.
At block 1330, the first products are summed to generate a first sum. For example, the first products may be summed by the first summer 250.
At block 1340, a bit value of one of the second products is changed. For example, the bit value may be changed by the conversion circuit 320. The bit value change may be from one to zero. The one of the second products may correspond to one of the products in the second set of 2^(n−1) products.
At block 1350, the second products are summed to generate a second sum. For example, the second products may be summed by the second summer 256.
At block 1360, the first sum and the second sum are averaged to obtain an average of the first sum and the second sum. For example, the first sum and the second sum may be averaged by the switching circuit 280.
At block 1370, the average of the first sum and the second sum is converted into a digital signal. For example, the average of the first sum and the second sum may be converted into the digital signal by the ADC 270.
At block 1380, the digital signal is shifted and a one is added to the digital signal. For example, the shift and add circuit 330 may shift and add the one to the digital signal.
In certain aspects, the method 1300 further includes determining that a number of ones in the first products and a number of ones in the second products have different parities. In these aspects, changing the bit value of the one of the second products includes changing the bit value of the one of the second products after a determination that the number of ones in the first products and the number of ones in the second products have the different parities. The determination may be made by the parity compare circuit 310.
In certain aspects, shifting and adding the one to the digital signal includes shifting the digital signal by one bit position to multiply the digital signal by two, and outputting the one for a least significant bit (LSB) of the shifted digital signal.
In certain aspects, averaging the first sum and the second sum includes coupling the first summer 250 to the second summer 256 to obtain the average of the first sum and the second sum. For example, the first summer 250 may be coupled to the second summer 256 by the switching circuit 280.
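The blocks of the method 1300 can be traced end to end with integer arithmetic to show why the shifted digital signal plus the added one recovers the total number of ones. The sketch below rests on several assumptions that are not taken from the disclosure: the ADC is modeled as an ideal converter returning the integer average, the LSB is taken to be zero when the parities match, a 4-wide example replaces the 128-wide circuit, and the function name `method_1300` is hypothetical.

```python
def method_1300(first_a, first_b, second_a, second_b):
    """Behavioral trace of blocks 1310 to 1380 using one-bit operands."""
    first_products = [a & b for a, b in zip(first_a, first_b)]      # block 1310
    second_products = [a & b for a, b in zip(second_a, second_b)]   # block 1320
    first_sum = sum(first_products)                                  # block 1330

    # Parity compare: 1 when the two counts of ones have different parities.
    different = (first_sum ^ sum(second_products)) & 1
    if different and 1 in second_products:
        # Block 1340: change one product in the second set from one to zero.
        second_products[second_products.index(1)] = 0

    second_sum = sum(second_products)                                # block 1350
    # Block 1360: the division is exact whenever a bit was cleared. The case
    # of no ones in the second products is not covered by the text and is
    # simply floored here (assumption).
    average = (first_sum + second_sum) // 2
    digital = average                                                # block 1370
    # Block 1380: shift by one bit position and set the LSB. Using the parity
    # compare value as the LSB in the same-parity case is an assumption.
    return (digital << 1) | different


ones = [1, 1, 1, 1]
print(method_1300([1, 1, 1, 0], ones, [1, 1, 0, 0], ones))  # 3 ones + 2 ones
print(method_1300([1, 1, 0, 0], ones, [1, 1, 0, 0], ones))  # 2 ones + 2 ones
```

In the first call the parities differ, so one product is cleared, the average of 3 and 1 is 2, and shifting and adding the one gives 5, which matches the 3 + 2 ones in the original products.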
Implementation examples are described in the following numbered clauses:
Within the present disclosure, the word “exemplary” is used to mean “serving as an example, instance, or illustration.” Any implementation or aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects of the disclosure. Likewise, the term “aspects” does not require that all aspects of the disclosure include the discussed feature, advantage or mode of operation. The term “coupled” is used herein to refer to the direct or indirect electrical coupling between two structures.
Any reference to an element herein using a designation such as “first,” “second,” and so forth does not generally limit the quantity or order of those elements. Rather, these designations are used herein as a convenient way of distinguishing between two or more elements or instances of an element. Thus, a reference to first and second elements does not mean that only two elements can be employed, or that the first element must precede the second element.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.