The present disclosure relates to computing hardware. More particularly, the present disclosure relates to digital binary multipliers.
A digital binary multiplier is a hardware component that is typically configured to perform multiplication operations. For instance, a digital binary multiplier may be configured to take two binary inputs that represent two values, perform bit multiplication on the two binary inputs, and generate a binary output that represents the product of the two values. Digital binary multipliers are utilized in many applications including central processing units (CPUs), mobile computing devices, hardware accelerators, etc.
Various embodiments of the present disclosure are illustrated by way of example and not limitation in the figures of the accompanying drawings.
In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of the present disclosure. Such examples and details are not to be construed as unduly limiting the elements of the claims or the claimed subject matter as a whole. It will be evident to one skilled in the art, based on the language of the different claims, that the claimed subject matter may include some or all of the features in these examples, alone or in combination, and may further include modifications and equivalents of the features and techniques described herein.
Described here are techniques for providing masking-based digital binary multipliers. In some embodiments, a digital binary multiplier is configured to receive two sets of input bits that represent two input values. The digital binary multiplier performs logical AND operations between each bit in the first input and each bit in the second input. The ANDed values are referred to as partial products. The digital binary multiplier is also configured to receive a control signal that indicates one of several modes of operation of the digital binary multiplier. Each mode of operation of the digital binary multiplier configures the digital binary multiplier to perform multiplication on inputs having a particular bit-length. Based on the control signal, the digital binary multiplier masks a portion of the partial products. Finally, the digital binary multiplier sums the partial products to generate a set of output bits that represents the product of the two input values.
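As a non-limiting illustration, the generate, mask, and sum sequence described above can be modeled in software. The following Python sketch works row by row: for each set bit of the second input, it adds a shifted copy of the first input in which every bit outside the matching lane has been masked to zero. The names `WIDTH`, `masked_multiply`, and `lane_width` are assumptions made for this sketch and do not appear in the disclosure.

```python
WIDTH = 16  # input bit-length assumed for this sketch


def masked_multiply(a: int, b: int, lane_width: int) -> int:
    """Multiply packed lanes of a and b by masking cross-lane partial products."""
    total = 0
    for j in range(WIDTH):
        if (b >> j) & 1:
            lane = j // lane_width
            # Keep only the bits of `a` that sit in the same lane as bit j of `b`;
            # every other partial product in this row is masked to zero.
            lane_mask = ((1 << lane_width) - 1) << (lane * lane_width)
            total += (a & lane_mask) << j
    return total & ((1 << (2 * WIDTH)) - 1)


# One 16-bit lane: nothing is masked, so this is an ordinary multiplication.
full = masked_multiply(0xFFFF, 0xFFFF, 16)

# Two 8-bit lanes: 0x05 * 0x07 lands in the low half, 0x03 * 0x02 in the high half.
packed = masked_multiply(0x0305, 0x0207, 8)
```

In the two-lane call, the low 16 bits of `packed` hold 35 and the high 16 bits hold 6, because the partial products that would mix the two lanes never reach the sum.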
The techniques described in the present application provide a number of benefits and advantages over conventional methods for providing digital binary multipliers. First, the masking approach can lower dynamic power consumption by reducing activity in unused multiplier circuits, because their inputs are masked, instead of simply computing all potentially needed outputs and selecting the ones that are actually needed for a given mode. Second, such an approach can also make it easier to route designs on advanced process nodes by adding pins to partial product generators and summation circuits, which have relatively lower pin density than multiplexers. Third, providing a digital binary multiplier with different modes of operation in which multiplication can be performed on inputs having different bit-lengths allows a single digital binary multiplier to flexibly perform higher precision multiplications at lower throughput and lower precision multiplications at higher throughput.
Input manager 170 is configured to manage the inputs received from the first set of input bits 101-116 and the second set of input bits 117-132. Upon receiving values for input bits 101-132, input manager 170 may send them to partial product manager 180 for further processing.
Mode manager 175 is responsible for managing the mode of operation of multiplier 100. For example, mode manager 175 can receive a control signal from the set of control bits 165-167. Based on the values of control bits 165-167, mode manager 175 determines a mode of operation for multiplier 100. Then, mode manager 175 sends the determined mode of operation to masking manager 185.
Partial product manager 180 handles the generation of partial products for multiplier 100. For instance, when partial product manager 180 receives values for the first set of input bits 101-116 and values for the second set of input bits 117-132, partial product manager 180 generates partial products based on those values. Specifically, partial product manager 180 performs a logical AND operation between each bit in the first set of input bits 101-116 and each bit in the second set of input bits 117-132. The result of each logical AND operation is a partial product.
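By way of example only, the partial product generation described above can be sketched as a matrix in which entry `pp[i][j]` is the AND of bit i of the first input and bit j of the second input. The function name `partial_products` and the `width` parameter are assumptions of this sketch. A 4-bit width is used so the matrix stays small.

```python
def partial_products(a: int, b: int, width: int = 16) -> list:
    """Return pp[i][j] = A_i AND B_j for every pair of input bits."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> j) & 1 for j in range(width)]
    return [[ai & bj for bj in b_bits] for ai in a_bits]


# 4-bit example: A = 0b1011 (11), B = 0b0110 (6).
pp = partial_products(0b1011, 0b0110, width=4)

# Weighting each partial product by 2**(i + j) and summing recovers the product.
weighted_sum = sum(pp[i][j] << (i + j) for i in range(4) for j in range(4))
```

Each partial product with indices i and j belongs to column i + j, which is why the weighted sum equals the ordinary product when nothing is masked.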
Masking manager 185 is configured to mask certain partial products based on the mode of operation of multiplier 100. For example, when masking manager 185 receives the mode of operation from mode manager 175, masking manager 185 determines a subset of the partial products to mask. In some embodiments, masking manager 185 masks a partial product by setting the result of the logical AND operation to zero. In other embodiments, masking manager 185 masks a partial product by setting the inputs to the logical AND operation to zero.
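The two masking options described above can be compared with a small model. In this hedged sketch, `keep` is a hypothetical one-bit enable derived from the mode of operation (1 to keep the partial product, 0 to mask it); the function names are likewise assumptions.

```python
def mask_output(a_bit: int, b_bit: int, keep: int) -> int:
    """Mask by forcing the result of the AND operation to zero."""
    return (a_bit & b_bit) & keep


def mask_inputs(a_bit: int, b_bit: int, keep: int) -> int:
    """Mask by forcing the inputs of the AND operation to zero."""
    return (a_bit & keep) & (b_bit & keep)


# Exhaustively compare the two options over every input combination.
cases = [(a, b, k) for a in (0, 1) for b in (0, 1) for k in (0, 1)]
equivalent = all(mask_output(*c) == mask_inputs(*c) for c in cases)
```

Both functions yield the same logical value for every combination, so the choice between them is a circuit-level decision that this software model does not capture.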
Output manager 190 is responsible for generating the values for the set of output bits 133-164. For instance, output manager 190 can generate the value for a particular bit in the set of output bits 133-164 by summing the partial products associated with the particular bit, including the partial products that have been masked. The values that output manager 190 generates for the set of output bits 133-164 represent the product of the values for the first set of input bits 101-116 and the values for the second set of input bits 117-132.
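As one possible illustration of the column-wise summation described above, the following sketch adds the partial products of each column together with the carry from the previous column, and emits one output bit per column. Masked partial products take part in the sum but contribute zero. The function `sum_columns`, the `keep` matrix, and the carry handling are assumptions of this sketch; the disclosure does not specify a particular adder structure. A 4-bit width keeps the example compact.

```python
def sum_columns(pp, keep, width=16):
    """Sum partial products column by column; return bits O0..O(2*width - 1)."""
    out_bits = []
    carry = 0
    for col in range(2 * width):
        column_sum = carry
        for i in range(width):
            j = col - i
            if 0 <= j < width:
                # A masked partial product is still summed, but as zero.
                column_sum += pp[i][j] & keep[i][j]
        out_bits.append(column_sum & 1)
        carry = column_sum >> 1
    return out_bits


width = 4
a, b = 13, 11
pp = [[((a >> i) & 1) & ((b >> j) & 1) for j in range(width)] for i in range(width)]

# No masking: the output bits encode the ordinary product.
keep_all = [[1] * width for _ in range(width)]
value_full = sum(bit << k for k, bit in enumerate(sum_columns(pp, keep_all, width)))

# Two 2-bit lanes: only partial products whose bits share a lane are kept.
keep_lanes = [[int(i // 2 == j // 2) for j in range(width)] for i in range(width)]
value_lanes = sum(bit << k for k, bit in enumerate(sum_columns(pp, keep_lanes, width)))
```

With two 2-bit lanes, 13 splits into lane values 1 and 3 and 11 splits into 3 and 2, so the low four output bits hold 1 × 3 and the high four hold 3 × 2.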
Several example modes of operation of multiplier 100 will now be described by reference to
For this example, the first twelve bits of input A (i.e., A0-A11) are multiplied with the first twelve bits of input B (i.e., B0-B11). The partial products generated by partial product manager 180 that are relevant to determine the product between the first twelve bits of input A and the first twelve bits of input B are the dots encompassed in the dashed box 205. To prevent irrelevant partial products from being incorrectly taken into account in the determination of the relevant output bits (i.e., output bits O0-O23 in this example), masking manager 185 masks the partial products represented by white dots when multiplier 100 is operating in this first mode of operation. Since output bits O24-O31 are not relevant in this mode of operation of multiplier 100, masking manager 185 does not mask the partial products in those columns.
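The first mode of operation can be modeled as follows, under one reading of the paragraph above: partial products inside the 12 × 12 region are kept, partial products that fall in columns 24-31 are left unmasked because those output bits are ignored, and every other partial product is masked. The function name `multiply_12bit_mode` and the constant `WIDTH` are assumptions of this sketch.

```python
WIDTH = 16


def multiply_12bit_mode(a: int, b: int) -> int:
    """Model the first mode: only output bits O0-O23 are meaningful."""
    total = 0
    for i in range(WIDTH):
        for j in range(WIDTH):
            relevant = i < 12 and j < 12        # inside the 12 x 12 region
            in_ignored_column = (i + j) >= 24   # can only affect O24-O31
            if relevant or in_ignored_column:
                total += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return total & 0xFFFFFFFF


# The upper four bits of each input hold arbitrary values here.
out = multiply_12bit_mode(0xFABC, 0x9123)
low_24_bits = out & 0xFFFFFF
```

Because carries only travel toward higher columns, the unmasked partial products in columns 24-31 cannot disturb `low_24_bits`, which equals the product of the low twelve bits of each input.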
Output manager 190 generates values for output bits O0-O23 by summing the partial products in each of the corresponding columns shown in
In this example, the first eight bits of input A (i.e., A0-A7) are multiplied with the first eight bits of input B (i.e., B0-B7). The partial products generated by partial product manager 180 that are relevant to determine the product between the first eight bits of input A and the first eight bits of input B are the dots encompassed in the dashed box 305. In addition, the last eight bits of input A (i.e., A8-A15) are multiplied with the last eight bits of input B (i.e., B8-B15). The partial products generated by partial product manager 180 that are relevant to determine the product between the last eight bits of input A and the last eight bits of input B are the dots encompassed in the dashed box 310. In order to prevent irrelevant partial products from erroneously influencing the determination of the relevant output bits (i.e., output bits O0-O31 in this example), masking manager 185 masks the partial products represented by white dots when multiplier 100 is operating in the second mode of operation.
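A worked example may help illustrate the second mode of operation. In the sketch below, a partial product is kept only when both of its input bits come from the same half of their respective inputs; the function name `multiply_dual_8bit` and the sample values are assumptions chosen for illustration.

```python
def multiply_dual_8bit(a: int, b: int) -> int:
    """Model the second mode: two independent 8-bit multiplications."""
    total = 0
    for i in range(16):
        for j in range(16):
            # Keep A0-A7 with B0-B7 and A8-A15 with B8-B15; mask the rest.
            if (i < 8) == (j < 8):
                total += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return total


# Pack two 8-bit values into each 16-bit input.
a = (200 << 8) | 17
b = (150 << 8) | 9

out = multiply_dual_8bit(a, b)
low_product = out & 0xFFFF   # output bits O0-O15
high_product = out >> 16     # output bits O16-O31
```

The low product occupies columns 0-15 and the high product occupies columns 16-31, and since an 8-bit by 8-bit product always fits in 16 bits, no carry crosses from one half of the output into the other.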
Output manager 190 generates values for output bits O0-O31 by summing the partial products in each of the corresponding columns depicted in
Here, the first set of four bits of input A (i.e., A0-A3) is multiplied with the first set of four bits of input B (i.e., B0-B3). The partial products generated by partial product manager 180 that are relevant to determine the product between the first set of four bits of input A and the first set of four bits of input B are the dots encompassed in the dashed box 405. Also, the second set of four bits of input A (i.e., A4-A7) is multiplied with the second set of four bits of input B (i.e., B4-B7). The partial products generated by partial product manager 180 that are relevant to determine the product between the second set of four bits of input A and the second set of four bits of input B are the dots encompassed in the dashed box 410. The third set of four bits of input A (i.e., A8-A11) is multiplied with the third set of four bits of input B (i.e., B8-B11). The partial products generated by partial product manager 180 that are relevant to determine the product between the third set of four bits of input A and the third set of four bits of input B are the dots encompassed in the dashed box 415. Lastly, the fourth set of four bits of input A (i.e., A12-A15) is multiplied with the fourth set of four bits of input B (i.e., B12-B15). The partial products generated by partial product manager 180 that are relevant to determine the product between the fourth set of four bits of input A and the fourth set of four bits of input B are the dots encompassed in the dashed box 420. To prevent irrelevant partial products from being incorrectly taken into consideration in the determination of the relevant output bits (i.e., output bits O0-O31 in this example), masking manager 185 masks the partial products represented by white dots when multiplier 100 is operating in this third mode of operation.
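The masking pattern for the third mode of operation can be visualized with a short script. This sketch prints a square 16 × 16 grid in which `#` marks a kept partial product and `.` marks a masked one; the figure itself is not reproduced here and may arrange the dots differently, for example shifted by column. The function name `mask_diagram` and the characters used are assumptions of this sketch.

```python
def mask_diagram(width: int, lane_width: int) -> str:
    """Draw kept ('#') and masked ('.') partial products, one row per B bit."""
    rows = []
    for j in range(width):
        row = "".join(
            "#" if i // lane_width == j // lane_width else "."
            for i in reversed(range(width))  # A15 on the left, A0 on the right
        )
        rows.append(row)
    return "\n".join(rows)


diagram = mask_diagram(16, 4)
print(diagram)
```

Four blocks of 4 × 4 kept entries appear along one diagonal, one block per pair of 4-bit sets, for 64 kept partial products out of 256.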
Output manager 190 generates values for output bits O0-O31 by summing the partial products in each of the corresponding columns illustrated in
For this example, the first set of two bits of input A (i.e., A0 and A1) is multiplied with the first set of two bits of input B (i.e., B0 and B1). The partial products generated by partial product manager 180 that are relevant to determine the product between the first set of two bits of input A and the first set of two bits of input B are the dots encompassed in the dashed box 505. The second set of two bits of input A (i.e., A2 and A3) is multiplied with the second set of two bits of input B (i.e., B2 and B3). The partial products generated by partial product manager 180 that are relevant to determine the product between the second set of two bits of input A and the second set of two bits of input B are the dots encompassed in the dashed box 510. Next, the third set of two bits of input A (i.e., A4 and A5) is multiplied with the third set of two bits of input B (i.e., B4 and B5). The partial products generated by partial product manager 180 that are relevant to determine the product between the third set of two bits of input A and the third set of two bits of input B are the dots encompassed in the dashed box 515. The fourth set of two bits of input A (i.e., A6 and A7) is multiplied with the fourth set of two bits of input B (i.e., B6 and B7).
The partial products generated by partial product manager 180 that are relevant to determine the product between the fourth set of two bits of input A and the fourth set of two bits of input B are the dots encompassed in the dashed box 520. Then, the fifth set of two bits of input A (i.e., A8 and A9) is multiplied with the fifth set of two bits of input B (i.e., B8 and B9). The partial products generated by partial product manager 180 that are relevant to determine the product between the fifth set of two bits of input A and the fifth set of two bits of input B are the dots encompassed in the dashed box 525. The sixth set of two bits of input A (i.e., A10 and A11) is multiplied with the sixth set of two bits of input B (i.e., B10 and B11). The partial products generated by partial product manager 180 that are relevant to determine the product between the sixth set of two bits of input A and the sixth set of two bits of input B are the dots encompassed in the dashed box 530. The seventh set of two bits of input A (i.e., A12 and A13) is multiplied with the seventh set of two bits of input B (i.e., B12 and B13). The partial products generated by partial product manager 180 that are relevant to determine the product between the seventh set of two bits of input A and the seventh set of two bits of input B are the dots encompassed in the dashed box 535. Finally, the eighth set of two bits of input A (i.e., A14 and A15) is multiplied with the eighth set of two bits of input B (i.e., B14 and B15). The partial products generated by partial product manager 180 that are relevant to determine the product between the eighth set of two bits of input A and the eighth set of two bits of input B are the dots encompassed in the dashed box 540. 
In order to prevent irrelevant partial products from erroneously influencing the determination of the relevant output bits (i.e., output bits O0-O31 in this example), masking manager 185 masks the partial products represented by white dots when multiplier 100 is operating in the fourth mode of operation.
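The fourth mode of operation can be exercised with eight pairs of 2-bit values packed into the two inputs. The helper names `pack`, `unpack`, and `multiply_eight_2bit`, as well as the sample values, are assumptions of this sketch.

```python
def pack(values, lane_bits):
    """Pack a list of small values into one word, lowest lane first."""
    word = 0
    for k, v in enumerate(values):
        word |= (v & ((1 << lane_bits) - 1)) << (k * lane_bits)
    return word


def unpack(word, lane_bits, lanes):
    """Split a word into `lanes` fields of `lane_bits` bits, lowest first."""
    return [(word >> (k * lane_bits)) & ((1 << lane_bits) - 1) for k in range(lanes)]


def multiply_eight_2bit(a: int, b: int) -> int:
    """Model the fourth mode: eight independent 2-bit multiplications."""
    total = 0
    for i in range(16):
        for j in range(16):
            if i // 2 == j // 2:  # both bits belong to the same 2-bit set
                total += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return total


a_vals = [0, 1, 2, 3, 3, 2, 1, 0]
b_vals = [3, 3, 3, 3, 1, 2, 3, 0]
out = multiply_eight_2bit(pack(a_vals, 2), pack(b_vals, 2))
products = unpack(out, 4, 8)  # each product occupies four output bits
```

The largest possible product in this mode is 3 × 3 = 9, which fits in four bits, so each of the eight products stays within its own group of four output bits.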
Output manager 190 generates values for output bits O0-O31 by summing the partial products in each of the corresponding columns shown in
The examples described above by reference to
Next, process 600 receives, at 620, a control signal indicating a mode of operation in a plurality of modes of operation. Referring to
Then, process 600 generates, at 630, a plurality of partial products based on the first plurality of input bits and the second plurality of input bits. Referring to
Based on the control signal, process 600 masks, at 640, a subset of the plurality of partial products. Referring to
Finally, process 600 generates, at 650, a plurality of output bits based on the plurality of partial products. Referring to
As shown, AI accelerator 700 includes matrix multiplication units 705a-m. Each of the matrix multiplication units 705a-m is configured to perform multiplication operations on matrices. In some embodiments, dot product units configured to perform dot product operations can be used to implement AI accelerator 700 instead of matrix multiplication units 705a-m. In other embodiments, AI accelerator 700 includes such dot product units in addition to matrix multiplication units 705a-m.
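Purely as a hypothetical illustration, a dot product operation might make use of a multiplier operating in a multi-value mode by multiplying several element pairs at once and then adding the resulting products. The sketch below uses a behavioral stand-in for a four-value, 4-bit mode rather than a masking model, and the way a dot product unit is actually built is not specified in the text above. All names here are assumptions.

```python
def packed_multiply_4bit(a: int, b: int) -> int:
    """Behavioral stand-in: four 4-bit products packed into 32 output bits."""
    out = 0
    for lane in range(4):
        x = (a >> (4 * lane)) & 0xF
        y = (b >> (4 * lane)) & 0xF
        out |= (x * y) << (8 * lane)  # each product occupies eight bits
    return out


def dot_product_4bit(xs, ys) -> int:
    """Dot product of two 4-element vectors of 4-bit values."""
    a = sum((x & 0xF) << (4 * k) for k, x in enumerate(xs))
    b = sum((y & 0xF) << (4 * k) for k, y in enumerate(ys))
    packed = packed_multiply_4bit(a, b)
    # Add the four 8-bit products produced by the single multiply.
    return sum((packed >> (8 * k)) & 0xFF for k in range(4))


result = dot_product_4bit([1, 2, 3, 4], [5, 6, 7, 8])
```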
As depicted in
In various embodiments, the present disclosure includes systems, methods, and apparatuses for providing masking-based digital binary multipliers. The techniques described herein may be embodied in a non-transitory machine-readable medium storing a program executable by a computer system, the program comprising sets of instructions for performing the techniques described herein. In some embodiments, a system includes a set of processing units and a non-transitory machine-readable medium storing instructions that when executed by at least one processing unit in the set of processing units cause the at least one processing unit to perform the techniques described above. In some embodiments, the non-transitory machine-readable medium may be memory, for example, which may be coupled to one or more controllers or one or more artificial intelligence processors, for example.
The following techniques may be embodied alone or in different combinations and may further be embodied with other techniques described herein.
For example, in some embodiments, the techniques described herein relate to a method, executable by a digital binary multiplier, the method including: receiving a first plurality of input bits and a second plurality of input bits; receiving a control signal indicating a mode of operation in a plurality of modes of operation; generating a plurality of partial products based on the first plurality of input bits and the second plurality of input bits; based on the control signal, masking a subset of the plurality of partial products; and generating a plurality of output bits based on the plurality of partial products.
In some embodiments, the techniques described herein relate to a method, wherein masking the subset of the plurality of partial products includes setting each partial product in the subset of the plurality of partial products to zero.
In some embodiments, the techniques described herein relate to a method, wherein masking the subset of the plurality of partial products includes setting inputs for each partial product in the subset of the plurality of partial products to zero.
In some embodiments, the techniques described herein relate to a method, wherein generating the plurality of output bits is further based on the masked subset of the plurality of partial products.
In some embodiments, the techniques described herein relate to a method, wherein generating each output bit in the plurality of output bits includes summing a different subset of the plurality of partial products.
In some embodiments, the techniques described herein relate to a method, wherein the control signal is a first control signal, wherein the mode of operation is a first mode of operation, wherein the plurality of partial products is a first plurality of partial products, wherein the plurality of output bits is a first plurality of output bits, the method further including: receiving a third plurality of input bits and a fourth plurality of input bits; receiving a second control signal indicating a second mode of operation in the plurality of modes of operation; generating a second plurality of partial products based on the third plurality of input bits and the fourth plurality of input bits; based on the second control signal, masking a subset of the second plurality of partial products; and generating a second plurality of output bits based on the second plurality of partial products.
In some embodiments, the techniques described herein relate to a method, wherein the first plurality of input bits includes a first set of input bits representing a first numerical value and a second set of input bits representing a second numerical value, wherein the second plurality of input bits includes a third set of input bits representing a third numerical value and a fourth set of input bits representing a fourth numerical value.
In some embodiments, the techniques described herein relate to a method, wherein the plurality of output bits includes a first set of output bits representing a fifth numerical value and a second set of output bits representing a sixth numerical value, wherein the fifth numerical value is the product of the first numerical value and the third numerical value, wherein the sixth numerical value is the product of the second numerical value and the fourth numerical value.
In some embodiments, the techniques described herein relate to a method, wherein generating the plurality of partial products includes performing logical AND operations between the first plurality of input bits and the second plurality of input bits.
In some embodiments, the techniques described herein relate to a method, wherein the digital binary multiplier is included in an artificial intelligence (AI) accelerator.
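The method embodiments above can be summarized in a single software model. In this non-limiting sketch, the `Mode` enumeration stands in for the control signal; its member names and the use of the set width as the member value are hypothetical, since no encoding of the control signal is given here. For simplicity, the 12-bit mode in this sketch masks every partial product outside the 12 × 12 region.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical names; each value is the bit-length of one multiplication.
    ONE_12BIT = 12
    TWO_8BIT = 8
    FOUR_4BIT = 4
    EIGHT_2BIT = 2


def run_multiplier(a_bits: int, b_bits: int, mode: Mode) -> int:
    """Receive inputs and a mode, generate, mask, and sum partial products."""
    width = 16
    lane = mode.value

    # Generate: one partial product per pair of input bits (logical AND).
    generated = {
        (i, j): ((a_bits >> i) & 1) & ((b_bits >> j) & 1)
        for i in range(width)
        for j in range(width)
    }

    # Mask: set a mode-dependent subset of the partial products to zero.
    def keep(i: int, j: int) -> bool:
        if mode is Mode.ONE_12BIT:
            return i < 12 and j < 12  # simplification made by this sketch
        return i // lane == j // lane

    masked = {ij: (v if keep(*ij) else 0) for ij, v in generated.items()}

    # Sum: masked partial products are included, contributing zero.
    return sum(v << (i + j) for (i, j), v in masked.items()) & 0xFFFFFFFF


# Two successive operations on the same model in different modes.
first = run_multiplier(0x0ABC, 0x0123, Mode.ONE_12BIT)
second = run_multiplier((5 << 8) | 6, (7 << 8) | 8, Mode.TWO_8BIT)
```

In the second call, the output holds two numerical values: the product of the two low sets of input bits in its low half and the product of the two high sets in its high half.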
In some embodiments, the techniques described herein relate to a digital binary multiplier circuit including: a first circuit configured to receive a first plurality of input bits and a second plurality of input bits; a second circuit configured to receive a control signal indicating a mode of operation in a plurality of modes of operation; a third circuit configured to generate a plurality of partial products based on the first plurality of input bits and the second plurality of input bits; a fourth circuit configured to mask, based on the control signal, a subset of the plurality of partial products; and a fifth circuit configured to generate a plurality of output bits based on the plurality of partial products.
In some embodiments, the techniques described herein relate to a digital binary multiplier circuit, wherein masking the subset of the plurality of partial products includes setting each partial product in the subset of the plurality of partial products to zero.
In some embodiments, the techniques described herein relate to a digital binary multiplier circuit, wherein masking the subset of the plurality of partial products includes setting inputs for each partial product in the subset of the plurality of partial products to zero.
In some embodiments, the techniques described herein relate to a digital binary multiplier circuit, wherein generating the plurality of output bits is further based on the masked subset of the plurality of partial products.
In some embodiments, the techniques described herein relate to a digital binary multiplier circuit, wherein generating each output bit in the plurality of output bits includes summing a different subset of the plurality of partial products.
In some embodiments, the techniques described herein relate to a digital binary multiplier circuit, wherein the control signal is a first control signal, wherein the mode of operation is a first mode of operation, wherein the plurality of partial products is a first plurality of partial products, wherein the plurality of output bits is a first plurality of output bits, wherein the first circuit is further configured to receive a third plurality of input bits and a fourth plurality of input bits, wherein the second circuit is further configured to receive a second control signal indicating a second mode of operation in the plurality of modes of operation, wherein the third circuit is further configured to generate a second plurality of partial products based on the third plurality of input bits and the fourth plurality of input bits, wherein the fourth circuit is further configured to mask, based on the second control signal, a subset of the second plurality of partial products, and wherein the fifth circuit is further configured to generate a second plurality of output bits based on the second plurality of partial products.
In some embodiments, the techniques described herein relate to a digital binary multiplier circuit, wherein the first plurality of input bits includes a first set of input bits representing a first numerical value and a second set of input bits representing a second numerical value, wherein the second plurality of input bits includes a third set of input bits representing a third numerical value and a fourth set of input bits representing a fourth numerical value.
In some embodiments, the techniques described herein relate to a digital binary multiplier circuit, wherein the plurality of output bits includes a first set of output bits representing a fifth numerical value and a second set of output bits representing a sixth numerical value, wherein the fifth numerical value is the product of the first numerical value and the third numerical value, wherein the sixth numerical value is the product of the second numerical value and the fourth numerical value.
In some embodiments, the techniques described herein relate to a digital binary multiplier circuit, wherein generating the plurality of partial products includes performing logical AND operations between the first plurality of input bits and the second plurality of input bits.
In some embodiments, the techniques described herein relate to a digital binary multiplier circuit, wherein the digital binary multiplier circuit is included in an artificial intelligence (AI) accelerator.
The above description illustrates various embodiments of the present disclosure along with examples of how aspects of the particular embodiments may be implemented. The above examples should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of the particular embodiments as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents may be employed without departing from the scope of the present disclosure as defined by the claims.