Compute-in-memory (CIM) technology allows data loaded in main memory or cache to be processed faster than data in storage memory by reducing the latency of retrieving data from the storage memory for processing operations. Processing the data using CIM hardware located at the main memory or the cache is faster than processing the data near or further from the main memory or the cache because it avoids the latency of communication between the main memory or the cache and the near or further processing hardware.
Digital CIM is processed in a bit-serial fashion. For example, a multiply-accumulate operation may be composed of a NOR gate for bit multiplication followed by an adder tree for accumulation. However, a bit-serial operation may be time consuming as a number of cycles that may be required for a computation is a function of a number of input bits. For example, the number of cycles required for a bit-serial operation may be equal to the number of input bits.
Typical Booth multipliers may operate in parallel with multiple stages required to produce the final product. To calculate a final product, a typical Booth multiplier may require that all partial sums be generated in sequence before a shift and an addition operation can be applied to produce the final product. Therefore, there are multiple obstacles to implementing Booth multiplication in CIM.
Aspects of the present disclosure are best understood from the following detailed description when read with the accompanying figures. It is noted that, in accordance with the standard practice in the industry, various features are not drawn to scale. In fact, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion.
The following disclosure provides many different embodiments, or examples, for implementing different features of the provided subject matter. Specific examples of components and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. For example, the formation of a first element, component, and/or feature over or on a second element, component, and/or feature in the description that follows may include embodiments in which the first and second elements, components, and/or features are formed in direct contact, and may also include embodiments in which additional elements, components, and/or features are formed between the first and second elements, components, and/or features, such that the first and second elements, components, and/or features are not in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
Further, spatially relative terms, such as “beneath,” “below,” “lower,” “above,” “upper” and the like, may be used herein for ease of description to describe one element's, component's, and/or feature's relationship to another element(s), component(s), and/or feature(s) as illustrated in the Figures. The spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the Figures. The apparatus and/or device may be otherwise oriented (rotated 90 degrees or at other orientations) and the spatially relative descriptors used herein may likewise be interpreted accordingly. Unless explicitly stated otherwise, each element, component, and/or feature having the same reference numeral refers to the same element, component, and/or feature, and is to have the same material composition and to have a thickness within a same thickness range.
The terms “processor,” “processor core,” “controller,” and “control unit” are used interchangeably herein, unless otherwise noted, to refer to any one or all of a software-configured processor, a hardware-configured processor, a general purpose processor, a dedicated purpose processor, a single-core processor, a homogeneous multi-core processor, a heterogeneous multi-core processor, a core of a multi-core processor, a microprocessor, a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), etc., a controller, a microcontroller, a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), other programmable logic devices, discrete gate logic, transistor logic, and the like. A processor may be an integrated circuit, which may be configured such that the components of the integrated circuit reside on a single piece of semiconductor material, such as silicon.
The term “memory” is used herein, unless otherwise noted, to refer to any one or all of cache, main memory, random-access memory (RAM), including any variations of dynamic RAM (DRAM), synchronous DRAM (SDRAM), static RAM (SRAM), ferroelectric RAM (FeRAM), resistive RAM (RRAM), magnetoresistive RAM (MRAM), phase-change RAM (PCRAM), etc., flash memory, solid-state memory, and the like.
Digital CIM is processed in a bit-serial fashion. For example, a multiply-accumulate operation may be composed of a NOR gate for bit multiplication followed by an adder tree for accumulation. However, a bit-serial operation may be time consuming as a number of cycles that may be required for a computation is a function of a number of input bits. For example, the number of cycles required for a bit-serial operation may be equal to the number of input bits.
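The bit-serial baseline described above can be sketched in software. The following is a minimal behavioral model, not a circuit description: the names `nor`, `bit_product`, and `bit_serial_mac` are hypothetical, inputs and weights are taken as unsigned, and the NOR gate is assumed to receive inverted operands so that it acts as a 1-bit AND.

```python
def nor(a, b):
    """Two-input NOR on single bits."""
    return 1 - (a | b)


def bit_product(x_bit, w, w_bits=4):
    """Multiply one input bit by a w_bits-wide unsigned weight.

    Assumption: each NOR gate sees inverted operands, so NOR(~x, ~w) == x AND w.
    """
    return sum(nor(1 - x_bit, 1 - ((w >> j) & 1)) << j for j in range(w_bits))


def bit_serial_mac(inputs, weights, p=4):
    """Unsigned multiply-accumulate that consumes one input bit per cycle."""
    acc = 0
    cycles = 0
    for bit in range(p):  # least significant input bit first
        # Adder tree: sum the 1-bit-by-weight products across all rows.
        partial = sum(bit_product((x >> bit) & 1, w) for x, w in zip(inputs, weights))
        acc += partial << bit  # shift-and-accumulate by bit significance
        cycles += 1
    return acc, cycles
```

In this model the cycle count equals the number of input bits `p`, which is the cost that the Booth-encoded approach aims to reduce.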
Typical Booth multipliers may operate in parallel with multiple stages required to produce a final product. Booth multipliers operate on the principles of Booth's algorithm, which multiplies two signed binary numbers in 2's complement notation. As is typical in binary multiplication, Booth's algorithm generates partial products of the multiplication of a multiplicand by a multiplier that are shifted and summed to produce a final product. Booth's algorithm uses rules based on values of groups of bits of the multiplier to determine operations for generating the partial products using the multiplicand. The operation based on each group of bits may be implemented serially by a typical Booth multiplier by inputting bits of the multiplicand and multiplier into NOR gates and outputting the result to adders that generate partial sums. To calculate a final product, the typical Booth multiplier may require that all partial sums be generated in sequence before a shift and an addition operation can be applied to produce the final product. This may significantly delay the processing of data and decrease computing speed. Therefore, there are multiple obstacles to implementing Booth multiplication in CIM.
Various embodiments described herein overcome the foregoing obstacles and enable improvements in computing speed and cost over typical Booth multiplier implementations. Various embodiments described herein include devices and methods for implementing a Booth multiplier for CIM. Various embodiments may include a Booth multiplier in CIM configured to implement Booth encoding and multi-cycle partial product generation enabling a reduction in hardware complexity and chip area as compared to typical Booth multiplier implementations.
The Booth multiplier may include a Booth encoder configured to implement Booth encoding. Various embodiments may be disclosed herein in relation to an example of 3-bit Booth encoding for 4-bit multiplication for clarity and ease of explanation. However, such descriptions are not intended to limit the scope of the claims and the enabling disclosures. One of skill in the art would realize that the disclosures herein may be similarly applied to Booth encoding of greater bit size or lesser bit size. Implementation of Booth encoding as a multiplication mode for digital CIM may replace multiplication of input data and weight data with multiplication of values derived from the input data (e.g., 0, 1, −1, 2, −2) and the weight data, where the values are indicated by a Booth encoded signal generated by encoding (e.g., 3-bit encoding) of an input sequence of the input data. A multiplexer/shifter may be implemented in CIM and configured to compute partial sums of the multiplication of multiple Booth encoded signals and the weight data. The Booth multiplier in CIM may enable a serial mode of Booth multiplication with the partial product generation, using the partial sums, and summation of the partial products over several cycles, compared to generating all partial products of the Booth multiplication prior to producing the final product as with typical Booth multiplier implementations.
As compared to typical Booth multiplier implementations, various embodiments of the Booth multiplier in CIM described herein may enable a reduction of a number of cycles required for computation. For example, where typical Booth multiplier implementations may require p cycles to execute a multiplication (where “p” is a number of input bits), various embodiments of the Booth multiplier in CIM disclosed herein may execute a multiplication in p/2 cycles for signed inputs and p/2+1 cycles for unsigned computation. Other advantages of various embodiments disclosed herein over typical Booth multiplier implementations may include the ability to increase trillions (or tera) of operations per second (TOPS) per area. For example, the Booth multiplier in CIM may increase TOPS/mm² by approximately 10% for unsigned 4-bit input and approximately 60% for signed computation compared to an N5 Digital implementation (i.e., based on a typical bit-serial operation with a NOR gate used for bit by bit multiplication followed by an adder tree starting with a 5-bit adder as the computation is based on using a 4-bit weight). Various embodiments of a Booth multiplier in CIM disclosed herein may reduce overall hardware complexity and may increase area efficiency in CIM as compared to typical Booth multiplier implementations.
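The p/2 and p/2+1 cycle counts can be checked with a small arithmetic sketch. It assumes standard radix-4 Booth recoding with one digit produced per cycle and an even `p`; the function name `booth_digits` is hypothetical. An unsigned input needs one extra digit because its most significant bit must be treated as a magnitude bit rather than a sign bit.

```python
def booth_digits(x, p, signed=True):
    """Radix-4 Booth digits of a p-bit input, least significant digit first.

    Assumptions: p is even and one digit is consumed per cycle, so the length
    of the returned list stands in for the cycle count.
    """
    if signed:
        assert -(1 << (p - 1)) <= x < (1 << (p - 1))
        n = p // 2
    else:
        assert 0 <= x < (1 << p)
        n = p // 2 + 1  # extra digit absorbs the zero extension above the MSB

    def bit(i):
        # Bit below the LSB is 0; Python's >> sign-extends negative integers.
        return 0 if i < 0 else (x >> i) & 1

    # Each digit is in {-2, -1, 0, 1, 2}.
    return [bit(2 * k - 1) + bit(2 * k) - 2 * bit(2 * k + 1) for k in range(n)]


def value(digits):
    """Recombine digits, each weighted by a power of 4."""
    return sum(d * 4 ** k for k, d in enumerate(digits))
```

For `p = 4`, this yields two digits for a signed input and three for an unsigned input, against four cycles for the bit-serial case.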
As illustrated in
Illustrated in
The Booth encoder 206 may generate Booth encoded signals 208, from the subsets 202, 204 of the input data 200 that may represent designated values configured to control CIM hardware 112a-112n to implement associated operations for executing the Booth multiplication in the CIM hardware 112a-112n. As described further herein, the Booth encoder 206 may be a circuit of logic components (e.g., Booth encoder 300 in
For example, from a subset 202, 204 of bits “111” and/or “000”, the Booth encoder 206 may generate a Booth encoded signal 208 that may represent a “0” value for multiplication with weight data (“W”), such as by indicating a logic gating operation in the CIM hardware 112a-112n to achieve the result of the multiplication. Logic gating in the CIM hardware 112a-112n may prevent bits of the weight data from propagating in the CIM hardware 112a-112n resulting in a “low” or “0” signal in place of the weight data, effectively multiplying the weight data by a “0” value.
From a subset 202, 204 of bits “001” and/or “010”, the Booth encoder 206 may generate a Booth encoded signal 208 that may represent a “1” value for multiplication with weight data, such as by indicating direct mapping of the weight data operation in the CIM hardware 112a-112n to achieve the result of the multiplication. Direct mapping in the CIM hardware 112a-112n may enable bits of the weight data to propagate in the CIM hardware 112a-112n unchanged resulting in signals representative of the unchanged weight data, effectively multiplying the weight data by a “1” value.
From a subset 202, 204 of bits “011”, the Booth encoder 206 may generate a Booth encoded signal 208 that may represent a “2” value for multiplication with weight data, such as by indicating a direct mapping of the weight data operation and left shift operation (e.g., left shift by 1 bit in an adder) on the weight data in the CIM hardware 112a-112n to achieve the result of the multiplication. Left shifting direct mapped weight data in the CIM hardware 112a-112n may shift bits of the weight data by an amount that changes the bits of the weight data resulting in signals representative of the weight data multiplied by a “2” value.
From a subset 202, 204 of bits “100”, the Booth encoder 206 may generate a Booth encoded signal 208 that may represent a “−2” value for multiplication with weight data, such as by indicating an inversion of the weight data operation, an addition operation of a “1” value at a least significant bit of the inverted weight data, and left shift operation (e.g., left shift by 1 bit in an adder) on the sum in the CIM hardware 112a-112n to achieve the result of the multiplication. Inverting bits of the weight data and addition of a “1” value at a least significant bit of the inverted bits of the weight data in the CIM hardware 112a-112n may generate signals representative of a negative signed version of the weight data, effectively multiplying the weight data by a “−1” value. Left shifting the negative signed version of the weight data in the CIM hardware 112a-112n may shift bits of the negative signed version of the weight data by an amount that changes the bits of the negative signed version of the weight data resulting in signals representative of the negative signed version of the weight data multiplied by a “2” value. Together, these operations may result in signals representative of the weight data multiplied by a “−2” value.
From a subset 202, 204 of bits “101” and/or “110”, the Booth encoder 206 may generate a Booth encoded signal 208 that may represent a “−1” value for multiplication with weight data, such as by indicating an inversion of the weight data operation and an addition operation of a “1” value at a least significant bit of the inverted weight data in the CIM hardware 112a-112n to achieve the result of the multiplication. Inverting bits of the weight data and addition of a “1” value at a least significant bit of the inverted bits of the weight data in the CIM hardware 112a-112n may generate signals representative of a negative signed version of the weight data, effectively multiplying the weight data by a “−1” value.
Compared to bit by bit multiplication, 3-bit Booth encoding for 4-bit multiplication may reduce processing time for a multiplication by approximately half. Rather than 4 cycles to multiply each bit of the input data 200 by a weight data as in bit by bit multiplication, the 3-bit Booth encoding may encode the input data 200 in 2 cycles, using two 3-bit subsets 202, 204, to generate the Booth encoded signals 208 configured to control the CIM hardware 112a-112n to achieve the result of the multiplication.
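The mapping from 3-bit subsets to designated values, and the resulting two-cycle multiplication, can be sketched as follows. The table follows the values given above. The way the two subsets are formed (overlapping by one bit, with an implicit 0 below the least significant bit) is the standard Booth convention and is assumed here rather than taken from the figures; `BOOTH_VALUE`, `subsets`, and `booth_multiply` are hypothetical names.

```python
# Designated value for each 3-bit subset, written most significant bit first.
BOOTH_VALUE = {
    "000": 0, "111": 0,
    "001": 1, "010": 1,
    "011": 2,
    "100": -2,
    "101": -1, "110": -1,
}


def subsets(x):
    """Split a signed 4-bit input into two overlapping 3-bit subsets, low subset first."""
    bits = format(x & 0xF, "04b")  # two's complement pattern, MSB first
    low = bits[2:] + "0"           # b1, b0, and an implicit 0 below the LSB (assumed)
    high = bits[:3]                # b3, b2, b1
    return [low, high]


def booth_multiply(x, w):
    """Multiply a signed 4-bit input x by weight w in two cycles."""
    acc = 0
    for cycle, group in enumerate(subsets(x)):
        partial = BOOTH_VALUE[group] * w   # 0, +/-W, or +/-2W
        acc += partial << (2 * cycle)      # second subset carries a weight of 4
    return acc
```

For example, an input of 7 splits into subsets “110” and “011”, giving −W in the first cycle and 2W shifted by two bit positions in the second, for a total of 7W.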
Illustrated in
A first NOR gate 304 may be coupled to an output end of the XOR gate 302 and an output end of the XNOR gate 308 to receive as inputs to the first NOR gate 304. Thus, the first NOR gate 304 may receive the first intermediary signal 1x from the XOR gate 302 and the second intermediary signal 2x from the XNOR gate 308 as inputs. The first NOR gate 304 may generate an output as a Booth encoded bit (“BE”).
A second NOR gate 306 may be coupled to the output end of the XOR gate 302 to receive the first intermediary signal 1x as an input as well as an output end of the first NOR gate 304 to receive the Booth encoded bit BE as inputs to the second NOR gate 306. Thus, the second NOR gate 306 may receive the first intermediary signal 1x from the XOR gate 302 and the Booth encoded bit BE from the first NOR gate 304 as inputs. The second NOR gate 306 may generate an output as an enable bit (“ENB”).
A third NOR gate 310 may be coupled to an output end of the second NOR gate 306 at an input end of the third NOR gate 310 to receive the enable bit ENB as an input. The third NOR gate 310 may also be coupled to the third bit line at an inverted input end to receive the inverse of the third bit line as an input. For example, an inverter may be coupled between the third bit line and the input end of the third NOR gate 310. Thus, the third NOR gate 310 may receive the enable bit ENB from the second NOR gate 306 and the third signal representing an inverse of the third bit of the subset 202, 204 from the third bit line as inputs. In some embodiments, the third NOR gate 310 may invert the third signal. In some embodiments, the third NOR gate 310 may receive an inverted third signal from the inverter. The third NOR gate 310 may generate an output as a select bit (“S”).
The Booth encoder 300 may generate and output a Booth encoded signal 208 from a subset 202, 204 of the input data 200. A Booth encoded signal 208 may be any combination of binary bits. For example, the Booth encoded signal 208 may be a 3-bit Booth encoded signal 208. The Booth encoded signal 208 may include the enable bit, the Booth encoded bit, and the select bit.
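The gate network of the Booth encoder 300 can be modeled at the bit level as a rough check of the logic. This sketch assumes that the XOR gate 302 receives the first and second bits of the subset, that the XNOR gate 308 receives the second and third bits, and that the third bit is the most significant bit of the subset; the connections of the three NOR gates follow the description above. The function names are hypothetical.

```python
def nor(a, b):
    """Two-input NOR on single bits."""
    return 1 - (a | b)


def booth_encoder(third, second, first):
    """Return (ENB, BE, S) for one 3-bit subset.

    Assumed inputs to the first two gates: XOR(first, second) and
    XNOR(second, third), with `third` as the most significant bit.
    """
    x1 = first ^ second          # XOR gate 302: first intermediary signal 1x
    x2 = 1 - (second ^ third)    # XNOR gate 308: second intermediary signal 2x
    be = nor(x1, x2)             # first NOR gate 304: Booth encoded bit BE
    enb = nor(x1, be)            # second NOR gate 306: enable bit ENB
    s = nor(enb, 1 - third)      # third NOR gate 310: ENB and inverted third bit
    return enb, be, s
```

Under these assumptions, subsets “011” and “100” both set BE (a magnitude of 2), the subsets whose most significant bit is 1 set S unless the enable bit is set, and subsets “000” and “111” set ENB so that the weight data is gated to 0.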
Illustrated in
In the example illustrated in
The Booth encoder 206, 300 receiving the subset 202, 204 of bits “001” and/or “010” may generate and output the Booth encoded signal 208 of bits “000”, which may be configured to cause other parts of the CIM hardware 112a-112n to execute multiplication of a “1” value with weight data, such as by a direct mapping of the weight data operation in the CIM hardware 112a-112n to achieve the result of the multiplication. The CIM hardware 112a-112n may be configured to interpret/be controlled by the Booth encoded signal 208 of bits “000” to perform direct mapping of the weight data. Direct mapping in the CIM hardware 112a-112n may enable bits of the weight data to propagate in the CIM hardware 112a-112n unchanged resulting in signals representative of the unchanged weight data, effectively multiplying the weight data by a “1” value.
The Booth encoder 206, 300 receiving the subset 202, 204 of bits “011” may generate and output the Booth encoded signal 208 of bits “010”, which may be configured to cause other parts of the CIM hardware 112a-112n to execute multiplication of a “2” value with weight data, such as by a direct mapping of the weight data operation and left shift operation (e.g., left shift by 1 bit in an adder) on the weight data in the CIM hardware 112a-112n to achieve the result of the multiplication. The CIM hardware 112a-112n may be configured to interpret/be controlled by the Booth encoded signal 208 of bits “010” to perform direct mapping and shifting of the weight data. Left shifting direct mapped weight data in the CIM hardware 112a-112n may shift bits of the weight data by an amount that changes the bits of the weight data resulting in signals representative of the weight data multiplied by a “2” value.
The Booth encoder 206, 300 receiving the subset 202, 204 of bits “100” may generate and output the Booth encoded signal 208 of bits “011”, which may be configured to cause other parts of the CIM hardware 112a-112n to execute multiplication of a “−2” value with weight data, such as by an inversion of the weight data operation, an addition operation of a “1” value at a least significant bit of the inverted weight data, and left shift operation (e.g., left shift by 1 bit in an adder) on the sum in the CIM hardware 112a-112n to achieve the result of the multiplication. The CIM hardware 112a-112n may be configured to interpret/be controlled by the Booth encoded signal 208 of bits “011” to perform inversion of the weight data, addition to the weight data, and shifting of the weight data. Inverting bits of the weight data and addition of a “1” value at a least significant bit of the inverted bits of the weight data in the CIM hardware 112a-112n may generate signals representative of a negative signed version of the weight data, effectively multiplying the weight data by a “−1” value. Left shifting the negative signed version of the weight data in the CIM hardware 112a-112n may shift bits of the negative signed version of the weight data by an amount that changes the bits of the negative signed version of the weight data resulting in signals representative of the negative signed version of the weight data multiplied by a “2” value. Together, these operations may result in signals representative of the weight data multiplied by a “−2” value.
The Booth encoder 206, 300 receiving the subset 202, 204 of bits “101” and/or “110” may generate and output the Booth encoded signal 208 of bits “001”, which may be configured to cause other parts of the CIM hardware 112a-112n to execute multiplication of a “−1” value with weight data, such as by an inversion of the weight data operation and an addition operation of a “1” value at a least significant bit of the inverted weight data in the CIM hardware 112a-112n to achieve the result of the multiplication. The CIM hardware 112a-112n may be configured to interpret/be controlled by the Booth encoded signal 208 of bits “001” to perform inversion of the weight data and addition to the weight data. Inverting bits of the weight data and addition of a “1” value at a least significant bit of the inverted bits of the weight data in the CIM hardware 112a-112n may generate signals representative of a negative signed version of the weight data, effectively multiplying the weight data by a “−1” value.
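The Booth encoded signals listed above can be summarized with a small decoder. This is an illustrative sketch: the bit order (enable bit, Booth encoded bit, select bit) is inferred from the listed codes and is an assumption, and `decode` is a hypothetical name.

```python
def decode(signal):
    """Map a 3-character Booth encoded signal to the value multiplied with the weight.

    Assumed bit order: enable bit (ENB), Booth encoded bit (BE), select bit (S).
    """
    enb, be, s = (int(c) for c in signal)
    if enb:
        return 0                    # logic gating: weight data replaced by 0
    magnitude = 2 if be else 1      # BE selects the left shift by 1 bit
    return -magnitude if s else magnitude  # S selects inversion plus 1 at the LSB
```

With this ordering, “000” decodes to 1, “010” to 2, “011” to −2, and “001” to −1, matching the operations described for each signal.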
Illustrated in
Each register 502a, 502b, 502c, 502d may be coupled to a multiplexer 504a, 504b, 504c, 504d. In some embodiments, the registers 502a, 502b, 502c, 502d may include multiple outputs, such as a non-inverted output (or output) and an inverted output. Each register 502a, 502b, 502c, 502d may be coupled to one or more inputs of a multiplexer 504a, 504b, 504c, 504d via one or more of the output and the inverted output. In some embodiments, an inverter may be coupled between an output of a register 502a, 502b, 502c, 502d and an input of a multiplexer 504a, 504b, 504c, 504d to produce the inverted output. Each register 502a, 502b, 502c, 502d may receive a weight data (“W”) and output the weight data and/or an inverse of the weight data to the inputs of a multiplexer 504a, 504b, 504c, 504d. In some embodiments, the weight data may be one or more bits of weight data, such as 4-bit weight data. While
Each multiplexer 504a, 504b, 504c, 504d may be coupled at a select line to a select signal (e.g., select bit “S”) that may be outputted by one of multiple Booth encoders 206, 300. In some embodiments, each subset 202, 204 of the input data 200 may be input to one of the multiple Booth encoders 206, 300, and each of the multiple Booth encoders 206, 300 may output a select signal (e.g., S[i], S[i+1], S[i+2], S[i+3], where “i” may be a number of a cycle iteration) generated using the input subset 202, 204 of the input data 200. In some embodiments, each multiplexer 504a, 504b, 504c, 504d may be configured to receive a select signal for a different subset 202, 204 of the input data 200. For example, the select signal may be configured to cause the multiplexer 504a, 504b, 504c, 504d to select which one of the inputs of each respective multiplexer 504a, 504b, 504c, 504d (i.e., the weight data or the inverse of the weight data) to output to an adder 506a, 506b from an output of the multiplexer 504a, 504b, 504c, 504d. In some embodiments, the multiplexer 504a, 504b, 504c, 504d may directly map the weight data to the adder 506a, 506b. For example, the multiplexer 504a, 504b, 504c, 504d may directly map the weight data to the adder 506a, 506b in response to the select signal being a “0” value. In some embodiments, the multiplexer 504a, 504b, 504c, 504d may provide the inverse of the weight data to the adder 506a, 506b. For example, the multiplexer 504a, 504b, 504c, 504d may provide the inverse of the weight data to the adder 506a, 506b in response to the select signal being a “1” value.
The adders 506a, 506b may be of any bit size, such as 6-bit adders. Each adder 506a, 506b may be coupled to one or more multiplexers 504a, 504b, 504c, 504d, such as 2 multiplexers 504a, 504b, 504c, 504d, at an input. The adder 506a, 506b may receive the output of the multiplexers 504a, 504b, 504c, 504d at the input. Each adder 506a, 506b may also be coupled at a control line to receive the enable bit (e.g., enable bit “ENB”) output from one of the multiple Booth encoders 206, 300. In some embodiments, each of the multiple Booth encoders 206, 300 may output an enable bit (e.g., ENB[i], ENB[i+1], ENB[i+2], ENB[i+3], where “i” may be a number of a cycle iteration) generated using the input subset 202, 204 of the input data 200. In some embodiments, each adder 506a, 506b may be configured to receive one or more enable bits for different subsets 202, 204 of the input data 200. For example, each adder 506a, 506b may be configured to receive two enable bits (ENB). An ENB bit received by an adder 506a, 506b may trigger the adder 506a, 506b to execute the add functions. For example, the enable bit may be configured to cause the adder 506a, 506b to execute a gating operation on the output of the multiplexers 504a, 504b, 504c, 504d received by the adder 506a, 506b. For example, the adder 506a, 506b may execute a gating operation on the output of the multiplexers 504a, 504b, 504c, 504d received by the adder 506a, 506b in response to the enable bit being a “1” value. The gating operation may set the inputs to the adder 506a, 506b to a value of “0” regardless of the value of the output of the multiplexers 504a, 504b, 504c, 504d.
Each adder 506a, 506b may also be coupled at a control line to receive the Booth encoded bit (e.g., Booth encoded bit “BE”) output from one of the multiple Booth encoders 206, 300. In some embodiments, each of the multiple Booth encoders 206, 300 may output a Booth encoded bit (e.g., BE[i], BE[i+1], BE[i+2], BE[i+3], where “i” may be a number of a cycle iteration) generated using the input subset 202, 204 of the input data 200. In some embodiments, each adder 506a, 506b may be configured to receive one or more Booth encoded bits for different subsets 202, 204 of the input data 200. For example, each adder 506a, 506b may be configured to receive two Booth encoded bits (BE). A BE bit received by an adder 506a, 506b may trigger the adder 506a, 506b to execute the add functions. For example, the Booth encoded bit may be configured to cause the adder 506a, 506b to execute a left shift operation (e.g., left shift by 1 bit) on the weight data received by the adder 506a, 506b. For example, the adder 506a, 506b may execute a left shift operation on the weight data received by the adder 506a, 506b in response to the Booth encoded bit being a “1” value. The shift may be used to implement a multiplication of the weight data by a value of “2”.
Each adder 506a, 506b may be configured to receive one or more of the select signals for the different subsets 202, 204 of the input data 200 at a select line. For example, each adder 506a, 506b may be configured to receive two select signals (S). A select signal received by an adder 506a, 506b may be used by the adder 506a, 506b as a carry in (CIN) value for use in an addition with a least significant bit of a value at the adder 506a, 506b.
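The combined effect of the select signal, the enable bit, and the Booth encoded bit on one weight value can be sketched as a fixed-width behavioral model. This is not a gate-accurate description of the adder 506a, 506b. It assumes a 6-bit datapath with a sign-extended 4-bit weight, and it orders the steps as inversion, carry-in, then shift, following the description of the “−2” case. `WIDTH`, `to_signed`, and `lane` are hypothetical names.

```python
WIDTH = 6                     # assumed adder width, per the 6-bit example
MASK = (1 << WIDTH) - 1


def to_signed(v):
    """Interpret a WIDTH-bit pattern as a two's complement value."""
    v &= MASK
    return v - (1 << WIDTH) if v & (1 << (WIDTH - 1)) else v


def lane(w, enb, be, s):
    """Apply one Booth encoded signal (ENB, BE, S) to a signed 4-bit weight w."""
    x = w & MASK                      # sign-extended weight as a bit pattern
    x = (~x & MASK) if s else x       # multiplexer: inverse when S is 1, direct when 0
    if enb:
        x = 0                         # gating: force the adder input to 0
    x = (x + s) & MASK                # S reused as carry-in (CIN) at the LSB
    if be:
        x = (x << 1) & MASK           # left shift by 1 bit multiplies by 2
    return to_signed(x)
```

Inversion followed by a carry-in of 1 is two's complement negation, which is why the select signal can serve both as the multiplexer select and as the carry-in. The model relies on the encoder never asserting S together with ENB.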
The adders 506a, 506b may output the results of their operations as inputs to an adder 508. The adder 508 may sum the results received at the inputs and generate a partial sum (PSUM0) of the Booth multiplication of the subsets 202, 204 of the input data 200 and the weight data.
Typical implementations of Booth multiplication use a different construction from the described embodiments. In particular, typical implementations of Booth multiplication utilize NOR gates in place of each of the multiplexers 504a, 504b, 504c, 504d. Various embodiments described herein utilize the multiplexers 504a, 504b, 504c, 504d, which may enable an approximately 50% reduction in delay, with at least two cycles executed for signed computation, in comparison to typical implementations utilizing NOR gates. The delay reduction may be achieved by using Booth encoding to convert the input data so that fewer operations are needed to achieve the multiplication. Multiple bits of the input data may be Booth encoded, and the resulting encoded bits may be used to execute calculations for the multiple bits, rather than the bit-by-bit calculations executed by typical implementations.
The multiplexer 504a may be coupled, at an input, to any number of input lines configured to carry weight data. For example, the multiplexer 504a may be coupled to four input lines configured to carry weight data (e.g., W3, W2, W1, W0). The multiplexer 504a may include multiple inverters 600a, 600b, which may be configured to function as buffers for temporary storage of the weight data. For example, one inverter 600a, 600b may be configured to temporarily store the weight data, and another inverter 600a, 600b may be configured to temporarily store the inverse of the weight data.
The multiplexer 504a may be coupled, at a select line, to a select signal (e.g., select bit “S”) output by the Booth encoder 206, 300. The multiplexer 504a may include multiple transmission gates 602a coupled between the inverters 600a, 600b and outputs of the multiplexer 504a. The transmission gates 602a may also be coupled, at an input, to the select signal. The select signal may determine which of the input signal or the inverse of the input signal of each of the input weight data (e.g., W3, W2, W1, W0) to output from the multiplexer 504a. In some embodiments, pairs of the transmission gates 602a, coupled to the same output of the multiplexer 504a may be differently configured to respond to the select signal. For example, a transmission gate 602a may enable transmission of the weight data and/or inverse of the weight data stored at the inverter 600a and another transmission gate 602a may prevent transmission of the weight data and/or inverse of the weight data stored at the inverter 600b for the same select signal, and vice versa. The multiplexer 504a may output weight data and/or inverse of the weight data at an output as controlled by the select signal.
The adder 506a may receive, at an input, the weight data and/or inverse of the weight data (collectively referred to herein as weight data for the adder 506a) output by the multiplexer 504a. The adder 506a may be coupled to an enable signal (e.g., enable bit “ENB”) that may be outputted from the Booth encoder 206, 300. The enable signal may trigger the adder 506a to add the signal received at the inputs to a value held in an adder component 606 (i.e., shift register). The adder 506a may include multiple NOR gates 604a, 604b, 604c configured to receive the weight data at one input and the enable signal at a second input of the NOR gates 604a, 604b, 604c. The NOR gates 604a, 604b, 604c may be configured to NOR the weight data and the enable signal such that the enable signal may control a logic gating operation of the adder 506a. For example, when the enable signal is configured to enable logic gating (e.g., the enable signal is a “1” value), the NOR gates 604a, 604b, 604c may output only “0” values regardless of the value of the weight data. Otherwise, when the enable signal is configured not to enable logic gating (e.g., the enable signal is a “0” value), the NOR gates 604a, 604b, 604c may output the weight data at the input.
A control of the adder 506a may be coupled to a Booth encoded bit (e.g., Booth encoded bit “BE”) that is output by the Booth encoder 206, 300. The Booth encoded bit may be configured to control whether the adder 506a executes a shift left operation (e.g., shift left 1 bit). The output of each NOR gate 604a, 604b, 604c may be coupled to a shifter 608. The shifter 608 may include multiple transmission gates 602b configured to couple the output of each NOR gate 604b to multiple inverters 600e. In addition, the shifter 608 may be configured to directly couple an inverter 600c to the output of the NOR gate 604a and may include a transmission gate 602b configured to couple the output of the NOR gate 604a to an inverter 600e. The NOR gate 604a may be associated with an input of a most significant bit of the weight data. The inverter 600e coupled to the NOR gate 604a may correspond with a most significant bit position of the weight data, and the inverter 600c coupled to the NOR gate 604a may correspond with a more significant bit position than the most significant bit position of the weight data. The shifter 608 may include a transmission gate 602b configured to couple the output of the NOR gate 604c to an inverter 600d and a transmission gate 602b configured to couple the output of the NOR gate 604c to an inverter 600e. The NOR gate 604c may be associated with an input of a least significant bit of the weight data. The inverter 600d coupled to the NOR gate 604c may correspond with a least significant bit position of the weight data. The adder 506a may also be coupled to a supply voltage (VDD). The shifter 608 may include a transmission gate 602c configured to couple the supply voltage VDD to the inverter 600d.
The transmission gates 602b and 602c may also be coupled to the Booth encoded (BE) bit. The transmission gates 602b may be configured to enable and/or prevent transmission of the output from the NOR gates 604a, 604b, 604c to the inverters 600e, 600d. The transmission gate 602c may be configured to enable and/or prevent transmission of the supply voltage to the inverter 600d. In some embodiments, pairs of the transmission gates 602b, 602c coupled to the same inverters 600e, 600d may be differently configured to respond to the Booth encoded bit. For example, a transmission gate 602b may enable transmission of the output from the NOR gate 604a, 604b, 604c to the inverters 600e, 600d associated with the same bit position of the weight data, while another transmission gate 602b may prevent transmission of the output of the NOR gate 604b, 604c to the inverters 600e associated with the different bit positions of the weight data, and vice versa. The transmission gate 602c may enable transmission of the supply voltage to the inverter 600d and the transmission gates 602b may enable transmission of the output of the NOR gates 604b, 604c to the inverters 600e associated with the different bit position of the weight data in response to the same Booth encoded bit value. The different bit position of the weight data may be a more significant bit position associated with the inverters 600e than the bit position of the weight data associated with the NOR gate 604b, 604c. The inverter 600c may be associated with the different, more significant bit position of the weight data than the bit position of the weight data associated with the NOR gate 604a.
Enabling transmission of the supply voltage to the inverter 600d by the transmission gate 602c, transmission of the output of the NOR gates 604b, 604c to the inverters 600e associated with the different bit position of the weight data by the transmission gates 602b, and transmission of the output of the NOR gate 604a to the inverter 600c may enable a left shift of the weight data in the adder 506a. In some embodiments, the shifter 608 may include the NOR gates 604a, 604b, 604c. In some embodiments, the shifter 608 may include the inverters 600c, 600d, 600e.
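The selection between the unshifted and the left-shifted routing may be illustrated by the following behavioral sketch in Python. It is a sketch under stated assumptions, not a model of the transmission gates: bits are listed most significant first, signal polarity through the NOR gates and inverters is ignored, and the extra top position is assumed to repeat the most significant bit when no shift is selected, on the reading that the inverter 600c is directly coupled to the NOR gate 604a. The names `shift_select` and `to_signed` are hypothetical.

```python
def shift_select(bits: list[int], be: int) -> list[int]:
    """Route n input bits (MSB first) to n + 1 output positions.

    be == 1: every bit moves up one position and the least significant
             position is filled with 0 (left shift by 1 bit).
    be == 0: bits keep their positions and the extra top position repeats
             the most significant bit (assumed sign extension).
    """
    if be:
        return bits + [0]
    return [bits[0]] + bits


def to_signed(bits: list[int]) -> int:
    """Interpret an MSB-first bit list as a two's complement integer."""
    value = int("".join(str(b) for b in bits), 2)
    if bits[0]:
        value -= 1 << len(bits)
    return value


weight = [1, 0, 1]                         # -3 as 3-bit two's complement
print(to_signed(shift_select(weight, 0)))  # -3
print(to_signed(shift_select(weight, 1)))  # -6
```

Under these assumptions, the shifted routing doubles the value of the weight data while the unshifted routing leaves it unchanged.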
An adder component 606 of the adder 506a may receive data temporarily stored at the inverters 600c, 600d, 600e. The adder component 606 may also receive, at an input (CIN), the select signal from the Booth encoder 300. The adder component 606 may be configured to sum the data received from the inverters 600c, 600d, 600e. In response to a designated value of the select signal (e.g., select signal is a “1” value) the adder component 606 may add a “1” value, as a CIN bit, to the least significant bit of the sum. The adder 506a and the adder component 606 may be configured to output the sum at an output. For example, the sum may be output to the adder 508 and used to generate the partial sum (PSUMO).
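The role of the carry-in bit may be illustrated by a short Python sketch showing how inverted weight data plus a “1” at the least significant bit yields the 2's complement (negated) weight. This is a minimal sketch assuming a fixed bit width; the function names `add_with_carry_in`, `negate`, and `to_signed` are hypothetical and are not taken from the disclosure.

```python
def add_with_carry_in(data: int, cin: int, width: int) -> int:
    """Add a carry-in bit at the least significant bit, wrapping to `width` bits."""
    mask = (1 << width) - 1
    return ((data & mask) + cin) & mask


def negate(weight: int, width: int) -> int:
    """Invert every bit of the weight, then add 1 through the carry-in."""
    mask = (1 << width) - 1
    inverted = ~weight & mask
    return add_with_carry_in(inverted, 1, width)


def to_signed(value: int, width: int) -> int:
    """Interpret a `width`-bit pattern as a two's complement integer."""
    return value - (1 << width) if value >> (width - 1) else value


print(to_signed(negate(3, 4), 4))  # -3
```

With the carry-in at “0”, the same adder passes the data through unchanged, so a single select bit may switch between the direct and the negated weight.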
As described herein, the Booth encoder 704 may receive a multiplicand (e.g., input data 200 and/or a subset of input data 202, 204 of the input data). The Booth encoder 704 may be a circuit of logic components (e.g., Booth encoder 300 in
The compressor 708 may receive the partial products of the Booth algorithm hardware 702 and sum the partial products. The compressor 708 may generate and output a sum of the partial products (sum) and/or a carry bit (carry). In some embodiments, the compressor 708 may be any type of compressor, such as a Wallace tree. The compressor 708 may sum partial products prior to the Booth algorithm hardware 702 generating and outputting all of the partial products for a Booth multiplication.
A carry-lookahead adder 710 may receive the sum of the partial products (sum) and/or a carry bit (carry) from the compressor 708. By summing the received sums of partial products and/or carry bits, the carry-lookahead adder 710 may generate and output a final output of the Booth multiplication. The summed partial products received from the compressor 708 may be received as they become available. As with the compressor 708, the carry-lookahead adder 710 may receive the summed partial products prior to the Booth algorithm hardware 702 generating and outputting all of the partial products for the Booth multiplication. The carry-lookahead adder 710 may sum each of the received summed partial products with a sum of prior received summed partial products until all of the partial products are received, and output a final sum of the received partial products as the final output of the Booth multiplication.
The components of the Booth multiplier 700, including any of the Booth encoder 704, the Booth decoder 706, the compressor 708, and the carry-lookahead adder 710, may implement operations for Booth multiplication prior to receiving all of the data for Booth multiplication of the input data 200 and the weight data. The components of the Booth multiplier 700 may be configured to implement operations for Booth multiplication on, for example, a per cycle basis where each cycle Booth encodes a subset 202, 204 of the input data 200 and uses a Booth encoded signal 208 generated from the encoding. As such, components of the Booth multiplier 700 may be configured to implement operations for the Booth multiplication for each received subset 202, 204 of the input data 200. The Booth encoder 704 may only require the subset 202, 204 of the input data 200 relevant for the cycle being implemented. The Booth decoder 706 may manipulate weight data based on the Booth encoded signal 208 for the relevant cycle and produce partial products. The compressor 708 may sum the partial products of the relevant cycle to produce a sum of the partial products. The carry-lookahead adder 710 may sequentially sum the sums of the partial products output by the compressor 708 for sequential cycles to output the final sum of the received sums of partial products as the final output of the Booth multiplication.
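The per cycle operation described above may be illustrated by the following Python sketch, which processes one 3-bit subset of the input data per cycle and keeps a running sum. It is a sketch under stated assumptions rather than the disclosed hardware: the positional weighting of each cycle by 4 to the power of the cycle index is the textbook radix-4 Booth convention and is assumed here, the input width is assumed to be even, and the names `BOOTH_DIGIT` and `running_sums` are hypothetical.

```python
# 3-bit subset (high bit, middle bit, low bit) -> representative multiplier value.
BOOTH_DIGIT = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def running_sums(x: int, w: int, n_bits: int):
    """Yield the accumulated sum after each cycle.

    x is read as an n_bits-wide two's complement input (n_bits even);
    w is the weight. Only the bits needed for the current cycle are used.
    """
    padded = (x & ((1 << n_bits) - 1)) << 1      # append a 0 below the LSB
    total = 0
    for i in range(n_bits // 2):
        triplet = (padded >> (2 * i)) & 0b111     # overlapping 3-bit subset
        key = ((triplet >> 2) & 1, (triplet >> 1) & 1, triplet & 1)
        total += BOOTH_DIGIT[key] * w * 4 ** i    # partial product for cycle i
        yield total


print(list(running_sums(0b0111, 5, 4)))  # [-5, 35]
```

For the input 0111 and a weight of 5, the first cycle contributes −5 and the second contributes 40, so the running sum reaches 35 after two cycles instead of four bit-serial cycles.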
In block 802, the CIM device may receive input data 200 at the Booth encoder 206, 300, 704. The input data 200 may be serial data, subsets 202, 204 of which may be received continually or periodically throughout the processes of implementing the method 800 until all of the input data 200 is received.
In block 804, the CIM device may Booth encode portions of the input data 200, received in block 802, in cycles. Subsets 202, 204 of the input data received at the Booth encoder 206, 300, 704 may be converted to Booth encoded signals 208 through various logic operations of various logic components, as illustrated in
Booth encoding the portions of the input data may convert the portions to Booth encoded signals 208 associated with a limited number of operations for executing the Booth multiplication in the CIM hardware 112a-112n, 500, 700. The Booth encoded signals 208 may be configured to control other parts of the CIM hardware 112a-112n, 500, 700, including the multiplexers 504a, 504b, 504c, 504d, the adders 506a, 506b, and/or the Booth decoder 706, configured for implementing a Booth multiplier, such as determining an operation for the Booth multiplier to execute and produce a partial sum. For example, the Booth encoder 206, 300, 704 receiving the subset 202, 204 of bits “000” and/or “111” may generate and output the Booth encoded signal 208 of bits “100”, which may be configured to cause other parts of the CIM hardware 112a-112n, 500, 700 to execute multiplication of a “0” value with weight data (“W”), such as by a logic gating operation in the CIM hardware 112a-112n, 500, 700 to achieve the result of the multiplication. The CIM hardware 112a-112n, 500, 700 may be configured to interpret/be controlled by the Booth encoded signal 208 of bits “100” to perform logic gating of the weight data. Logic gating in the CIM hardware 112a-112n, 500, 700 may prevent bits of the weight data from propagating in the CIM hardware 112a-112n, 500, 700 resulting in a “low” or “0” signal in place of the weight data, effectively multiplying the weight data by a “0” value.
The Booth encoder 206, 300, 704 receiving the subset 202, 204 of bits “001” and/or “010” may generate and output the Booth encoded signal 208 of bits “000”, which may be configured to cause other parts of the CIM hardware 112a-112n, 500, 700 to execute multiplication of a “1” value with weight data, such as by a direct mapping of the weight data operation in the CIM hardware 112a-112n, 500, 700 to achieve the result of the multiplication. The CIM hardware 112a-112n, 500, 700 may be configured to interpret/be controlled by the Booth encoded signal 208 of bits “000” to perform direct mapping of the weight data. Direct mapping in the CIM hardware 112a-112n, 500, 700 may enable bits of the weight data to propagate in the CIM hardware 112a-112n, 500, 700 unchanged resulting in signals representative of the unchanged weight data, effectively multiplying the weight data by a “1” value.
The Booth encoder 206, 300, 704 receiving the subset 202, 204 of bits “011” may generate and output the Booth encoded signal 208 of bits “010”, which may be configured to cause other parts of the CIM hardware 112a-112n, 500, 700 to execute multiplication of a “2” value with weight data, such as by a direct mapping of the weight data operation and left shift operation (e.g., left shift by 1 bit in an adder) on the weight data in the CIM hardware 112a-112n, 500, 700 to achieve the result of the multiplication. The CIM hardware 112a-112n, 500, 700 may be configured to interpret/be controlled by the Booth encoded signal 208 of bits “010” to perform direct mapping and shifting of the weight data. Left shifting direct mapped weight data in the CIM hardware 112a-112n, 500, 700 may shift bits of the weight data by an amount that changes the bits of the weight data resulting in signals representative of the weight data multiplied by a “2” value.
The Booth encoder 206, 300, 704 receiving the subset 202, 204 of bits “100” may generate and output the Booth encoded signal 208 of bits “011”, which may be configured to cause other parts of the CIM hardware 112a-112n, 500, 700 to execute multiplication of a “−2” value with weight data, such as by an inversion of the weight data operation, an addition operation of a “1” value at a least significant bit of the inverted weight data, and left shift operation (e.g., left shift by 1 bit in an adder) on the sum in the CIM hardware 112a-112n, 500, 700 to achieve the result of the multiplication. The CIM hardware 112a-112n, 500, 700 may be configured to interpret/be controlled by the Booth encoded signal 208 of bits “011” to perform inversion of the weight data, addition to the weight data, and shifting of the weight data. Inverting bits of the weight data and addition of a “1” value at a least significant bit of the inverted bits of the weight data in the CIM hardware 112a-112n, 500, 700 may generate signals representative of a negative signed version of the weight data, effectively multiplying the weight data by a “−1” value. Left shifting the negative signed version of the weight data in the CIM hardware 112a-112n, 500, 700 may shift bits of the negative signed version of the weight data by an amount that changes the bits of the negative signed version of the weight data resulting in signals representative of the negative signed version of the weight data multiplied by a “2” value. Together, these operations may result in signals representative of the weight data multiplied by a “−2” value.
The Booth encoder 206, 300, 704 receiving the subset 202, 204 of bits “101” and/or “110” may generate and output the Booth encoded signal 208 of bits “001”, which may be configured to cause other parts of the CIM hardware 112a-112n, 500, 700 to execute multiplication of a “−1” value with weight data, such as by an inversion of the weight data operation and an addition operation of a “1” value at a least significant bit of the inverted weight data in the CIM hardware 112a-112n, 500, 700 to achieve the result of the multiplication. The CIM hardware 112a-112n, 500, 700 may be configured to interpret/be controlled by the Booth encoded signal 208 of bits “001” to perform inversion of the weight data and addition to the weight data. Inverting bits of the weight data and addition of a “1” value at a least significant bit of the inverted bits of the weight data in the CIM hardware 112a-112n, 500, 700 may generate signals representative of a negative signed version of the weight data, effectively multiplying the weight data by a “−1” value.
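The mapping from 3-bit subsets to Booth encoded signals and representative multiplier values described for block 804 may be summarized in a short Python sketch. The table entries are taken from the description above; the names `ENCODE`, `MULTIPLE`, `booth_encode`, `booth_multiple`, and `radix4_digit` are hypothetical, and the closed-form digit formula is the textbook radix-4 Booth recoding, included only as a cross-check.

```python
# Subset of three input bits -> Booth encoded signal, as listed above.
ENCODE = {
    "000": "100", "111": "100",   # multiply by 0 (logic gating)
    "001": "000", "010": "000",   # multiply by +1 (direct mapping)
    "011": "010",                 # multiply by +2 (direct mapping, left shift)
    "100": "011",                 # multiply by -2 (invert, add 1, left shift)
    "101": "001", "110": "001",   # multiply by -1 (invert, add 1)
}

# Booth encoded signal -> representative multiplier value.
MULTIPLE = {"100": 0, "000": 1, "010": 2, "011": -2, "001": -1}


def booth_encode(subset: str) -> str:
    """Look up the Booth encoded signal for a 3-bit subset."""
    return ENCODE[subset]


def booth_multiple(subset: str) -> int:
    """Return the value by which the weight data is effectively multiplied."""
    return MULTIPLE[booth_encode(subset)]


def radix4_digit(subset: str) -> int:
    """Textbook radix-4 Booth digit, -2*b2 + b1 + b0 (cross-check only)."""
    b2, b1, b0 = (int(c) for c in subset)
    return -2 * b2 + b1 + b0


for subset in sorted(ENCODE):
    print(subset, booth_encode(subset), booth_multiple(subset))
```

Eight possible subsets collapse to five distinct encoded signals, which is what limits the number of operations the rest of the CIM hardware has to support.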
In block 806, the CIM device may output a Booth encoded signal 208 from the Booth encoder 206, 300, 704. In block 808, the CIM device may receive the Booth encoded signal 208 and weight data at the Booth decoder 706. Receiving the Booth encoded signal 208 and weight data may include receiving at one or more of the multiplexers 504a, 504b, 504c, 504d and/or the adders 506a, 506b.
In block 810, the CIM device may generate a partial product of a multiplication of the input data 200 and the weight data and/or inverse of the weight data (collectively referred to herein as weight data for the method 800) using the Booth encoded signal 208 and the weight data. In other words, rather than a direct multiplication of the values of the input data 200, such as the subsets 202, 204 of the input data 200, and the weight data, the multiplication may be of a representative value (e.g., 0, 1, 2, −1, −2) controlled by the Booth encoded signal 208, for example, as described with reference to block 804, and the weight data. Various different operations, such as logic gating of the weight data, direct mapping of the weight data, inverting of the weight data, left shifting of the weight data, and/or adding a “1” value to the least significant bit of the left shifted weight data, may be used to implement the multiplication of the representative value and the weight data. In some embodiments, the Booth decoder 706, including one or more of the multiplexers 504a, 504b, 504c, 504d and/or the adders 506a, 506b, 508, may generate the partial product.
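The generation of a partial product from a Booth encoded signal and the weight data may be illustrated by the following Python sketch. It assumes that the three bits of the Booth encoded signal act, in order, as a gating bit, a shift bit, and a negate bit; this reading is inferred from the example signal values given for block 804 and is an assumption, not a statement of the disclosed circuit. The sketch negates before shifting, and widens the result by two bits so that −2 times the most negative weight still fits. The name `partial_product` is hypothetical.

```python
def partial_product(signal: str, w: int, width: int) -> int:
    """Apply gating, negation, and shifting to a signed weight.

    signal: 3-character string read as (gate, shift, negate) bits (assumed order).
    w:      signed weight that fits in `width` bits of two's complement.
    """
    gate, shift, negate = (int(c) for c in signal)
    out_width = width + 2                      # headroom for -2 * w
    mask = (1 << out_width) - 1
    if gate:
        return 0                               # logic gating: multiply by 0
    value = w & mask                           # sign-extended bit pattern
    if negate:
        value = (~value + 1) & mask            # invert bits, add 1 at the LSB
    if shift:
        value = (value << 1) & mask            # left shift by 1 bit
    if value >> (out_width - 1):               # convert back to a signed int
        value -= 1 << out_width
    return value


for signal in ("100", "000", "010", "011", "001"):
    print(signal, partial_product(signal, 5, 4))
```

For a weight of 5, the five signals produce 0, 5, 10, −10, and −5, matching the representative values 0, 1, 2, −2, and −1.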
In block 812, the CIM device may output the partial product from the Booth decoder 706 and receive the partial product at the compressor 708. In block 814, the CIM device may generate a partial sum by adding received partial products. The compressor 708 may accumulate partial products and add the partial products to generate the partial sum. In some embodiments, the addition of the partial products may generate a carry value.
In block 816, the CIM device may output the partial sum from the compressor 708. In some embodiments, the CIM device may output the carry value from the compressor 708 along with the associated partial sum. In block 818, the CIM device may receive the partial sum at an adder. In some embodiments, the adder may be the carry-lookahead adder 710. In some embodiments, the CIM device may receive the carry value output along with the associated partial sum.
In block 820, the CIM device may generate a final product of the Booth multiplication of the input data 200 and the weight data. The adder may accumulate partial sums and add the partial sums to generate the final product. In some embodiments, the adder may add the partial sums and the carry values to generate the final product. In block 822, the CIM device may output the final product. For example, the CIM device may output the final product from the CIM hardware 112a-112n, 500, 700, including the adder, to other CIM hardware 112a-112n, any part of the memory 100 (e.g., memory unit 102, memory chip 104a-104n, memory unit 108a-108n, banks 106a-106n, memory array 110a-110n), and/or to a processor (e.g., central processing unit (CPU); not shown).
In some embodiments, the process of Booth multiplication in CIM using CIM hardware 112a-112n, 500, including any of a Booth encoder 206, 300, 704, a Booth decoder 706, a multiplexer 504a, 504b, 504c, 504d, an adder 506a, 506b, 508, a compressor 708, a carry-lookahead adder 710, and/or components thereof may be described by the following example. Booth encoded multiplication of an input data 200 X3, X2, X1, X0 by a weight data W may be expressed as addition of partial products of subsets 202, 204 X1, X0, 0 and X3, X2, X1 of the input data 200 each multiplied by the weight data. In other words, (X3, X2, X1, X0)*W=((X1, X0, 0)*W)+((X3, X2, X1)*W). The Booth encoded multiplication may simplify the input data 200 by Booth encoding subsets 202, 204 of the input data generating Booth encoded signals 208, as in block 804, and interpreting the Booth encoded signals 208 as instructions for operations to manipulate weight data, as in block 810. For example, a multiplicand (or input data 200) of 0111 may be appended with a 0 so that the multiplicand is 01110, and divided into subsets 202, 204 of 110 and 011 based on 3-bit Booth encoding of the multiplicand using bits X(2i+1), X(2i), and X(2i−1) per cycle, where “i” may be a number of a cycle iteration. As described herein, Booth encoding the subset 202, 204 of 110 may generate a Booth encoded signal configured to indicate multiplying the weight data by a “−1” value, such as by an inversion of the weight data operation and an addition operation of a “1” value at a least significant bit of the inverted weight data. Booth encoding the subset 202, 204 of 011 may generate a Booth encoded signal configured to indicate multiplying the weight data by a “2” value, such as by a direct mapping of the weight data operation and left shift operation (e.g., left shift by 1 bit in an adder) on the weight data.
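The division of the multiplicand into overlapping 3-bit subsets in the example above may be reproduced with a short Python sketch. It is illustrative only: the names `DIGIT`, `booth_subsets`, and `booth_value` are hypothetical, an even number of input bits is assumed, and weighting the subset for cycle “i” by 4 to the power of “i” follows the textbook radix-4 Booth convention, which is an assumption here.

```python
# 3-bit subset -> representative multiplier value.
DIGIT = {"000": 0, "001": 1, "010": 1, "011": 2,
         "100": -2, "101": -1, "110": -1, "111": 0}


def booth_subsets(bits: str) -> list[str]:
    """Append a 0 after the LSB and return overlapping 3-bit subsets.

    `bits` is an MSB-first string with an even number of bits; the subsets
    are returned least significant first (cycle 0, cycle 1, ...).
    """
    if len(bits) % 2:
        raise ValueError("expected an even number of input bits")
    padded = bits + "0"
    return [padded[end - 3:end] for end in range(len(padded), 2, -2)]


def booth_value(bits: str) -> int:
    """Recombine the Booth digits, weighting cycle i by 4**i (assumed)."""
    return sum(DIGIT[s] * 4 ** i for i, s in enumerate(booth_subsets(bits)))


print(booth_subsets("0111"))  # ['110', '011']
print(booth_value("0111"))    # 7
```

The subset 110 contributes −1 and the subset 011 contributes 2 × 4 = 8, which together recover the value 7 of the multiplicand 0111.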
To achieve Booth encoded multiplication using the Booth encoded signals 208 and implementing the instructions for operations to manipulate weight data, the input data 200 may be converted to a format of an addition of 2's complement values. For example, a series of “1”s in the multiplicand (or input data 200) may be expressed as 01110=10000−00010. This subtraction may be considered as addition with a 2's complement number as 01110=10000−00010=10000+00010×(−1) (the multiplication by “−1” gives the 2's complement number). A Booth encoded multiplication of the multiplicand 01110 and a multiplier (or weight data) AAA may then be performed as 01110×AAA=(10000−00010)×AAA=10000×AAA+00010×(~AAA+1) (for which direct mapped weight data may be represented by “AAA”, the inverse weight data may be represented by “~AAA”, and the 2's complement of the weight data may be given by (~AAA+1)). Each resulting multiplication may generate a partial product result of manipulating weight data, as in block 810, that may be summed to generate a partial sum, as in block 814. As illustrated by this example, the Booth encoding enables multiple-bit subsets 202, 204 of the input data 200 to be multiplied by the weight data, rather than typical Booth multiplication, which multiplies individual bits of the input data by the weight data to generate partial products that are summed to generate a final output. The Booth encoded multiplication described herein reduces the number of partial products calculated for the Booth multiplication, enabling the execution of Booth multiplication using fewer cycles, less time, and less area of computing hardware as compared to typical Booth multiplication.
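The decomposition in the example above may be written out step by step as follows. This is a restatement of the example in LaTeX notation, with the bitwise inverse of the weight data written as \overline{AAA} (a notation chosen here for illustration) and with the identity −AAA = \overline{AAA} + 1 holding in 2's complement arithmetic at a fixed bit width.

```latex
\begin{aligned}
01110_2 &= 10000_2 - 00010_2 \qquad (14 = 16 - 2)\\[4pt]
01110_2 \times AAA
  &= (10000_2 - 00010_2) \times AAA \\
  &= 10000_2 \times AAA + 00010_2 \times (-AAA) \\
  &= 10000_2 \times AAA + 00010_2 \times \left(\overline{AAA} + 1\right)
\end{aligned}
```

The first term corresponds to the subset 011 (multiply by 2, applied at the higher position) and the second term corresponds to the subset 110 (multiply by −1), so two partial products replace the three that the run of “1”s would otherwise require.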
Various examples (including, but not limited to, the examples discussed above with reference to
The touchscreen controller 904 and the processor 902 may also be coupled to a touchscreen panel 912, such as a resistive-sensing touchscreen, capacitive-sensing touchscreen, infrared sensing touchscreen, etc. The wireless device 900 may have one or more radio signal transceivers 908 (e.g., Peanut®, Bluetooth®, Zigbee®, Wi-Fi, RF radio) and antennas 910, for sending and receiving, coupled to each other and/or to the processor 902. The transceivers 908 and antennas 910 may be used with the above-mentioned circuitry to implement the various wireless transmission protocol stacks and interfaces. The wireless device 900 may include a cellular network wireless modem chip 916 that enables communication via a cellular network and is coupled to the processor 902.
The wireless device 900 may include a peripheral device connection interface 918 coupled to the processor 902. The peripheral device connection interface 918 may be singularly configured to accept one type of connection, or multiply configured to accept various types of physical and communication connections, common or proprietary, such as USB, FireWire, Thunderbolt, or PCIe. The peripheral device connection interface 918 may also be coupled to a similarly configured peripheral device connection port (not shown). The wireless device 900 may also include speakers 914 for providing audio outputs. The wireless device 900 may also include a housing 920, constructed of a plastic, metal, or a combination of materials, for containing all or some of the components discussed herein. The wireless device 900 may include a power source 922 coupled to the processor 902, such as a disposable or rechargeable battery. The rechargeable battery may also be coupled to the peripheral device connection port to receive a charging current from a source external to the wireless device 900.
Various examples (including, but not limited to, the examples discussed above with reference to
Various examples (including, but not limited to, the examples discussed above with reference to
With reference to
Referring to
Referring to
Referring to
The foregoing method descriptions and the process flow diagrams are provided merely as illustrative examples and are not intended to require or imply that the steps of various examples must be performed in the order presented. As will be appreciated by one of skill in the art, the steps in the foregoing examples may be performed in any order. Words such as “thereafter,” “then,” “next,” etc. are not intended to limit the order of the steps; these words are simply used to guide the reader through the description of the methods. Further, any reference to claim elements in the singular, for example, using the articles “a,” “an” or “the” is not to be construed as limiting the element to the singular.
The various illustrative logical blocks, processes, circuits, and algorithm steps described in connection with the examples disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, processes, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the various embodiments disclosed herein.
The preceding description of the disclosed examples is provided to enable any person skilled in the art to make or use the various embodiments disclosed herein. Various modifications to these examples will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other examples without departing from the spirit or scope of the invention. Thus, the various embodiments disclosed herein are not intended to be limited to the examples shown herein but are to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.
As described herein, one skilled in the art will realize that examples of dimensions are approximate values and may vary by +/−5.0%, as required by manufacturing, fabrication, and design tolerances.
Various embodiments and examples are described herein in terms of electric voltage or electric current. One skilled in the art will realize that such embodiments and examples may be similarly implemented in terms of the other of electric voltage or electric current.
The foregoing outlines features of several embodiments so that those skilled in the art may better understand the aspects of the present disclosure. Those skilled in the art should appreciate that they may readily use the present disclosure as a basis for designing or modifying other processes and structures for carrying out the same purposes and/or achieving the same advantages of the embodiments introduced herein. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.