This disclosure relates generally to analog multiplication computation and particularly to analog computation of multiply-and-accumulate operations that may be performed in-memory.
This disclosure describes energy-efficient hardware to execute multiplication operations, and particularly vector matrix multiplication (VMM) operations, for digital data (e.g., values represented as logical bits or Boolean values). A VMM operation is a fundamental operation for many neural networks and includes summing the results of several multiplication operations. A VMM operation may also be referred to as a “multiply and accumulate” (MAC) operation. In many neural networks, a convolution layer is defined by multiplying a set of activations by a respective set of weights and then summing the products to yield the output for a particular channel. For example, a set of activations $A = \{A_0, \dots, A_n\}$ is multiplied by a respective set of weights $W = \{W_0, \dots, W_n\}$ to yield an output $O = (A_0 \times W_0) + (A_1 \times W_1) + \dots + (A_n \times W_n) = \sum_{i=0}^{n} A_i W_i$. Efficiently calculating these multiplications and their sum in hardware may therefore significantly improve the efficiency and effectiveness of neural network hardware.
Prior accelerators typically require extensive data transfer or several clock cycles to process these operations in the logical domain, while prior analog solutions may be difficult to realize with sufficient accuracy. There is thus a need for an approach that reduces energy consumption and hardware footprint, maintains accuracy, and may operate on logical (e.g., Boolean) inputs, such as those from a memory storage.
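For reference, the digital computation that the analog circuits in this disclosure are meant to perform can be sketched in a few lines of Python. The function name `mac` is illustrative only and is not part of the disclosure.

```python
def mac(activations, weights):
    """Digital multiply-and-accumulate: O = sum of A_i * W_i."""
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")
    return sum(a * w for a, w in zip(activations, weights))


print(mac([3, 0, 7], [5, 2, 1]))  # 3*5 + 0*2 + 7*1 = 22
```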
Embodiments will be readily understood by the following detailed description in conjunction with the accompanying drawings. To facilitate this description, like reference numerals designate like structural elements. Embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings.
The systems, methods, and devices of this disclosure each have several innovative aspects, no single one of which is solely responsible for all desirable attributes disclosed herein. Details of one or more implementations of the subject matter described in this specification are set forth in the description below and the accompanying drawings.
The disclosure below provides an improved method for performing multiplication operations. In particular, it covers performing multiple such operations in parallel and then summing them to implement a VMM/MAC operation. Input operands for the multiplication may be provided in the digital domain (e.g., as Boolean digital bits). The circuit then processes them in the analog domain without prior conversion by a digital-to-analog converter (DAC) and without storing or writing intermediate values to memory. Switched capacitors coupled with SRAM provide one effective implementation of this circuit for compute-in-memory solutions.
A circuit to perform the multiplication may include a plurality of capacitors that correspond to digital bit values, such that the relative charge stored on each capacitor at a given voltage corresponds to the relative value of its bit. For example, the capacitances of the respective capacitors may be one, two, four, and eight units for a four-bit logical value. A set of first switches selectively charges and discharges the capacitors according to the respective multiplication operands. A second switch may also be included to connect a common interconnect to a charging voltage, ground, or an output.
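The binary weighting can be made concrete with a short worked equation. Assume a unit capacitance $C_u$ and a $b$-bit operand $X = \sum_{j=0}^{b-1} x_j 2^j$ with bits $x_j \in \{0, 1\}$. The symbols $C_u$, $b$, $X$, and $x_j$ are introduced here for illustration only. Give the capacitor for bit $j$ a capacitance of $2^j C_u$, and charge exactly the capacitors of the 1-bits to $V_{cc}$. Then:

$$C_j = 2^j C_u, \qquad C_T = \sum_{j=0}^{b-1} C_j = (2^b - 1)\,C_u, \qquad Q = \sum_{j=0}^{b-1} x_j\,C_j\,V_{cc} = X\,C_u\,V_{cc}$$

For $b = 4$, the total capacitance is $C_T = 15\,C_u$, and the stored charge $Q$ is proportional to the operand value. For example, $X = 13$ stores $13\,C_u V_{cc}$.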
To perform the multiplication, the capacitors may first be charged according to a first operand (e.g., a weight value). The first switches connect each capacitor either to the common interconnect or to ground according to the operand bits, while the second switch connects the common interconnect to the charging voltage. After charging, all capacitors may be connected to the common interconnect by the first switches, and the second switch disconnected from the charging voltage. The charge of the charged capacitors is then shared among all of the capacitors, averaging to a level set by the capacitance of the charged capacitors relative to the total capacitance. This stores an averaged charge in the capacitors that reflects the value of the first operand.

Next, to “multiply” by the second operand (e.g., an activation value), the first switches either connect each capacitor to ground or keep it connected to the common interconnect, based on the second operand. This removes the charge from the capacitors corresponding to logical zeros in the second operand. To output a voltage reflecting the multiplication, the first switches connect all capacitors back to the common interconnect, and the charge is again averaged among the capacitors to yield a voltage level reflecting the product. An analog-to-digital converter may then interpret the voltage level and output a digital multiplication result.
To implement a VMM/MAC operation, a plurality of analog multiplication circuits may operate in parallel. Each circuit charges and selectively discharges its capacitors on a local common interconnect according to its respective operands. To “accumulate” the results of the multiplications, the local common interconnects of the multiplication circuits are connected. The capacitor charge then averages across the capacitors of every multiplication circuit, yielding a voltage that reflects the “sum” of the multiplications. An analog-to-digital converter reads this voltage and outputs a multiply-and-accumulate result.
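The multiplication sequence can be checked with an idealized charge-sharing model. This Python sketch makes three assumptions: ideal switches, no parasitic or ADC input capacitance, and binary-weighted capacitors of $2^i$ unit capacitors for bit $i$. The function `analog_multiply` and its parameters are hypothetical. Under these assumptions, the final voltage works out to $V_{cc} \cdot W \cdot A / (2^b - 1)^2$ for $b$-bit operands, i.e., $V_{cc} \cdot W \cdot A / 225$ for 4-bit operands.

```python
from fractions import Fraction


def analog_multiply(w, a, n_bits=4, vcc=Fraction(1)):
    """Ideal charge-sharing model of the analog multiplication sequence.

    Assumptions: ideal switches, no parasitic or ADC input capacitance, and
    binary-weighted capacitors of 2**i unit capacitors for bit i.
    Returns V_mult in the same units as vcc.
    """
    caps = [2 ** i for i in range(n_bits)]  # capacitance in unit capacitors
    c_total = sum(caps)                     # 2**n_bits - 1 unit capacitors

    def bits(x):
        return [(x >> i) & 1 for i in range(n_bits)]

    def share(volts):
        # All capacitors on the common interconnect: V = total charge / total capacitance.
        return sum(c * v for c, v in zip(caps, volts)) / c_total

    # After an initial discharge, capacitors selected by w charge to vcc (others grounded).
    volts = [vcc if b else Fraction(0) for b in bits(w)]
    # Second switch disconnected from vcc; charge is shared -> vcc * w / c_total.
    v1 = share(volts)
    # Capacitors whose bit of a is 0 are grounded; the rest keep v1.
    volts = [v1 if b else Fraction(0) for b in bits(a)]
    # Reconnect all capacitors and share again -> vcc * w * a / c_total**2.
    return share(volts)


print(analog_multiply(13, 11))  # 143/225
```

Exact `Fraction` arithmetic is used so the result can be compared directly with the ideal ratio $W \cdot A / 225$. A physical circuit would also be affected by non-idealities such as capacitor mismatch and parasitic capacitance, which this model ignores.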
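Under idealized assumptions, the accumulation step reduces to a charge balance. Assume the following, all introduced here for illustration: each of the $n+1$ multiplication circuits uses an identical $b$-bit binary-weighted array with unit capacitance $C_u$ and total capacitance $C_T = (2^b - 1)C_u$; the switches are ideal; and there is no parasitic capacitance. Following the sequence above, charging by $W_k$ and sharing gives a voltage of $V_{cc} W_k / (2^b - 1)$. Keeping only the capacitors of the 1-bits of $A_k$ retains a capacitance of $A_k C_u$ at that voltage. After re-sharing, circuit $k$ therefore holds $V_k = V_{cc} A_k W_k / (2^b - 1)^2$. Connecting all local interconnects then yields:

$$V_{MAC} = \frac{\sum_{k=0}^{n} C_T V_k}{(n+1)\,C_T} = \frac{V_{cc}}{(n+1)(2^b-1)^2} \sum_{k=0}^{n} A_k W_k = \frac{V_{cc}\,O}{(n+1)(2^b-1)^2}$$

For $b = 4$, each unit of the MAC output $O$ corresponds to a voltage step of $V_{cc} / (225(n+1))$. This step shrinks as more products are accumulated.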
In one embodiment, these circuits may be implemented within a memory array, permitting operand values to be read and operated on within the memory array. When implemented in a memory array, the MAC calculation may be highly parallelizable and may reduce the number of cycles required for a MAC operation.
As such, this approach to the MAC operation permits direct feed-through of digital inputs, both weights and activations. It also uses binary-weighted capacitors to provide multi-bit support while executing the MAC operation in the analog domain in a minimal number of clock cycles. In addition, because digital inputs are adopted directly, the error introduced by digital-to-analog conversion can be avoided.
For purposes of explanation, specific numbers, materials, and configurations are set forth in order to provide a thorough understanding of the illustrative implementations. However, it will be apparent to one skilled in the art that the present disclosure may be practiced without the specific details and/or that the present disclosure may be practiced with only some of the described aspects. In other instances, well-known features are omitted or simplified in order not to obscure the illustrative implementations.
In the following detailed description, reference is made to the accompanying drawings that form a part hereof, and in which is shown, by way of illustration, embodiments that may be practiced. It is to be understood that other embodiments may be utilized, and structural or logical changes may be made without departing from the scope of the present disclosure. Therefore, the following detailed description is not to be taken in a limiting sense.
Various operations may be described as multiple discrete actions or operations in turn, in a manner that is most helpful in understanding the claimed subject matter. However, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations may not be performed in the order of presentation. Operations described may be performed in a different order from the described embodiment. Various additional operations may be performed, and/or described operations may be omitted in additional embodiments.
For the purposes of the present disclosure, the phrase “A and/or B” means (A), (B), or (A and B). For the purposes of the present disclosure, the phrase “A, B, and/or C” means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C). The term “between,” when used with reference to measurement ranges, is inclusive of the ends of the measurement ranges. The meaning of “a,” “an,” and “the” includes plural references. The meaning of “in” includes “in” and “on.”
The description uses the phrases “in an embodiment” or “in embodiments,” which may each refer to one or more of the same or different embodiments. Furthermore, the terms “comprising,” “including,” “having,” and the like, as used with respect to embodiments of the present disclosure, are synonymous. The disclosure may use perspective-based descriptions such as “above,” “below,” “top,” “bottom,” and “side”; such descriptions are used to facilitate the discussion and are not intended to restrict the application of disclosed embodiments. The accompanying drawings are not necessarily drawn to scale. The terms “substantially,” “close,” “approximately,” “near,” and “about,” generally refer to being within +/−20% of a target value. Unless otherwise specified, the use of the ordinal adjectives “first,” “second,” and “third,” etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking or in any other manner.
In the following detailed description, various aspects of the illustrative implementations will be described using terms commonly employed by those skilled in the art to convey the substance of their work to others skilled in the art.
As shown in
Each capacitor 210A-D is connected to a respective first switch 220A-D. Each first switch may be individually controlled to connect its capacitor to a ground voltage 225 or to a common interconnect 280 connected to an output Vmult 230. The first switches 220 may be controlled by a control circuit (not shown) that uses the input operands and a sequence of steps/phases to control the charging and discharging of the capacitors 210. A second switch 240 may also be included for connecting the common interconnect to a ground voltage 250 or a voltage source Vcc 260, or for leaving it disconnected from both set voltages (open 270). As such, the 4-bit analog multiplication circuit is composed of a total of 15 unit capacitors arranged in four binary-weighted branches. Each branch has one switch to discharge it or connect it to the common interconnect. The particular arrangement of switches, ground, and charging/positive voltage may also be varied in additional embodiments to provide similar functionality.
Generally,
Next, the capacitors 210 are charged based on the first operand. The first switches are set according to the bits of the first operand, and the second switch 240 connects the common interconnect to the voltage source 260. This translates the first operand into a total charge stored in the capacitors 210, while the first operand itself remains represented in digital form. In various embodiments, the first operand may represent a weight or an activation for a neural network convolutional layer. In the example of
In the next step shown in
To perform the multiplication, respective capacitors are maintained or discharged according to the second operand as shown in
As shown in
the charging voltage Vcc 260. To process the result, the ADC is configured to read Vmult according to the range of possible multiplication products for the input operands (e.g., to interpret voltage levels corresponding to products from 0 to $15^2$, i.e., 0-225, for 4-bit operands). In one embodiment, the ADC scales or otherwise maps the output for Vmult to an output range. For example, the output may have the same value range as the input operands: when the input operands are represented in 4 bits, the output may be mapped to a 4-bit output range by scaling or another transformation. In practice, many applications may operate effectively with output scaled to such a relatively small value range. For example, many neural networks may perform accurately with values represented by 2, 3, 4, 6, or 8 bits.
In this way, the analog multiplication circuit 100 may be used to process digital inputs effectively in the analog domain and output a digital multiplication result while making efficient use of the provisioned capacitors. In this configuration, the capacitors first store a charge reflecting the value of the first operand. They are then re-used to reflect the product of that first operand with the second operand. This permits the provisioned capacitance (e.g., a number of unit capacitors) to match the maximum logical value of the operands (e.g., 15 unit capacitors for a maximum operand value of 15 in a 4-bit representation).
To implement the circuit in memory (e.g., SRAM memory), the capacitors may be realized as a traditional backend-of-line metal-finger capacitor (MFC), a state-of-the-art embedded DRAM-like capacitor, or a frontend-of-line-based capacitor. In one embodiment, the capacitor array may be highly integrated with the frontend transistors.
The first and second switches may be implemented with appropriate control circuitry to perform the steps discussed above (e.g., discharging, charging, and charge sharing) as determined by the current phase/clock cycle and the values of the first and second operands. In one embodiment, the first switch may be configured as a two-way switch. A multiplexor selects the value of the first or second operand to connect or disconnect the individual capacitor switches, and the current phase/clock cycle selects the charge/discharge connection. As such, an appropriate control circuit may control the switches to execute the discussed functions.
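The ADC interpretation step can be sketched as an ideal quantizer followed by an optional rescaling. This Python sketch assumes that full scale (Vmult = Vcc) corresponds to the largest product $(2^b - 1)^2$, as in an ideal charge-sharing model. It uses simple linear rescaling as one example of mapping the result to a smaller output range. The function `adc_read` and its parameters are hypothetical.

```python
def adc_read(v_mult, vcc=1.0, n_bits=4, out_bits=None):
    """Hypothetical ideal ADC readout for the analog multiplier.

    Assumes V_mult = vcc corresponds to the largest product (2**n_bits - 1)**2.
    If out_bits is given, the product is linearly rescaled to out_bits bits.
    """
    max_product = (2 ** n_bits - 1) ** 2         # 225 for 4-bit operands
    # Quantize the voltage to the nearest product code (Python's round() ties to even).
    product = round(v_mult / vcc * max_product)
    product = min(max(product, 0), max_product)  # clamp to the valid code range
    if out_bits is None:
        return product
    max_out = 2 ** out_bits - 1
    # Linear rescale, e.g., 0..225 -> 0..15 for a 4-bit output.
    return round(product * max_out / max_product)


print(adc_read(143 / 225))              # 143
print(adc_read(143 / 225, out_bits=4))  # 10
```

Linear rescaling is only one possible transformation. The text leaves the mapping open ("scaling or another transformation"), and a nonlinear or calibrated mapping would fit the same interface.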
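The phase-dependent switch control can be sketched as a table of switch positions generated from the operand bits. In this hypothetical Python sketch, the enum and function names are illustrative. The second-switch position during the initial discharge phase (ground) is an assumption. The description specifies that each first switch connects its capacitor to ground or to the common interconnect, and that the second switch selects ground, Vcc, or open.

```python
from enum import Enum


class S1(Enum):
    """Position of a first switch (one per capacitor)."""
    GND = "ground"
    COMMON = "common interconnect"


class S2(Enum):
    """Position of the second switch on the common interconnect."""
    GND = "ground"
    VCC = "Vcc"
    OPEN = "open"


def switch_schedule(w, a, n_bits=4):
    """Hypothetical control-circuit output: one (phase, first switches, S2) tuple per phase.

    First-switch list index i is the capacitor for bit i (2**i unit capacitors).
    """
    def select(x):
        # Bit = 1 -> connect to the common interconnect; bit = 0 -> ground.
        return [S1.COMMON if (x >> i) & 1 else S1.GND for i in range(n_bits)]

    all_common = [S1.COMMON] * n_bits
    return [
        ("discharge", [S1.GND] * n_bits, S2.GND),  # S2 position here is assumed
        ("charge per first operand", select(w), S2.VCC),
        ("share charge", all_common, S2.OPEN),
        ("discharge per second operand", select(a), S2.OPEN),
        ("share and read Vmult", all_common, S2.OPEN),
    ]


for phase, firsts, second in switch_schedule(13, 11):
    print(f"{phase:30} S1={[s.name for s in firsts]} S2={second.name}")
```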
The following table illustrates the positions for the first switches 520 and the second switch 540 in one embodiment of the MAC configuration shown in
As shown in Table 2, and similar to Table 1, the switches may initially be connected to a ground voltage to discharge any existing charge, and the capacitors may then be selectively charged according to the first operand. The second switch 540 may then be connected to Vlocal so that the charge is averaged across the local capacitors. These capacitors are then selectively discharged or maintained to “multiply” the averaged charge by the second operand. Each analog multiplication circuit 400 then holds a charge corresponding to the product of its respective operands. Switching S2 to VMAC averages these charges across the analog multiplication circuits 400. Similar to the discussion of analog multiplication circuit 100, a control circuit may control the execution of these steps based on multiplexors, control signals, two-way switches, and/or other components.
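The MAC sequence described for Table 2 can be modeled in the same idealized way. This Python sketch assumes identical binary-weighted arrays, ideal switches, and no parasitic capacitance. It also assumes that every capacitor of every circuit joins the shared interconnect when S2 switches to VMAC. The function `analog_mac` is hypothetical. Whether the charge is re-shared locally before S2 switches to VMAC does not change the total charge that is finally averaged, so the model omits that step.

```python
from fractions import Fraction


def analog_mac(weights, activations, n_bits=4, vcc=Fraction(1)):
    """Ideal model of the MAC sequence: local multiply per circuit, then global sharing.

    Assumptions: identical binary-weighted arrays (2**i unit capacitors for bit i),
    ideal switches, no parasitics. Returns the voltage read after S2 -> V_MAC.
    """
    if not weights or len(weights) != len(activations):
        raise ValueError("weights and activations must be non-empty and equal length")
    caps = [2 ** i for i in range(n_bits)]
    c_local = sum(caps)  # unit capacitors per multiplication circuit

    def bits(x):
        return [(x >> i) & 1 for i in range(n_bits)]

    charges = []
    for w, a in zip(weights, activations):
        # Discharge, then charge the capacitors selected by the weight bits to vcc.
        q = sum((c * vcc for c, b in zip(caps, bits(w)) if b), Fraction(0))
        # S2 -> V_local: charge is averaged over this circuit's capacitors only.
        v_local = q / c_local
        # Ground capacitors whose activation bit is 0; the others keep v_local.
        charges.append(sum((c * v_local for c, b in zip(caps, bits(a)) if b), Fraction(0)))
    # S2 -> V_MAC: all local interconnects joined; charge averages over every capacitor.
    return sum(charges, Fraction(0)) / (c_local * len(weights))


v = analog_mac([13, 2, 7], [11, 5, 0])
print(v, v * 3 * 15 ** 2)  # 17/75 153
```

Multiplying the result by the number of circuits and by $15^2$ recovers the digital MAC value $13 \cdot 11 + 2 \cdot 5 + 7 \cdot 0 = 153$ under these ideal assumptions.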
Finally,
Example devices
A number of components are illustrated in
Additionally, in various embodiments, the computing device 700 may not include one or more of the components illustrated in
The computing device 700 may include a processing device 702 (e.g., one or more processing devices). As used herein, the term “processing device” or “processor” may refer to any device or portion of a device that processes electronic data from registers and/or memory to transform that electronic data into other electronic data that may be stored in registers and/or memory. The processing device 702 may include one or more digital signal processors (DSPs), application-specific ICs (ASICs), central processing units (CPUs), graphics processing units (GPUs), cryptoprocessors (specialized processors that execute cryptographic algorithms within hardware), server processors, or any other suitable processing devices. The computing device 700 may include a memory 704, which may itself include one or more memory devices such as volatile memory (e.g., dynamic random-access memory (DRAM)), nonvolatile memory (e.g., read-only memory (ROM)), flash memory, solid state memory, and/or a hard drive. The memory 704 may include instructions executable by the processing device for performing methods and functions as discussed herein. Such instructions may be instantiated in various types of memory, including non-volatile memory, and may be stored on one or more non-transitory media. In some embodiments, the memory 704 may include memory that shares a die with the processing device 702. This memory may be used as cache memory and may include embedded dynamic random-access memory (eDRAM) or spin transfer torque magnetic random-access memory (STT-MRAM).
In some embodiments, the computing device 700 may include a communication chip 712 (e.g., one or more communication chips). For example, the communication chip 712 may be configured for managing wireless communications for the transfer of data to and from the computing device 700. The term “wireless” and its derivatives may be used to describe circuits, devices, systems, methods, techniques, communications channels, etc., that may communicate data through the use of modulated electromagnetic radiation through a nonsolid medium. The term does not imply that the associated devices do not contain any wires, although in some embodiments they might not.
The communication chip 712 may implement any of a number of wireless standards or protocols, including but not limited to Institute of Electrical and Electronics Engineers (IEEE) standards including Wi-Fi (IEEE 802.11 family), IEEE 802.16 standards (e.g., IEEE 802.16-2005 Amendment), Long-Term Evolution (LTE) project along with any amendments, updates, and/or revisions (e.g., advanced LTE project, ultramobile broadband (UMB) project (also referred to as “3GPP2”), etc.). IEEE 802.16 compatible Broadband Wireless Access (BWA) networks are generally referred to as WiMAX networks, an acronym that stands for Worldwide Interoperability for Microwave Access, which is a certification mark for products that pass conformity and interoperability tests for the IEEE 802.16 standards. The communication chip 712 may operate in accordance with a Global System for Mobile Communication (GSM), General Packet Radio Service (GPRS), Universal Mobile Telecommunications System (UMTS), High-Speed Packet Access (HSPA), Evolved HSPA (E-HSPA), or LTE network. The communication chip 712 may operate in accordance with Enhanced Data for GSM Evolution (EDGE), GSM EDGE Radio Access Network (GERAN), Universal Terrestrial Radio Access Network (UTRAN), or Evolved UTRAN (E-UTRAN). The communication chip 712 may operate in accordance with Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Digital Enhanced Cordless Telecommunications (DECT), Evolution-Data Optimized (EV-DO), and derivatives thereof, as well as any other wireless protocols that are designated as 3G, 4G, 5G, and beyond. The communication chip 712 may operate in accordance with other wireless protocols in other embodiments. The computing device 700 may include an antenna 722 to facilitate wireless communications and/or to receive other wireless communications (such as AM or FM radio transmissions).
In some embodiments, the communication chip 712 may manage wired communications, such as electrical, optical, or any other suitable communication protocols (e.g., the Ethernet). As noted above, the communication chip 712 may include multiple communication chips. For instance, a first communication chip 712 may be dedicated to shorter-range wireless communications such as Wi-Fi or Bluetooth, and a second communication chip 712 may be dedicated to longer-range wireless communications such as global positioning system (GPS), EDGE, GPRS, CDMA, WiMAX, LTE, EV-DO, or others. In some embodiments, a first communication chip 712 may be dedicated to wireless communications, and a second communication chip 712 may be dedicated to wired communications.
The computing device 700 may include battery/power circuitry 714. The battery/power circuitry 714 may include one or more energy storage devices (e.g., batteries or capacitors) and/or circuitry for coupling components of the computing device 700 to an energy source separate from the computing device 700 (e.g., AC line power).
The computing device 700 may include a display device 706 (or corresponding interface circuitry, as discussed above). The display device 706 may include any visual indicators, such as a heads-up display, a computer monitor, a projector, a touchscreen display, a liquid crystal display (LCD), a light-emitting diode display, or a flat panel display, for example.
The computing device 700 may include an audio output device 708 (or corresponding interface circuitry, as discussed above). The audio output device 708 may include any device that generates an audible indicator, such as speakers, headsets, or earbuds, for example.
The computing device 700 may include an audio input device 718 (or corresponding interface circuitry, as discussed above). The audio input device 718 may include any device that generates a signal representative of a sound, such as microphones, microphone arrays, or digital instruments (e.g., instruments having a musical instrument digital interface (MIDI) output).
The computing device 700 may include a GPS device 716 (or corresponding interface circuitry, as discussed above). The GPS device 716 may be in communication with a satellite-based system and may receive a location of the computing device 700, as known in the art.
The computing device 700 may include an other output device 710 (or corresponding interface circuitry, as discussed above). Examples of the other output device 710 may include an audio codec, a video codec, a printer, a wired or wireless transmitter for providing information to other devices, or an additional storage device.
The computing device 700 may include an other input device 720 (or corresponding interface circuitry, as discussed above). Examples of the other input device 720 may include an accelerometer, a gyroscope, a compass, an image capture device, a keyboard, a cursor control device such as a mouse, a stylus, a touchpad, a bar code reader, a Quick Response (QR) code reader, any sensor, or a radio frequency identification (RFID) reader.
The computing device 700 may have any desired form factor, such as a hand-held or mobile computing device (e.g., a cell phone, a smart phone, a mobile internet device, a music player, a tablet computer, a laptop computer, a netbook computer, an ultrabook computer, a personal digital assistant (PDA), an ultramobile personal computer, etc.), a desktop computing device, a server or other networked computing component, a printer, a scanner, a monitor, a set-top box, an entertainment control unit, a vehicle control unit, a digital camera, a digital video recorder, or a wearable computing device. In some embodiments, the computing device 700 may be any other electronic device that processes data.
The following paragraphs provide various examples of the embodiments disclosed herein.
Example 1 provides a circuit including a plurality of capacitors, each capacitor having a capacitance corresponding to a bit value of a logical bit in a plurality of logical bits; a plurality of first switches, each first switch coupled to a corresponding capacitor and configured to connect the corresponding capacitor to a common interconnect or to a ground voltage based at least in part on a first operand or a second operand; a second switch configured to selectively connect the common interconnect to a voltage source; and an analog-to-digital converter configured to read a voltage of the common interconnect and generate a digital output.
Example 2 provides the circuit of example 1, further including a control circuit configured to: selectively charge the plurality of capacitors by switching the plurality of first switches to the common interconnect according to the first operand and switching the second switch to the voltage source; after charging the plurality of capacitors, switch the second switch away from the voltage source and connect the plurality of first switches to the common interconnect; selectively reduce the charge of the plurality of capacitors by switching the plurality of first switches to the common interconnect or the ground voltage according to the second operand; and after selectively reducing the charge, switch the plurality of first switches to the common interconnect.
Example 3 provides the circuit of example 2, wherein the control circuit is further configured to, before selectively charging the plurality of capacitors, discharge the capacitors by switching the plurality of first switches to the ground voltage.
Example 4 provides the circuit of any of examples 1-3, wherein the second switch is further configured to selectively connect the common interconnect to the ground voltage.
Example 5 provides the circuit of any of examples 1-4, wherein the first or second operand comprises binary logical bits.
Example 6 provides the circuit of any of examples 1-5, wherein the circuit is in a memory array.
Example 7 provides the circuit of example 6, wherein the digital output is stored to a location in the memory array.
Example 8 provides the circuit of example 6, wherein the first operand or the second operand is stored in the memory array.
Example 9 provides the circuit of any of examples 1-8, wherein the first operand or the second operand is a weight value.
Example 10 provides the circuit of any of examples 1-9, wherein the first operand or the second operand is an activation value.
Example 11 provides the circuit of any of examples 1-10, wherein the common interconnect is further connected to a second plurality of capacitors, and the voltage of the common interconnect is an accumulation of a first multiplication represented by a first charge of the plurality of capacitors and a second multiplication represented by a second charge of the second plurality of capacitors.
Example 12 provides a method comprising selectively charging a plurality of capacitors, each capacitor having a capacitance corresponding to a bit value of a logical bit in a plurality of logical bits, by switching a plurality of first switches, each first switch coupled to a respective capacitor of the plurality of capacitors, to a common interconnect or a ground voltage according to a first operand and switching a second switch coupled to the common interconnect to a voltage source; after charging the plurality of capacitors, switching the second switch away from the voltage source and connecting the plurality of first switches to the common interconnect to average the charge across the plurality of capacitors; selectively reducing the charge of the plurality of capacitors by switching the plurality of first switches to the common interconnect or the ground voltage according to a second operand; after selectively reducing the charge, switching the plurality of first switches to the common interconnect; and outputting a digital output based on a voltage level of the common interconnect.
Example 13 provides the method of example 12, further comprising, before selectively charging the plurality of capacitors, discharging the capacitors by switching the plurality of first switches to the ground voltage.
Example 14 provides the method of any of examples 12-13, wherein the second switch is further configured to selectively connect the common interconnect to the ground voltage.
Example 15 provides the method of any of examples 12-14, wherein the first or second operand comprises binary logical bits.
Example 16 provides the method of any of examples 12-15, wherein the plurality of capacitors is in a memory array.
Example 17 provides the method of example 16, wherein the digital output is stored to a location in the memory array.
Example 18 provides the method of example 16, wherein the first operand or the second operand is stored in the memory array.
Example 19 provides the method of any of examples 12-18, wherein the first operand or the second operand is a weight value.
Example 20 provides the method of any of examples 12-19, wherein the first operand or the second operand is an activation value.
Example 21 provides the method of any of examples 12-20, wherein the common interconnect is further connected to a second plurality of capacitors, and the voltage level is an accumulation of a first multiplication represented by a first charge of the plurality of capacitors and a second multiplication represented by a second charge of the second plurality of capacitors.
The above description of illustrated implementations of the disclosure, including what is described in the Abstract, is not intended to be exhaustive or to limit the disclosure to the precise forms disclosed. While specific implementations of, and examples for, the disclosure are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the disclosure, as those skilled in the relevant art will recognize. These modifications may be made to the disclosure in light of the above detailed description.