The disclosure relates generally to accelerators, and more particularly to managing the load of an artificial intelligence (AI) accelerator.
Processing large data models, particularly for Artificial Intelligence (AI) or Machine Learning (ML), may involve the use of accelerators. Processing operations may involve processing large amounts of data, or performing multiple calculations on particular data. Because workloads vary, efficient operation of accelerators may be difficult to achieve.
A need remains to manage accelerator loads efficiently.
The drawings described below are examples of how embodiments of the disclosure may be implemented, and are not intended to limit embodiments of the disclosure. Individual embodiments of the disclosure may include elements not shown in particular figures and/or may omit elements shown in particular figures. The drawings are intended to provide illustration and may not be to scale.
An accelerator may receive a frequency control signal from software. The accelerator may use the frequency control signal to control the frequency of the accelerator.
Reference will now be made in detail to embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. In the following detailed description, numerous specific details are set forth to enable a thorough understanding of the disclosure. It should be understood, however, that persons having ordinary skill in the art may practice the disclosure without these specific details. In other instances, well-known methods, procedures, components, circuits, and networks have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first module could be termed a second module, and, similarly, a second module could be termed a first module, without departing from the scope of the disclosure.
The terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in the description of the disclosure and the appended claims, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the term “and/or” as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The components and features of the drawings are not necessarily drawn to scale.
Processing large amounts of data, for example, for Artificial Intelligence (AI) or Machine Learning (ML) models, may involve hardware that uses various accelerators. The workloads on these accelerators may vary: some workloads might be processing-intensive, while other workloads might be data-intensive.
Operating the accelerators at their maximum frequency might result in the fastest results from the accelerators. But operating the accelerators at their maximum frequency might also mean that the accelerators consume the most power, which may be inefficient from a power-management point of view. In addition, the accelerators might not need to operate all the time at maximum power. It might be desirable for two or more accelerators to end their processing at the same time. When operating at maximum frequency, one accelerator might finish before the other, leaving that accelerator to idle until the other accelerator finishes its processing. If the first accelerator were to operate at a lower frequency, it might consume less power and finish its operations in a way that roughly coincides with the other accelerator, avoiding the accelerator idling.
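The arithmetic behind this observation can be sketched as follows. This is a minimal illustration, not the method of the disclosure: the function name `matched_frequency_hz`, the use of cycle counts as a measure of work, and all numeric values are assumptions made for the example.

```python
def matched_frequency_hz(cycles, other_cycles, other_freq_hz, max_freq_hz):
    """Frequency at which `cycles` of work ends together with the other accelerator.

    Assumes (hypothetically) that run time is simply cycles / frequency.
    """
    other_finish_s = other_cycles / other_freq_hz  # when the other accelerator finishes
    needed_hz = cycles / other_finish_s            # frequency that ends at the same time
    return min(needed_hz, max_freq_hz)             # never ask for more than the maximum


# Hypothetical numbers: accelerator A has 2e9 cycles of work, while accelerator B
# has 6e9 cycles of work and runs at 1.5 GHz.
freq_a = matched_frequency_hz(2e9, 6e9, 1.5e9, max_freq_hz=1.5e9)
print(f"{freq_a / 1e6:.0f} MHz")
```

With these assumed numbers, the second accelerator needs four seconds, so the first accelerator may run at one third of its maximum frequency and still finish at roughly the same time.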
Accelerators may include a Proportional-Integral-Derivative (PID) control loop, which may adjust the operating frequency of the accelerator. Depending on how busy the accelerator currently is and what is considered a target busy level, the PID control loop may suggest adjusting the frequency of the accelerator up or down smoothly to reach the target busy level. But the PID control loop does not know about other accelerators and when they might finish their processing of data: the PID control loop works for the accelerator in isolation. In addition, the PID control loop has no information about what the workload of the accelerator may be for the next iteration.
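As a rough sketch of how such a control loop may behave, the following shows a textbook PID step that turns the gap between the measured busy level and the target busy level into a suggested frequency change. The class name `BusyLevelPid`, the gain values, and the sign convention (positive means speed up) are assumptions for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass


@dataclass
class BusyLevelPid:
    kp: float  # proportional gain (hypothetical value chosen by the caller)
    ki: float  # integral gain
    kd: float  # derivative gain
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, busy, target_busy, dt_s):
        """Return a suggested frequency change in Hz; positive means speed up."""
        error = busy - target_busy          # busier than the target -> positive error
        self.integral += error * dt_s       # accumulated error over time
        derivative = (error - self.prev_error) / dt_s
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = BusyLevelPid(kp=2e8, ki=1e7, kd=0.0)
raise_hz = pid.step(busy=0.9, target_busy=0.7, dt_s=0.01)  # too busy: suggests speeding up
lower_hz = pid.step(busy=0.5, target_busy=0.7, dt_s=0.01)  # under target: suggests slowing down
```

Note that the inputs are only the accelerator's own busy level and target: nothing in the loop reflects other accelerators or upcoming work, which is the limitation described above.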
Embodiments of the disclosure address these issues by introducing a frequency control signal from an AI workload runtime software running under the operating system. The AI workload runtime software, or AI software for short, may handle the division and scheduling of work, and thus may have information regarding the current and future workloads for both the accelerator and other accelerators. The AI software may issue frequency control signals to the accelerators to manage their power consumption and to coordinate when the accelerators finish their processing. Embodiments of the disclosure may issue frequency control signals based on the type of processing the accelerator is to perform, the number of vectors the accelerator is to process, or the dimensionality of the vectors (the number of coordinates in the vectors) that the accelerator is to process. Embodiments of the disclosure may use the frequency control signals to override an internal frequency control signal from a PID, or to override a minimum or maximum frequency established for the accelerator.
Processor 110 may be any variety of processor. (Processor 110, along with the other components discussed below, is shown outside the machine for ease of illustration: embodiments of the disclosure may include these components within the machine.) While
Processor 110 may be coupled to memory 115. Memory 115, which may also be referred to as a main memory, may be any variety of memory, such as flash memory, Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), Persistent Random Access Memory, Ferroelectric Random Access Memory (FRAM), or Non-Volatile Random Access Memory (NVRAM), such as Magnetoresistive Random Access Memory (MRAM), etc. Memory 115 may also be any desired combination of different memory types, and may be managed by memory controller 125. Memory 115 may be used to store data that may be termed “short-term”: that is, data not expected to be stored for extended periods of time. Examples of short-term data may include temporary files, data being used locally by applications (which may have been copied from other storage locations), and the like.
Processor 110 and memory 115 may also support an operating system under which various applications may be running. These applications may issue requests (which may also be termed commands) to read data from or write data to either memory 115 or storage device 120. Storage device 120 may be accessed using device driver 130.
Storage device 120 may be associated with an accelerator (which may be accelerator 135 or may be a different accelerator), which may also be referred to as a computational storage device, computational storage unit, or computational device. Storage device 120 and the accelerator may be designed and manufactured as a single integrated unit, or the accelerator may be separate from storage device 120. The phrase “associated with” is intended to cover both a single integrated unit including both a storage device and an accelerator and a storage device that is paired with an accelerator but is not manufactured with it as a single integrated unit. In other words, a storage device and an accelerator may be said to be “paired” when they are physically separate devices but are connected in a manner that enables them to communicate with each other. Further, in the remainder of this document, any reference to storage device 120 may be understood to refer either to the devices as physically separate but paired (and therefore may include the other device) or to both devices integrated into a single component as a computational storage unit.
In addition, the connection between the storage device and the paired accelerator might enable the two devices to communicate, but might not enable one (or both) devices to work with a different partner: that is, the storage device might not be able to communicate with another accelerator, and/or the accelerator might not be able to communicate with another storage device. For example, the storage device and the paired accelerator might be connected serially (in either order) to the fabric, enabling the accelerator to access information from the storage device in a manner another accelerator might not be able to achieve.
While
Processor 110 and storage device 120 may communicate across a fabric (not shown in
While
There are a number of reasons the frequency (that is, the clock rate) of processing circuit 325 matters. First, there is a correlation between frequency and power consumption. The higher the frequency of processing circuit 325, the greater the power consumed. Since minimizing the amount of power used by accelerator 135 may be an objective, it may be considered more efficient for accelerator 135 to operate at a lower frequency, even if it means the computations may take a little longer. Second, if there is more than one accelerator 135 in machine 105 of
In addition, the actual frequency to be used by accelerator 135 may depend on other factors. For example, the dimensionality of the data (that is, the number of coordinates in the data vectors) and the number of vectors in the data may factor into the calculation of the appropriate frequency control signal. Similarly, the architecture of accelerator 135 may be relevant in calculating the frequency control signal for accelerator 135. As the specifics of how the frequency control signal is calculated may depend on numerous factors, how a particular example frequency may be calculated for a particular accelerator 135 is not described herein.
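Purely to illustrate how these factors might combine, and without suggesting that any particular accelerator works this way, the following sketch estimates work from the type of processing, the number of vectors, and their dimensionality, and then derives a frequency for a desired completion time. The cost table `CYCLES_PER_COORDINATE`, the `lanes` architecture parameter, and all function names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical cost table: cycles spent per vector coordinate for each processing type.
CYCLES_PER_COORDINATE = {"dot_product": 1.0, "normalize": 3.0}


@dataclass(frozen=True)
class Workload:
    kind: str            # type of processing to perform
    num_vectors: int     # number of vectors to process
    dimensionality: int  # number of coordinates in each vector


def estimate_cycles(workload, lanes=1):
    """Estimated cycles; `lanes` (coordinates handled per cycle) stands in for architecture."""
    per_coordinate = CYCLES_PER_COORDINATE[workload.kind]
    return per_coordinate * workload.num_vectors * workload.dimensionality / lanes


def frequency_control_signal_hz(workload, deadline_s, lanes=1):
    """Frequency at which the estimated work would complete in `deadline_s` seconds."""
    return estimate_cycles(workload, lanes) / deadline_s


work = Workload(kind="dot_product", num_vectors=1_000_000, dimensionality=512)
signal_hz = frequency_control_signal_hz(work, deadline_s=0.5, lanes=8)
print(f"{signal_hz / 1e6:.0f} MHz")
```

A real calculation may weigh memory bandwidth, data movement, and other architecture details that this linear estimate ignores.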
Control circuit 320 may then compare internal frequency control signal 405 with frequency minimum 410 and frequency maximum 415. Frequency minimum 410 and frequency maximum 415 may represent the lower and upper bounds, respectively, for the frequency of accelerator 135 of
While the operation of accelerator 135 of
Thus, as shown in
While
In some embodiments of the disclosure, control circuit 320 may apply the frequency specified in frequency control signal 425-1 for the duration of executing the data processing command by processing circuit 325. In other embodiments of the disclosure, control circuit 320 may apply the frequency specified in frequency control signal 425-1 until either a new frequency is specified in a new frequency control signal 425-1, or until software 420 de-asserts frequency control signal 425-1. Once control circuit 320 no longer applies frequency control signal 425-1, internal frequency control signal 405 from PID control loop 315 may be applied instead, and the bounds of frequency minimum 410 and/or frequency maximum 415 may also be applied.
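The selection behavior described here may be modeled in software as follows. This is a simplified sketch under stated assumptions: the class and method names are invented for the example, a de-asserted signal is represented by `None`, and the override is assumed to bypass the minimum and maximum bounds.

```python
from typing import Optional


class ControlCircuit:
    """Toy model: picks between a software signal and a bounded internal signal."""

    def __init__(self, freq_min_hz, freq_max_hz):
        self.freq_min_hz = freq_min_hz
        self.freq_max_hz = freq_max_hz
        self.software_hz: Optional[float] = None  # None models a de-asserted signal

    def assert_signal(self, freq_hz):
        # A new signal from software replaces any earlier one.
        self.software_hz = freq_hz

    def deassert_signal(self):
        self.software_hz = None

    def select_frequency(self, internal_hz):
        if self.software_hz is not None:
            # Software override: applied as given, even outside the bounds (assumption).
            return self.software_hz
        # No override: use the internal (PID) signal, limited to [minimum, maximum].
        return max(self.freq_min_hz, min(internal_hz, self.freq_max_hz))


circuit = ControlCircuit(freq_min_hz=5e8, freq_max_hz=1.5e9)
bounded = circuit.select_frequency(internal_hz=2e9)     # limited to the maximum
circuit.assert_signal(1.8e9)
overridden = circuit.select_frequency(internal_hz=2e9)  # software value wins
circuit.deassert_signal()
fallback = circuit.select_frequency(internal_hz=3e8)    # raised to the minimum
```

In the per-command variant, `deassert_signal` would be called when the data processing command completes rather than by the software.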
In
Embodiments of the disclosure may include software to generate a frequency control signal for an accelerator. The accelerator may include a control circuit which may receive the frequency control signal, and which may override an internal frequency control signal generated internally: for example, by a Proportional-Integral-Derivative (PID) control loop. As the software may have access to information about the workload that the PID control loop may not have access to, the frequency control signal from the software may result in a more efficient operation of the accelerator, providing a technical advantage.
Artificial Intelligence (AI) accelerators require a precise frequency and performance management architecture. Time to completion for a unit of work is hard to predict. Embodiments of the disclosure provide a hardware/software-based solution to load management for AI accelerators.
Embodiments of the disclosure may enable independent demand-based frequency control for accelerators. Software-driven boost signals may provide for tight hardware/software coordination. Embodiments of the disclosure may be scalable for multiple accelerators. Each accelerator may use the following parameters for independent frequency control:
Embodiments of the disclosure may support independent demand-based frequency control, which may enable hardware-based workload balancing, power consumption control via frequency management, and a programmable frequency minimum, frequency maximum, and responsiveness to allow for finer-grained heterogeneous solutions. Embodiments of the disclosure may support fast-responsiveness but power-hungry accelerators, medium-responsiveness accelerators for balanced performance, and economic accelerators for maximal power efficiency. Embodiments of the disclosure may help to solve complex producer/consumer relationships within a multi-accelerator topology.
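One way to picture these programmable parameters is as a per-accelerator profile. The sketch below is only an assumption about how such profiles might look: the profile names, the numeric values, and the reading of responsiveness as the fraction of a requested change applied per control step are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrequencyProfile:
    freq_min_hz: float
    freq_max_hz: float
    responsiveness: float  # assumed meaning: fraction (0..1) of a requested change applied per step


# Hypothetical profiles for three classes of accelerator.
PROFILES = {
    "fast": FrequencyProfile(freq_min_hz=8e8, freq_max_hz=2.0e9, responsiveness=1.0),
    "balanced": FrequencyProfile(freq_min_hz=5e8, freq_max_hz=1.5e9, responsiveness=0.5),
    "economic": FrequencyProfile(freq_min_hz=2e8, freq_max_hz=1.0e9, responsiveness=0.2),
}


def next_frequency_hz(profile, current_hz, requested_hz):
    """Move toward the requested frequency at the profile's pace, within its bounds."""
    stepped = current_hz + profile.responsiveness * (requested_hz - current_hz)
    return max(profile.freq_min_hz, min(stepped, profile.freq_max_hz))


fast_hz = next_frequency_hz(PROFILES["fast"], current_hz=1.0e9, requested_hz=1.4e9)
balanced_hz = next_frequency_hz(PROFILES["balanced"], current_hz=1.0e9, requested_hz=1.4e9)
economic_hz = next_frequency_hz(PROFILES["economic"], current_hz=1.0e9, requested_hz=1.4e9)
```

Under these assumed values, the same request moves the fast profile all the way to the requested frequency in one step, the balanced profile halfway, and the economic profile not at all because it is already at its maximum.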
The following discussion is intended to provide a brief, general description of a suitable machine or machines in which certain aspects of the disclosure may be implemented. The machine or machines may be controlled, at least in part, by input from conventional input devices, such as keyboards, mice, etc., as well as by directives received from another machine, interaction with a virtual reality (VR) environment, biometric feedback, or other input signal. As used herein, the term “machine” is intended to broadly encompass a single machine, a virtual machine, or a system of communicatively coupled machines, virtual machines, or devices operating together. Exemplary machines include computing devices such as personal computers, workstations, servers, portable computers, handheld devices, telephones, tablets, etc., as well as transportation devices, such as private or public transportation, e.g., automobiles, trains, cabs, etc.
The machine or machines may include embedded controllers, such as programmable or non-programmable logic devices or arrays, Application Specific Integrated Circuits (ASICs), embedded computers, smart cards, and the like. The machine or machines may utilize one or more connections to one or more remote machines, such as through a network interface, modem, or other communicative coupling. Machines may be interconnected by way of a physical and/or logical network, such as an intranet, the Internet, local area networks, wide area networks, etc. One skilled in the art will appreciate that network communication may utilize various wired and/or wireless short range or long range carriers and protocols, including radio frequency (RF), satellite, microwave, Institute of Electrical and Electronics Engineers (IEEE) 802.11, Bluetooth®, optical, infrared, cable, laser, etc.
Embodiments of the present disclosure may be described by reference to or in conjunction with associated data including functions, procedures, data structures, application programs, etc. which when accessed by a machine results in the machine performing tasks or defining abstract data types or low-level hardware contexts. Associated data may be stored in, for example, the volatile and/or non-volatile memory, e.g., RAM, ROM, etc., or in other storage devices and their associated storage media, including hard-drives, floppy-disks, optical storage, tapes, flash memory, memory sticks, digital video disks, biological storage, etc. Associated data may be delivered over transmission environments, including the physical and/or logical network, in the form of packets, serial data, parallel data, propagated signals, etc., and may be used in a compressed or encrypted format. Associated data may be used in a distributed environment, and stored locally and/or remotely for machine access.
Embodiments of the disclosure may include a tangible, non-transitory machine-readable medium comprising instructions executable by one or more processors, the instructions comprising instructions to perform the elements of the disclosures as described herein.
The various operations of methods described above may be performed by any suitable means capable of performing the operations, such as various hardware and/or software component(s), circuits, and/or module(s). The software may comprise an ordered listing of executable instructions for implementing logical functions, and may be embodied in any “processor-readable medium” for use by or in connection with an instruction execution system, apparatus, or device, such as a single or multiple-core processor or processor-containing system.
The blocks or steps of a method or algorithm and functions described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a tangible, non-transitory computer-readable medium. A software module may reside in Random Access Memory (RAM), flash memory, Read Only Memory (ROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, hard disk, a removable disk, a CD ROM, or any other form of storage medium known in the art.
Having described and illustrated the principles of the disclosure with reference to illustrated embodiments, it will be recognized that the illustrated embodiments may be modified in arrangement and detail without departing from such principles, and may be combined in any desired manner. And, although the foregoing discussion has focused on particular embodiments, other configurations are contemplated. In particular, even though expressions such as “according to an embodiment of the disclosure” or the like are used herein, these phrases are meant to generally reference embodiment possibilities, and are not intended to limit the disclosure to particular embodiment configurations. As used herein, these terms may reference the same or different embodiments that are combinable into other embodiments.
The foregoing illustrative embodiments are not to be construed as limiting the disclosure thereof. Although a few embodiments have been described, those skilled in the art will readily appreciate that many modifications are possible to those embodiments without materially departing from the novel teachings and advantages of the present disclosure. Accordingly, all such modifications are intended to be included within the scope of this disclosure as defined in the claims.
Embodiments of the disclosure may extend to the following statements, without limitation:
Statement 1. An embodiment of the disclosure includes an accelerator, comprising:
Statement 2. An embodiment of the disclosure includes the accelerator according to statement 1, wherein the interface is configured to receive the frequency control signal from a software executing on the processor.
Statement 3. An embodiment of the disclosure includes the accelerator according to statement 2, wherein the interface is configured to receive the frequency control signal from an Artificial Intelligence (AI) workload software executing on the processor.
Statement 4. An embodiment of the disclosure includes the accelerator according to statement 1, further comprising a Proportional-Integral-Derivative (PID) controller to generate an internal frequency control signal for the accelerator sent to the control circuit.
Statement 5. An embodiment of the disclosure includes the accelerator according to statement 4, wherein the control circuit applies the frequency control signal.
Statement 6. An embodiment of the disclosure includes the accelerator according to statement 5, wherein the control circuit applies the frequency control signal to override the internal frequency control signal from the PID controller.
Statement 7. An embodiment of the disclosure includes the accelerator according to statement 1, wherein the control circuit includes a maximum frequency for the circuit.
Statement 8. An embodiment of the disclosure includes the accelerator according to statement 7, wherein the accelerator includes a firmware, the firmware including the maximum frequency.
Statement 9. An embodiment of the disclosure includes the accelerator according to statement 7, wherein the frequency control signal is greater than the maximum frequency for the circuit.
Statement 10. An embodiment of the disclosure includes the accelerator according to statement 1, wherein the control circuit includes a minimum frequency for the circuit.
Statement 11. An embodiment of the disclosure includes the accelerator according to statement 10, wherein the accelerator includes a firmware, the firmware including the minimum frequency.
Statement 12. An embodiment of the disclosure includes the accelerator according to statement 10, wherein the frequency control signal is less than the minimum frequency for the circuit.
Statement 13. An embodiment of the disclosure includes the accelerator according to statement 1, wherein the frequency control signal is based at least in part on one of a first workload of the accelerator or a first implementation of the accelerator.
Statement 14. An embodiment of the disclosure includes the accelerator according to statement 13, wherein the frequency control signal is further based at least in part on one of a second workload of a second accelerator or a second implementation of the second accelerator.
Statement 15. An embodiment of the disclosure includes the accelerator according to statement 14, wherein:
Statement 16. An embodiment of the disclosure includes the accelerator according to statement 14, wherein:
Statement 17. An embodiment of the disclosure includes the accelerator according to statement 14, wherein the frequency control signal is calculated to coordinate the accelerator and the second accelerator.
Statement 18. An embodiment of the disclosure includes the accelerator according to statement 14, wherein the frequency control signal coordinates the accelerator and the second accelerator.
Statement 19. An embodiment of the disclosure includes the accelerator according to statement 1, wherein the control circuit applies the frequency control signal to the data processing command.
Statement 20. An embodiment of the disclosure includes the accelerator according to statement 1, wherein the control circuit applies the frequency control signal to override a second frequency control signal.
Statement 21. An embodiment of the disclosure includes the accelerator according to statement 1, wherein the frequency control signal de-asserts a second frequency control signal.
Statement 22. An embodiment of the disclosure includes a method, comprising:
Statement 23. An embodiment of the disclosure includes the method according to statement 22, wherein receiving the frequency control signal from the processor at the control circuit of the accelerator includes receiving the frequency control signal from a software executing on the processor at the control circuit of the accelerator.
Statement 24. An embodiment of the disclosure includes the method according to statement 23, wherein receiving the frequency control signal from the software executing on the processor at the control circuit of the accelerator includes receiving the frequency control signal from an Artificial Intelligence (AI) workload software executing on the processor at the control circuit of the accelerator.
Statement 25. An embodiment of the disclosure includes the method according to statement 22, wherein:
Statement 26. An embodiment of the disclosure includes the method according to statement 22, wherein the control circuit includes a maximum frequency.
Statement 27. An embodiment of the disclosure includes the method according to statement 26, wherein the frequency control signal is greater than the maximum frequency.
Statement 28. An embodiment of the disclosure includes the method according to statement 26, further comprising reading the maximum frequency from a firmware of the accelerator.
Statement 29. An embodiment of the disclosure includes the method according to statement 22, wherein the control circuit includes a minimum frequency.
Statement 30. An embodiment of the disclosure includes the method according to statement 29, wherein the frequency control signal is less than the minimum frequency.
Statement 31. An embodiment of the disclosure includes the method according to statement 29, further comprising reading the minimum frequency from a firmware of the accelerator.
Statement 32. An embodiment of the disclosure includes the method according to statement 22, wherein the frequency control signal is based at least in part on one of a first workload of the accelerator or a first implementation of the accelerator.
Statement 33. An embodiment of the disclosure includes the method according to statement 32, wherein the frequency control signal is further based at least in part on one of a second workload of a second accelerator or a second implementation of the second accelerator.
Statement 34. An embodiment of the disclosure includes the method according to statement 33, wherein:
Statement 35. An embodiment of the disclosure includes the method according to statement 33, wherein:
Statement 36. An embodiment of the disclosure includes the method according to statement 33, wherein the frequency control signal is calculated to coordinate the accelerator and the second accelerator.
Statement 37. An embodiment of the disclosure includes the method according to statement 33, wherein the frequency control signal coordinates the accelerator and the second accelerator.
Statement 38. An embodiment of the disclosure includes the method according to statement 22, wherein applying the frequency control signal to the circuit of the accelerator by the control circuit includes applying the frequency control signal to the circuit of the accelerator by the control circuit for the data processing command.
Statement 39. An embodiment of the disclosure includes the method according to statement 22, wherein applying the frequency control signal to the circuit of the accelerator by the control circuit includes overriding a second frequency control signal by the control circuit based at least in part on the frequency control signal.
Statement 40. An embodiment of the disclosure includes the method according to statement 22, wherein applying the frequency control signal to the circuit of the accelerator by the control circuit includes de-asserting a second frequency control signal based at least in part on the frequency control signal.
Statement 41. An embodiment of the disclosure includes a system, comprising a non-transitory storage medium, the non-transitory storage medium having stored thereon instructions that, when executed by a machine, result in:
Statement 42. An embodiment of the disclosure includes the system according to statement 41, wherein receiving the frequency control signal from the processor at the control circuit of the accelerator includes receiving the frequency control signal from a software executing on the processor at the control circuit of the accelerator.
Statement 43. An embodiment of the disclosure includes the system according to statement 42, wherein receiving the frequency control signal from the software executing on the processor at the control circuit of the accelerator includes receiving the frequency control signal from an Artificial Intelligence (AI) workload software executing on the processor at the control circuit of the accelerator.
Statement 44. An embodiment of the disclosure includes the system according to statement 41, wherein:
Statement 45. An embodiment of the disclosure includes the system according to statement 41, wherein the control circuit includes a maximum frequency.
Statement 46. An embodiment of the disclosure includes the system according to statement 45, wherein the frequency control signal is greater than the maximum frequency.
Statement 47. An embodiment of the disclosure includes the system according to statement 45, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in reading the maximum frequency from a firmware of the accelerator.
Statement 48. An embodiment of the disclosure includes the system according to statement 41, wherein the control circuit includes a minimum frequency.
Statement 49. An embodiment of the disclosure includes the system according to statement 48, wherein the frequency control signal is less than the minimum frequency.
Statement 50. An embodiment of the disclosure includes the system according to statement 48, the non-transitory storage medium having stored thereon further instructions that, when executed by the machine, result in reading the minimum frequency from a firmware of the accelerator.
Statement 51. An embodiment of the disclosure includes the system according to statement 41, wherein the frequency control signal is based at least in part on one of a first workload of the accelerator or a first implementation of the accelerator.
Statement 52. An embodiment of the disclosure includes the system according to statement 51, wherein the frequency control signal is further based at least in part on one of a second workload of a second accelerator or a second implementation of the second accelerator.
Statement 53. An embodiment of the disclosure includes the system according to statement 52, wherein:
Statement 54. An embodiment of the disclosure includes the system according to statement 52, wherein:
Statement 55. An embodiment of the disclosure includes the system according to statement 52, wherein the frequency control signal is calculated to coordinate the accelerator and the second accelerator.
Statement 56. An embodiment of the disclosure includes the system according to statement 52, wherein the frequency control signal coordinates the accelerator and the second accelerator.
Statement 57. An embodiment of the disclosure includes the system according to statement 41, wherein applying the frequency control signal to the circuit of the accelerator by the control circuit includes applying the frequency control signal to the circuit of the accelerator by the control circuit for the data processing command.
Statement 58. An embodiment of the disclosure includes the system according to statement 41, wherein applying the frequency control signal to the circuit of the accelerator by the control circuit includes overriding a second frequency control signal by the control circuit based at least in part on the frequency control signal.
Statement 59. An embodiment of the disclosure includes the system according to statement 41, wherein applying the frequency control signal to the circuit of the accelerator by the control circuit includes de-asserting a second frequency control signal based at least in part on the frequency control signal.
Consequently, in view of the wide variety of permutations to the embodiments described herein, this detailed description and accompanying material is intended to be illustrative only, and should not be taken as limiting the scope of the disclosure. What is claimed as the disclosure, therefore, is all such modifications as may come within the scope and spirit of the following claims and equivalents thereto.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 63/606,091, filed Dec. 4, 2023, which is incorporated by reference herein for all purposes.
Number | Date | Country
---|---|---
63606091 | Dec 2023 | US