This application claims priority to Chinese Patent Application No. CN2022103493005, filed on Apr. 1, 2022, the contents of which are incorporated herein by reference.
This application relates to the field of artificial intelligence computing technology, in particular to an AI computing platform, an AI computing method, and an AI cloud computing system.
In the prior art, when a single server including a plurality of AI computing modules handles complex tasks, a conventional approach is as shown in
Embodiments of the present application provide an AI computing platform, an AI computing method, and an AI cloud computing system, which fully exploit the advantages of segmented computing, reduce the unified scheduling load on the processor, offer good versatility and scalability, and can be applied to single-server, multi-server, and cloud scenarios.
This application discloses an AI computing platform. The platform includes at least one computing component, and each computing component includes:
In one embodiment, the network topology information table comprises: an IP address of the host where the processor is located, a near-memory computing module index, the operation type(s) supported by the near-memory computing module, a load rate, the number of adjacent near-memory computing modules, and the operation types supported by the adjacent near-memory computing modules.
In one embodiment, the processor is further configured to generate a package according to the decomposed plurality of ordered subtasks, and to transmit the package to the near-memory computing module that processes the first subtask, wherein the package comprises: the number of near-memory computing modules required to process the package, a list of the near-memory computing modules required to process the package, a header cyclic redundancy check, a payload length, and a payload.
In one embodiment, the near-memory computing module is further configured to process a corresponding subtask upon receiving the package to generate a processed package, and to transmit the processed package to the next near-memory computing module according to the list of near-memory computing modules, or to return the processed package to the processor.
In one embodiment, the list of near-memory computing modules comprises, for each near-memory computing module that sequentially processes a subtask, an IP address of the host where it is located, an index, the operation type(s) it supports, and its load rate.
In one embodiment, the processor is further configured to implement operation type(s) different from those of the plurality of near-memory computing modules, wherein the processor and the plurality of near-memory computing modules complete one or more of the plurality of subtasks according to the operation types they each implement.
In one embodiment, the processors of the computing components may be connected to one another via a bus.
This application also discloses an AI computing method. The method includes:
In one embodiment, each of the plurality of near-memory computing modules is connected to the processor, and the plurality of near-memory computing modules are connected to each other in pairs, wherein the plurality of near-memory computing modules are each configured to implement different operation types.
In one embodiment, the network topology information table comprises: an IP address of the host where the processor is located, a near-memory computing module index, the operation type(s) supported by the near-memory computing module, a load rate, the number of adjacent near-memory computing modules, and the operation types supported by the adjacent near-memory computing modules.
In one embodiment, the package comprises: the number of near-memory computing modules required to process the package, a list of the near-memory computing modules required to process the package, a header cyclic redundancy check, a payload length, and a payload.
In one embodiment, the list of near-memory computing modules comprises, for each near-memory computing module that sequentially processes a subtask, an IP address of the host where it is located, an index, the operation type(s) it supports, and its load rate.
In one embodiment, the processor is further configured to implement operation type(s) different from those of the plurality of near-memory computing modules, wherein the processor and the plurality of near-memory computing modules complete one or more of the plurality of subtasks according to the operation types they each implement.
The present application also discloses an AI cloud computing system. The system includes a cloud computing center and a plurality of the AI computing platforms described above, where the cloud computing center is connected to the plurality of AI computing platforms;
In one embodiment, the network topology information table comprises: an IP address of the host where the processor is located, a near-memory computing module index, the operation type(s) supported by the near-memory computing module, a load rate, the number of adjacent near-memory computing modules, and the operation types supported by the adjacent near-memory computing modules.
In one embodiment, the processor is further configured to generate a package according to the decomposed plurality of ordered subtasks, and to transmit the package to the near-memory computing module that processes the first subtask, wherein the package comprises: the number of near-memory computing modules required to process the package, a list of the near-memory computing modules required to process the package, a header cyclic redundancy check, a payload length, and a payload.
In one embodiment, the near-memory computing module is further configured to process a corresponding subtask upon receiving the package to generate a processed package, and to transmit the processed package to the next near-memory computing module according to the list of near-memory computing modules, or to return the processed package to the processor.
In one embodiment, the list of near-memory computing modules comprises, for each near-memory computing module that sequentially processes a subtask, an IP address of the host where it is located, an index, the operation type(s) it supports, and its load rate.
In one embodiment, the processor is further configured to implement operation type(s) different from those of the plurality of near-memory computing modules, wherein the processor and the plurality of near-memory computing modules complete one or more of the plurality of subtasks according to the operation types they each implement.
In one embodiment, the processors of the computing components may be connected to one another via a bus.
Embodiments of the present application will be described in even greater detail below based on the exemplary figures. The present application is not limited to the exemplary embodiments. All features described and/or illustrated herein can be used alone or combined in different combinations in embodiments of the present application. The features and advantages of various embodiments of the present application will become apparent by reading the following detailed description with reference to the attached drawings which illustrate the following:
In the implementation of an embodiment of the present application, the processor decomposes a calculation task into a plurality of ordered subtasks based on a network topology information table, and generates a package based on the near-memory segmented computing protocol (SC4NCM). Upon receiving the package, each near-memory computing module processes the corresponding subtask according to the operation type that it can perform. This application reduces the unified scheduling load on the processor while fully exploiting the advantages of segmented computing, and has good generality and scalability, so it can be applied to single-server, multi-server, and cloud scenarios alike, providing an efficient cloud-native computing platform architecture for AI servers to deploy cloud-native applications. In addition, because SC4NCM packages can flexibly define various computing functions, they can also be applied to computing modules such as GPGPUs and FPGAs, and can be extended to more complex computing applications such as machine training.
A large number of technical features are described in the specification of the present application, distributed among various technical solutions. Listing every possible combination (i.e., technical solution) of the technical features of the present application would make the description excessively long. To avoid this problem, the technical features disclosed in the above summary of the present application, the technical features disclosed in the various embodiments and examples below, and the technical features disclosed in the drawings may be freely combined with each other to constitute various new technical solutions (all of which are considered to have been described in this specification), unless such a combination is technically infeasible. For example, suppose feature A+B+C is disclosed in one example and feature A+B+D+E is disclosed in another, while features C and D are equivalent technical means that perform the same function, so that technically only one of them would be adopted at a time, and feature E can technically be combined with feature C. Then the A+B+C+D scheme should not be regarded as having been described, because of its technical infeasibility, while the A+B+C+E scheme should be considered as having been described.
In the following description, numerous technical details are set forth in order to provide the reader with a better understanding of the present application. However, those skilled in the art will understand that the technical solutions claimed in the present application can be implemented without these technical details, and with various changes and modifications based on the following embodiments.
In order to make the objects, technical solutions, and advantages of the present application clearer, embodiments of the present application will be further described in detail below with reference to the accompanying drawings.
An embodiment of the present application provides an AI computing platform. The AI computing platform includes at least one computing component. Each computing component includes a processor and a plurality of near-memory computing modules. The plurality of near-memory computing modules are connected to the processor in pairs (that is, each of the plurality of near-memory computing modules is connected to the processor), and the plurality of near-memory computing modules are connected to each other in pairs. The processor is configured to initiate a calculation task and decompose the calculation task into a plurality of ordered subtasks according to a network topology information table stored therein. The plurality of near-memory computing modules are each configured to implement different operation types, and the plurality of near-memory computing modules complete one or more of the plurality of subtasks according to the operation types they each implement.
In one embodiment, the computing module of the AI computing platform includes a near-memory computing module (NCM), which is one implementation of an AI chip. In other embodiments, the computing module of the AI computing platform may also include computing modules such as GPGPUs and FPGAs, which are suitable for more complex computing applications such as machine training.
As shown in
The network topology information table is stored in the processor ⓪ in
In one embodiment, the processor decomposes the calculation task into a plurality of ordered subtasks according to the network topology information (i.e., while decomposing the calculation task into a plurality of subtasks, it also generates a processing order of each subtask), generates a package, and transmits the package to a near-memory computing module that processes the first subtask according to the processing order of the subtasks. When the near-memory computing module receives the package, it processes the corresponding subtask to generate the processed package, and transmits the processed package to the near-memory computing module that processes the next subtask, and so on. When all the subtasks are completed, the near-memory computing module that processes the last subtask returns the processed package to the processor. As shown in
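As an illustration only (not part of the claimed subject matter), the following Python sketch shows one plausible way the processor might perform this decomposition. The field names and the rule of preferring the lightest-loaded module are assumptions; the network topology information table does record a load rate, but the patent does not specify the selection rule.

    # Illustrative sketch only; field names and the load-based selection
    # rule are assumptions, not the claimed implementation.
    topology = [
        {"index": 0, "ops": ["conv"], "load": 0.2},
        {"index": 1, "ops": ["pool"], "load": 0.1},
        {"index": 2, "ops": ["pool"], "load": 0.6},
    ]

    def decompose(task_ops):
        """Map each ordered operation of the task to a supporting module."""
        subtasks = []
        for order, op in enumerate(task_ops):
            candidates = [m for m in topology if op in m["ops"]]
            chosen = min(candidates, key=lambda m: m["load"])  # prefer lightest load
            subtasks.append({"order": order, "op": op, "ncm": chosen["index"]})
        return subtasks

    print(decompose(["conv", "pool"]))
    # [{'order': 0, 'op': 'conv', 'ncm': 0}, {'order': 1, 'op': 'pool', 'ncm': 1}]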
As shown in
In one embodiment, the processor does not itself participate in the processing of the subtasks; only the near-memory computing modules process the subtasks. In another embodiment, the processor may also participate in the processing of the subtasks after initiating the calculation task, i.e., the processor and the near-memory computing modules process the subtasks together. In this case, the processor may be configured to implement operation type(s) different from those of the plurality of near-memory computing modules, and the processor and several near-memory computing modules complete one or more of the plurality of subtasks according to the operation types they each implement. It should be noted that using the processor to implement an operation type is optional.
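Continuing the illustration, when the processor also implements operation types, the assignment of operations to implementers can be sketched as a simple dispatch table (all names below are hypothetical):

    # Hypothetical dispatch table: the processor registers its own operation
    # types alongside those of the near-memory computing modules.
    implementers = {
        "conv": "ncm_0",
        "pool": "ncm_1",
        "softmax": "processor",  # an operation type the processor itself implements
    }

    def assign(op: str) -> str:
        return implementers[op]

    print(assign("softmax"))  # processor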
Another embodiment of the present application discloses an AI cloud computing system, as shown in
In the embodiment of the present application, all the computing resources of the cloud computing system, composed of the computing modules to be scheduled in the application, are maintained in a network topology information table in the form of “edges” (point-to-point connections), which the processor uses to decompose a calculation task into subtasks when it initiates the task.
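One plausible in-memory form of such an edge-based table, purely for illustration (host addresses and indices are made up):

    # Each edge is a point-to-point connection between two computing
    # resources, identified here by (host IP, module index) pairs.
    edges = [
        (("192.168.0.10", 0), ("192.168.0.10", 1)),  # two NCMs on the same host
        (("192.168.0.10", 1), ("192.168.0.20", 0)),  # a cross-host connection
    ]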
Another embodiment of the present application discloses an AI computing method, and
In step 502, a processor initiates a calculation task and decomposes the calculation task into a plurality of ordered subtasks according to a network topology information table stored therein. The network topology information table includes: an IP address of the host where the processor is located, a near-memory computing module index, the operation type(s) supported by the near-memory computing module, a load rate, the number of adjacent near-memory computing modules, and the operation types supported by the adjacent near-memory computing modules.
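For illustration, one row of such a table could be modeled as follows; the field names are assumptions derived from the list above, not the claimed data layout:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class TopologyEntry:
        """One row of the network topology information table (hypothetical names)."""
        host_ip: str                   # IP address of the host where the processor is located
        ncm_index: int                 # near-memory computing module index
        supported_ops: List[str]       # operation type(s) supported by this module
        load_rate: float               # current load rate, 0.0 (idle) to 1.0 (saturated)
        neighbor_count: int            # number of adjacent near-memory computing modules
        neighbor_ops: List[List[str]]  # operation types supported by each adjacent module

    row = TopologyEntry("192.168.0.10", 0, ["conv"], 0.2, 1, [["pool"]])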
In step 504, the processor generates a package according to the decomposed plurality of ordered subtasks and routes the package to a near-memory computing module that processes the first subtask according to the processing order of the subtasks.
The plurality of near-memory computing modules are connected to the processor in pairs (that is, each of the plurality of near-memory computing modules is connected to the processor), and the plurality of near-memory computing modules are connected to each other in pairs, wherein the plurality of near-memory computing modules are each configured to implement different operation types.
In one embodiment, a package may include the number of near-memory computing modules required to process the package, a list of the near-memory computing modules required to process the package, a packet header cyclic redundancy check, a payload length, and a payload. In another embodiment, the package may further include a package number. In an embodiment, the list of near-memory computing modules includes, for each near-memory computing module that sequentially processes a subtask, an IP address of the host where it is located, an index, the operation type(s) it supports, and its load rate.
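A byte-level sketch of such a package follows, in the field order given above. The layout is illustrative only (the fixed entry size and field widths are assumptions, not the claimed SC4NCM format):

    import struct
    import zlib

    ENTRY_SIZE = 16  # assumed size in bytes of one entry in the NCM list

    def build_package(ncm_list: bytes, payload: bytes) -> bytes:
        """Assemble a package: NCM count, NCM list, header CRC,
        payload length, payload (illustrative layout)."""
        count = struct.pack("!H", len(ncm_list) // ENTRY_SIZE)
        crc = struct.pack("!I", zlib.crc32(count + ncm_list))  # CRC over the header fields
        length = struct.pack("!I", len(payload))
        return count + ncm_list + crc + length + payload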
In step 506, the near-memory computing module processes a corresponding subtask upon receiving the package to generate a processed package, and routes the processed package to the near-memory computing module that processes the next subtask; when all the subtasks are completed, the processed package is routed back to the processor connected to the near-memory computing module that processed the last subtask.
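A toy simulation of this chained processing is sketched below; the module registry and package fields are hypothetical, and the payload is transformed hop by hop in the order given by the package's NCM list:

    # Each entry in the package's NCM list names the module and the
    # operation for one ordered subtask.
    def process_chain(package: dict, modules: dict) -> dict:
        for hop in package["ncm_list"]:                  # ordered subtasks
            op = modules[hop["index"]][hop["op"]]
            package["payload"] = op(package["payload"])  # process this subtask
        return package                                   # last module returns it

    modules = {
        0: {"conv": lambda x: x + ["conv done"]},
        1: {"pool": lambda x: x + ["pool done"]},
    }
    pkg = {"ncm_list": [{"index": 0, "op": "conv"}, {"index": 1, "op": "pool"}],
           "payload": []}
    print(process_chain(pkg, modules)["payload"])        # ['conv done', 'pool done']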
In addition, the processor may be further configured to implement operation type(s) different from those of the plurality of near-memory computing modules, wherein the processor and the plurality of near-memory computing modules complete one or more of the plurality of subtasks according to the operation types they each implement. That is, the processor can also participate in the processing of the subtasks, like the near-memory computing modules.
It should be noted that the package mentioned in this application is defined based on the near-memory segmented computing protocol (SC4NCM); that is, it is an SC4NCM package.
In an example of a stand-alone multi-card environment,
Similarly, the protocol can also be used in multi-server or cloud server scenarios, with good versatility and scalability.
It should be noted that in this specification of the application, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms “comprises”, “comprising”, “includes”, or any other variations thereof are intended to encompass a non-exclusive inclusion, such that a process, method, article, or device that comprises a plurality of elements includes not only those elements but also other elements not expressly listed, or elements that are inherent to such a process, method, item, or device. Without further restrictions, an element defined by the phrase “comprise(s) a/an” does not exclude the presence of other identical elements in the process, method, item, or device that includes the element. In this specification of the application, if it is mentioned that an action is performed according to an element, it means that the action is performed at least according to the element, which includes two cases: the action is performed only on the basis of the element, and the action is performed on the basis of the element and other elements. Expressions such as multiple, repeatedly, and various include 2 or more, twice or more, and 2 or more types, respectively.
All documents mentioned in this specification are considered to be included in the disclosure of this application as a whole, so that they can be used as a basis for modification when necessary. In addition, it should be understood that the above descriptions are only preferred embodiments of this specification, and are not intended to limit the protection scope of this specification. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of one or more embodiments of this specification should be included in the protection scope of one or more embodiments of this specification.
In some cases, the actions or steps described in the claims can be performed in a different order than in the embodiments and still achieve desired results. In addition, the processes depicted in the drawings do not necessarily require the specific order or sequential order shown in order to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Number | Date | Country | Kind
---|---|---|---
202210349300.5 | Apr. 1, 2022 | CN | national