This application relates to the high-performance computing field, and in particular, to a network adapter, a computing device, and a data acquisition method.
Currently, data is exchanged in communication between a source process of a sending node and a destination process of a receiving node mainly based on a message passing interface (MPI). A send message generated by the sending node includes a process identifier and a tag of the destination process. A receive message generated by the receiving node includes a process identifier and a tag of the source process. The tags indicate data transmitted between the source process and the destination process.
Usually, in a process in which a processor of the receiving node runs the destination process, a tag generated by the processor is compared with tags stored in a main memory of the receiving node one by one, so that the processor acquires data that is of the source process and that is transmitted by the sending node. As a result, the processor of the receiving node consumes a large amount of computing resources to perform tag matching. This reduces utilization of the computing resources of the processor.
This application provides a network adapter, a computing device, and a data acquisition method that offload the tag matching operation performed by a processor for data acquisition to another chip for processing. This releases computing resources of the processor, thereby effectively improving utilization of the computing resources of the processor.
According to a first aspect, this application provides a network adapter, where the network adapter includes a first processor and a memory. The memory stores a computer-readable program and first information, where the first information indicates a tag that is sent by a sending node and that fails in tag matching performed by the network adapter. The first processor is configured to execute the computer-readable program in the memory, to enable the network adapter to perform the following operations: After receiving a first tag acquired from a second processor, the network adapter performs tag matching based on the first tag and the tag indicated by the first information. If the tag matching succeeds, that is, the first information includes the first tag, the network adapter sends first data corresponding to the first tag to the second processor. The first tag indicates a first send message sent by the sending node, and the first send message includes the first data or information about the first data. A receiving node includes the network adapter and the second processor.
In this way, compared with a tag matching operation performed by the second processor of the receiving node, in the method provided in embodiments of this application, the tag matching operation is offloaded to the network adapter. Computing resources of the second processor in the receiving node are therefore released and can be used to process other tasks, thereby improving utilization of the computing resources of the second processor in the receiving node.
The second processor may be a central processing unit (CPU), including one or more CPU cores. In addition, the second processor may alternatively be an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits, for example, one or more microprocessors (uP), or one or more field programmable gate arrays (FPGA). The second processor may execute various functions of the receiving node by running or executing a software program stored in a main memory included in the receiving node and invoking data stored in the main memory.
In addition, the memory further stores second information, and the second information includes a tag that is acquired from the second processor and that fails in tag matching performed by the network adapter. The operations performed by the network adapter further include: If the tag matching fails, the first tag is stored in a storage space for storing the second information in the memory. In this way, the network adapter stores the first tag in the storage space for storing the second information in the memory of the network adapter, thereby avoiding data exchanges between the second processor of the receiving node and the network adapter, and reducing a data acquisition delay.
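The receive-side matching described above can be sketched as follows. This is a minimal illustration only, not the claimed implementation: the names `AdapterMemory`, `first_info`, `second_info`, and `match_receive_tag` are hypothetical, and plain Python containers stand in for the memory of the network adapter.

```python
from dataclasses import dataclass, field


@dataclass
class AdapterMemory:
    # "First information": tags from the sending node that have not matched yet,
    # mapped to the data (or data location) carried by the send message.
    first_info: dict = field(default_factory=dict)
    # "Second information": tags from the second processor that have not matched yet.
    second_info: list = field(default_factory=list)


def match_receive_tag(mem: AdapterMemory, first_tag: int):
    """Match a tag received from the second processor against the first information.

    Returns the associated data on success. On failure, stores the tag in the
    second information and returns None.
    """
    if first_tag in mem.first_info:
        return mem.first_info[first_tag]
    mem.second_info.append(first_tag)
    return None


mem = AdapterMemory(first_info={7: b"payload"})
hit = match_receive_tag(mem, 7)   # tag 7 was already sent by the sending node
miss = match_receive_tag(mem, 9)  # tag 9 has not arrived yet
```

In this sketch, a failed match never leaves the adapter: the tag is recorded locally, which mirrors the stated goal of avoiding further exchanges with the second processor.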
The second processor is connected to the network adapter through a bus, the network adapter receives, through the bus, the first tag sent by the second processor, and the network adapter sends the first data to the second processor through the bus. The bus may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be classified into an address bus, a data bus, a control bus, or the like.
The first send message is transmitted, through a network, by a source process run by the sending node to a destination process run by the receiving node. For example, the sending node and the receiving node transmit data to each other through the network using an interconnection technology. The interconnection technology may be, for example, an InfiniBand (IB) technology, remote direct memory access over converged Ethernet (RoCE), or the transmission control protocol (TCP). The sending node and the receiving node may belong to a same cluster or belong to different clusters, which is not limited. The cluster may be a high-performance computing (HPC) cluster. The receiving node and the sending node communicate with each other through an MPI.
In addition, if the tag matching succeeds, the operations performed by the network adapter further include: The first data is acquired from the first information based on the first tag, where the first information includes the first data; or the first data is acquired based on the information about the first data associated with the first tag, where the information about the first data indicates information about an address at which the first data is stored.
For example, that the first data is acquired based on the address associated with the first tag includes: The first data is acquired from a storage space that is in a memory and that is indicated by the address associated with the first tag.
For another example, that the first data is acquired based on the address associated with the first tag includes: The first data is acquired from a storage space that is in the sending node and that is indicated by the address associated with the first tag.
In this way, the network adapter acquires the first data, and then sends the first data to the second processor. Because the network adapter transmits the acquired first data to the second processor after the tag matching succeeds, the second processor does not need to notify the network adapter to acquire the first data. A quantity of interactions between the second processor of the receiving node and the network adapter is reduced, and the data acquisition delay is reduced.
Optionally, the first information further includes a first identifier, and the first identifier indicates that the first send message includes the first data. The network adapter may determine, based on the first identifier, that the first send message includes the first data. The first information may further include a second identifier, and the second identifier instructs the receiving node to acquire the first data from the sending node. The network adapter may determine, based on the second identifier, that the first send message does not include the first data, and acquire the first data from the sending node. In this way, the network adapter determines a manner of acquiring the first data, that is, whether the network adapter acquires the first data locally in the receiving node or acquires the first data from the sending node.
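As an illustration of how the two identifiers could select the acquisition manner, consider the following sketch. The constants `INLINE` and `REMOTE`, the `Entry` record, and the `read_remote` callback are hypothetical stand-ins; in particular, `read_remote` merely simulates fetching data from the sending node over the network.

```python
from dataclasses import dataclass
from typing import Callable, Optional

INLINE = 1  # stands in for the "first identifier": the send message carried the data
REMOTE = 2  # stands in for the "second identifier": the data stays in the sending node


@dataclass
class Entry:
    identifier: int
    data: Optional[bytes] = None       # set when identifier == INLINE
    remote_addr: Optional[int] = None  # set when identifier == REMOTE


def acquire(entry: Entry, read_remote: Callable[[int], bytes]) -> bytes:
    """Return the first data, locally or from the sending node, per the identifier."""
    if entry.identifier == INLINE:
        return entry.data
    if entry.identifier == REMOTE:
        return read_remote(entry.remote_addr)
    raise ValueError("unknown identifier")


# A dictionary simulates the memory of the sending node (assumption for this sketch).
sender_memory = {0x1000: b"large payload"}
local = acquire(Entry(INLINE, data=b"small"), sender_memory.__getitem__)
remote = acquire(Entry(REMOTE, remote_addr=0x1000), sender_memory.__getitem__)
```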
Optionally, after acquiring the first data, the network adapter deletes the first tag in the first information. In this way, the tag occupies less storage space of the memory of the network adapter, and storage efficiency of the memory of the network adapter is improved.
In another possible implementation, the memory further stores the second information, and the second information includes the tag that is acquired from the second processor and that fails in tag matching performed by the network adapter. The operations performed by the network adapter further include: Tag matching is performed based on a second tag and the tag included in the second information. If the tag matching fails, the second tag is stored in a storage space for storing the first information in the memory. The second tag indicates a second send message sent by the sending node, the second send message includes second data or information about the second data, and the second tag is acquired from the second send message that is sent by the sending node and received by the network adapter through the network. In this way, the network adapter stores the second tag in the storage space for storing the first information in the memory of the network adapter, thereby avoiding data exchanges between the second processor of the receiving node and the network adapter, and reducing the data acquisition delay.
In addition, if the tag matching succeeds, the operations performed by the network adapter further include: The second data included in the second send message is sent to the second processor; or the second data is acquired from the sending node through the network based on the information about the second data, and the second data is sent to the second processor, where the information about the second data indicates a location at which the sending node stores the second data.
Optionally, after acquiring the second data, the network adapter deletes the second tag in the second information. In this way, the tag occupies less storage space of the memory of the network adapter, and storage efficiency of the memory of the network adapter is improved.
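The handling of a send message that arrives from the network can be sketched in the same spirit. The function name `on_send_message` and the container types are assumptions made for illustration; the sketch also applies the optional deletion of the matched tag.

```python
def on_send_message(first_info: dict, second_info: list, tag: int, payload: bytes):
    """Match the tag of an incoming send message against the second information.

    On success, the matched tag is deleted from the second information and the
    payload is returned for delivery to the second processor. On failure, the
    tag and payload are stored in the first information and None is returned.
    """
    if tag in second_info:
        second_info.remove(tag)
        return payload
    first_info[tag] = payload
    return None


first_info = {}
second_info = [5]  # the second processor has already asked for tag 5
delivered = on_send_message(first_info, second_info, 5, b"x")
parked = on_send_message(first_info, second_info, 6, b"y")
```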
Optionally, the network adapter controls execution of tag writing operations on the first information and the second information. Specifically, during a tag matching operation, the network adapter forbids execution of a tag writing operation on the first information in the memory of the network adapter. This prevents an abnormal matching success or matching failure that could occur if the network adapter received another tag and wrote it to the first information while the tag matching is in progress, thereby improving reliability of the tag matching. Likewise, during a tag matching operation, a tag writing operation is forbidden on the second information in the memory of the network adapter, to prevent an abnormal matching success or matching failure caused by a concurrent tag write to the second information, thereby improving reliability of the tag matching.
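One way to forbid tag writes while a match is in progress is mutual exclusion. The following sketch uses a lock for this purpose; the lock and the `TagTable` class are assumptions of this example, since the application states only that writing is forbidden during matching and does not name a mechanism.

```python
import threading


class TagTable:
    """A tag store in which writes cannot interleave with a match in progress."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tags = set()

    def write(self, tag: int) -> None:
        # Blocks while a match holds the lock, so the table cannot change mid-match.
        with self._lock:
            self._tags.add(tag)

    def match_and_remove(self, tag: int) -> bool:
        # The lookup and the deletion happen as one step under the lock.
        with self._lock:
            if tag in self._tags:
                self._tags.remove(tag)
                return True
            return False


table = TagTable()
table.write(3)
first_try = table.match_and_remove(3)
second_try = table.match_and_remove(3)
```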
According to a second aspect, this application provides a computing device. The computing device includes the second processor and the network adapter according to the first aspect or any possible implementation of the first aspect. The second processor and the network adapter are connected through a bus, and the second processor and the network adapter transmit a tag and data through the bus.
According to a third aspect, this application provides a data acquisition method. The method is performed by a network adapter. The network adapter includes a first processor and a memory. The method includes: Tag matching is performed based on the first tag and the tag indicated by the first information. If the tag matching succeeds, the first data corresponding to the first tag is sent to the second processor. The first tag indicates a first send message sent by a sending node, the first send message includes first data or information about the first data, the first tag is acquired from a second processor in a receiving node, the receiving node further includes the network adapter, the first information includes a tag that is sent by the sending node and that fails in tag matching performed by the network adapter, and the receiving node and the sending node communicate with each other through an MPI.
In a possible implementation, the memory further stores the second information, and the second information includes the tag that is acquired from the second processor and that fails in tag matching performed by the network adapter. The method further includes: If the tag matching fails, the first tag is stored in a storage space for storing the second information in the memory.
In another possible implementation, the memory further stores the second information, and the second information includes the tag that is acquired from the second processor and that fails in tag matching performed by the network adapter. The method further includes: Tag matching is performed based on a second tag and the tag included in the second information. If the tag matching fails, the second tag is stored in a storage space for storing the first information in the memory. The second tag indicates a second send message sent by the sending node, the second send message includes second data or information about the second data, and the second tag is acquired from the second send message that is sent by the sending node and received by the network adapter through the network.
In another possible implementation, if the tag matching succeeds, the method further includes: The second data included in the second send message is sent to the second processor; or the second data is acquired from the sending node through the network based on the information about the second data, and the second data is sent to the second processor, where the information about the second data indicates information about an address at which the sending node stores the second data.
According to a fourth aspect, this application provides a computer-readable storage medium, including computer software instructions. When the computer software instructions are run on a computing device, the computing device is enabled to perform the operation steps according to the first aspect or any possible implementation of the first aspect.
According to a fifth aspect, this application provides a computer program product. When the computer program product is run on a computing device, the computing device is enabled to perform the operation steps according to the first aspect or any possible implementation of the first aspect.
In this application, the implementations according to the foregoing aspects may be further combined to provide more implementations.
An HPC cluster refers to a computer cluster system. The HPC cluster includes a plurality of computers that are connected together using various interconnection technologies. The interconnection technologies may be, for example, IB, RoCE, or TCP. The HPC cluster provides an ultra-high floating-point computing capability to meet computing requirements for intensive and massive data processing. The plurality of computers connected together have a combined computing capability to resolve large-scale computing problems. For example, the HPC cluster is used to resolve large-scale computing problems and computing requirements related to fields such as scientific research, weather forecasting, simulation experiments, biopharmacy, military research, gene sequencing, and image processing. When the HPC cluster is used to resolve large-scale computing problems, computing time of data processing can be effectively shortened, and computing precision is improved. Usually, a management node in the HPC cluster may decompose a computing task, and allocate decomposed computing tasks to a plurality of computing nodes, and the plurality of computing nodes complete the computing tasks in parallel.
Currently, an MPI is a commonly used parallel communication protocol for communication between the computing nodes in the HPC cluster. It may be understood that processes of the computing nodes exchange data through the MPI. In embodiments of this application, a computing node for sending data may be referred to as a sending node. A computing node for receiving data may be referred to as a receiving node.
It should be noted that the sending node and the receiving node may each run a plurality of processes. For example, the sending node 111 runs a process 1111 and a process 1112. For another example, the receiving node 122 runs a process 1221 and a process 1222. The plurality of processes run by the sending node and the receiving node may be different application processes or a same application process, which is not limited. The process of the sending node and the process of the receiving node may communicate with each other based on a point-to-point communication manner or a multi-point communication manner. The point-to-point communication manner refers to a manner in which one source process communicates with one destination process. The source process is configured to send data. The destination process is configured to receive data. The multi-point communication manner includes a one-to-many communication manner, a many-to-one communication manner, and a many-to-many communication manner. The one-to-many communication manner refers to a manner in which one source process communicates with a plurality of destination processes. The many-to-one communication manner refers to a manner in which a plurality of source processes communicate with one destination process. The many-to-many communication manner refers to a manner in which a plurality of source processes communicate with a plurality of destination processes. For example, the process of the sending node and the process of the receiving node may exchange data through the MPI.
When the sending node, while running the source process, runs an MPI send function (the MPI_SEND function), the sending node may generate a send message based on the MPI_SEND function, and send the send message to the receiving node. The send message may include data, or the send message may include no data but instead include an address of a storage space in which the data is stored in the sending node. The send message may also be referred to as an MPI_SEND message. The MPI_SEND function is pre-written in a program of the source process.
A format of the MPI_SEND function may be MPI_SEND(buf,count,datatype,dest,tag,comm), where buf indicates an initial address of a storage location, in the sending node, of the data transmitted by the sending node using the send message; datatype indicates a data type of the data transmitted through the send message, for example, an integer type, a real type, or a character type; count indicates a quantity of data elements of the data type datatype that are transmitted through the send message; dest indicates a process identifier of the destination process in the receiving node, and the destination process is a process that receives the data transmitted through the send message; tag indicates the send message, and tag is used to distinguish different send messages exchanged between a same source process and a same destination process; and comm indicates a process group identifier of a process group to which the destination process belongs, the process group may also be referred to as a communicator, and the process group identifier and the process identifier together may uniquely identify a process.
The MPI_SEND function instructs the sending node to send count data elements of the data type datatype from the buffer whose initial address is buf to a destination process whose process identifier is dest.
The MPI_SEND message includes an envelope part and a data part. The envelope part includes dest, tag, and comm. The data part includes buf, count, and datatype.
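The split of an MPI_SEND message into its two parts can be illustrated as follows. The types `Envelope` and `DataPart` and the helper `split_send_message` are hypothetical names introduced for this sketch, and the string `"byte"` is a placeholder for a data type, not an MPI constant.

```python
from typing import NamedTuple, Tuple


class Envelope(NamedTuple):
    dest: int  # process identifier of the destination process
    tag: int   # distinguishes send messages between the same pair of processes
    comm: int  # process group identifier (communicator)


class DataPart(NamedTuple):
    buf: bytes     # stands in for the initial address of the data
    count: int     # quantity of data elements
    datatype: str  # data type of each element


def split_send_message(buf, count, datatype, dest, tag, comm) -> Tuple[Envelope, DataPart]:
    """Group the MPI_SEND arguments into the envelope part and the data part."""
    return Envelope(dest, tag, comm), DataPart(buf, count, datatype)


envelope, data_part = split_send_message(b"\x01\x02", 2, "byte", dest=3, tag=42, comm=0)
```

Only the envelope part takes part in tag matching; the data part describes what is delivered once a match is found.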
When the receiving node, while running the destination process, runs an MPI receive function (the MPI_RECV function), the receiving node acquires the data that is of the source process and that is transmitted by the sending node. The receiving node may generate a receive message based on the MPI_RECV function. The receive message includes a tag that indicates the data of the source process in the sending node, so that the receiving node acquires, based on the tag, the data that is of the source process and that is transmitted by the sending node. The receive message may also be referred to as an MPI_RECV message. The MPI_RECV function is pre-written in a program of the destination process.
A format of the MPI_RECV function may be MPI_RECV(buf,count,datatype,source,tag,comm,status), where buf indicates an initial address at which the receiving node stores the acquired data transmitted through the send message; datatype indicates a data type of the data acquired through the receive message, for example, an integer type, a real type, or a character type; count indicates a maximum quantity of data elements of the data type datatype that are acquired through the receive message; source indicates a process identifier of the source process in the sending node, and the source process is a process for sending data; tag is used to distinguish different send messages exchanged between a same source process and a same destination process; comm indicates a process group identifier of a process group to which the source process belongs; and status indicates whether the receive message is received correctly or incorrectly.
The MPI_RECV function instructs the receiving node to receive no more than count data elements of the data type datatype from a source process whose process identifier is source, where an identifier of the data is tag, and to store the data in a storage space whose initial address is buf.
The MPI_RECV message includes an envelope part and a data part. The envelope part includes source, tag, and comm. The data part includes buf, count, and datatype.
After generating the data, the sending node may encapsulate the send message to generate a data packet suitable for network transmission. The sending node may transmit data to the receiving node through a network 130. The network 130 may be an interconnection network. The interconnection network may include at least one network device (for example, a network device 131 and a network device 132). In this embodiment of this application, the network device may be a router, a switch, a load balancer, a dedicated firewall, or the like.
The sending node and the receiving node may be connected to the network device in a wireless or wired manner.
However, in a process in which the source process of the sending node and the destination process of the receiving node exchange data through the MPI, a moment at which the receiving node receives the data of the source process may be different from a moment at which the receiving node acquires the data of the source process. Therefore, in a case that the receiving node needs to acquire the data of the source process, the receiving node determines whether the data transmitted by the sending node has been received. For example, before the receiving node acquires the data of the source process, the receiving node has received the data. For another example, before the receiving node acquires the data of the source process, the receiving node has not received the data.
Therefore, a processor of the receiving node matches the tag in the receive message with the tag in the send message. Specifically, the processor of the receiving node compares the tag included in the MPI_RECV function with tags stored in a main memory of the receiving node one by one. If the main memory of the receiving node does not store the tag included in the MPI_RECV function, it indicates that the receiving node has not received the data transmitted by the sending node; or if the main memory of the receiving node stores the tag included in the MPI_RECV function, it indicates that the receiving node has received the data transmitted by the sending node. In this way, the receiving node determines, by tag matching, whether the data transmitted by the sending node has been received. Further, the receiving node acquires the data from the receiving node based on the tag, or the receiving node determines, based on the tag, the address of the storage space for storing data in the sending node, and acquires data from the sending node based on the address. In a process in which the processor of the receiving node acquires the data, the tag included in the MPI_RECV function is compared with the tags stored in the main memory of the receiving node one by one. As a result, the processor of the receiving node consumes a large amount of computing resources to perform tag matching, reducing utilization of the computing resources of the processor.
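The one-by-one comparison performed by the processor can be sketched as a linear scan. The function `linear_tag_match` is a hypothetical illustration that also counts the comparisons, to make the cost on the processor visible.

```python
def linear_tag_match(stored_tags, recv_tag):
    """Compare recv_tag with the stored tags one by one.

    Returns (index of the matching tag or None, number of comparisons made).
    """
    comparisons = 0
    for index, tag in enumerate(stored_tags):
        comparisons += 1
        if tag == recv_tag:
            return index, comparisons
    return None, comparisons


found = linear_tag_match([4, 8, 15], 15)    # match at the last position
missing = linear_tag_match([4, 8, 15], 16)  # no match: every tag was compared
```

In the worst case the number of comparisons grows with the number of stored tags, and every one of them is executed by the processor of the receiving node.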
For example,
As shown in (a) in
After generating the first receive message, the processor of the receiving node compares the first tag in the first receive message with the tags included in the first information one by one. Before that, if the first tag is stored in the storage space for storing the first information in the main memory, the tag matching succeeds, and the first data of the first send message is acquired based on the first tag.
In a case, if the sending node sends the first data in a first mode, the send message includes the first data, and the processor of the receiving node may acquire the first data from the main memory of the receiving node or from the network adapter of the receiving node. For example, the first mode may be an eager mode.
In another case, if the sending node sends the first data in a second mode, the send message does not include the first data, but includes an address of a storage space for storing the first data in the sending node. In this case, the processor of the receiving node notifies the network adapter, and the network adapter acquires the first data from the sending node based on the address in the send message. The second mode may be, for example, a rendezvous mode. After acquiring the first data, the network adapter may send the first data to the processor through a bus for connecting the network adapter to the processor.
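The difference between the two modes can be illustrated from the side of the sending node. The source does not state how a mode is selected; the size threshold `EAGER_LIMIT` below is purely an assumption of this sketch, as are the function name and the dictionary layout of the send message.

```python
EAGER_LIMIT = 16  # hypothetical threshold in bytes; not specified by the source


def build_send_message(tag: int, data: bytes, address: int) -> dict:
    """Build a send message in the first (eager) or second (rendezvous) mode."""
    if len(data) <= EAGER_LIMIT:
        # First mode: the send message carries the data itself.
        return {"tag": tag, "mode": "eager", "data": data}
    # Second mode: the send message carries only the address at which the
    # sending node stores the data; the data is fetched later.
    return {"tag": tag, "mode": "rendezvous", "address": address}


small = build_send_message(1, b"hi", 0x2000)
large = build_send_message(2, b"x" * 64, 0x2000)
```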
As shown in (b) in
The network adapter of the receiving node receives the second send message sent by the sending node. The network adapter of the receiving node compares the second tag in the second send message with the tags included in the second information one by one. Before that, if the processor of the receiving node has generated the second receive message, that is, the processor needs to acquire second data, and the second tag is stored in the storage space for storing the second information in the memory, the tag matching succeeds, and the network adapter acquires the second data based on the second tag. The network adapter acquires the second data from the memory, or the network adapter acquires the second data from the sending node based on an address included in the second send message, and sends the second data to the processor.
It can be learned that, in a manner in which the processor of the receiving node acquires data, the processor of the receiving node needs to consume a large amount of computing resources to perform tag matching, and the processor of the receiving node performs a plurality of data interactions with the network adapter. Particularly, as shown in (a) in
According to a data acquisition method provided in embodiments of this application, the first information stored in the main memory may be offloaded to the network adapter, and the memory in the network adapter stores the first information and the second information. After generating the receive message, the processor of the receiving node transmits the tag in the receive message to the network adapter. The network adapter compares the tag in the receive message with the tags included in the first information one by one. If the tag matching succeeds, the data transmitted through the send message is acquired based on the tag, and the data is transmitted to the processor. If the tag matching fails, the tag is stored in the storage space for storing the second information in the memory. In this way, the computing resources of the processor in the receiving node are released and can be used to process other tasks, thereby improving utilization of the computing resources of the processor in the receiving node. In addition, the processor of the receiving node does not perform a plurality of data interactions with the network adapter, thereby reducing the data acquisition delay.
The following describes in detail a data acquisition process provided in embodiments of this application with reference to the accompanying drawings.
A data acquisition function of a receiving node may be implemented based on computing resources (for example, a processor) and storage resources (for example, a cache, a memory, or another storage medium) of the node. The computing resources and the storage resources of the node may be virtual resources allocated by a cloud data center, or may be entity physical resources. The following describes possible composition of the computing resources and the storage resources of the node by using an example.
The following specifically describes each component of the computing device 300 with reference to
The processor 301 is a control center of the computing device. Usually, the processor 301 is a CPU, and includes one or more CPU cores, for example, a CPU 0 and a CPU 1 shown in
The main memory 302 is configured to store a related software program for executing the solutions of this application, and the processor 301 controls an execution of the software program.
In a physical form, the main memory 302 may be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random-access memory (RAM) or another type of dynamic storage device that can store information and instructions, or may be an electrically erasable programmable read-only memory (EEPROM), but is not limited thereto. The main memory 302 may exist independently, and is connected to the processor 301 through the communication bus 304. The main memory 302 may alternatively be integrated with the processor 301. This is not limited.
In this embodiment of this application, first information stored in the main memory 302 is offloaded to the network adapter 303, and the memory 3032 in the network adapter 303 stores the first information and second information. After generating a receive message, the processor 301 transmits a tag in the receive message to the network adapter 303. The processor 3031 in the network adapter 303 compares the tag in the receive message with tags included in the first information one by one. If the tag matching succeeds, data transmitted through a send message is acquired based on the tag, and the data is transmitted to the processor 301. If the tag matching fails, the tag is stored in a storage space for storing the second information in the memory 3032. In this way, computing resources of the processor 301 are released and can be used to process other tasks, thereby improving utilization of the computing resources of the processor 301. In addition, the processor 301 does not perform a plurality of data interactions with the network adapter 303, thereby reducing a data acquisition delay.
The communication bus 304 may be an ISA bus, a PCI bus, a PCIe bus, an EISA bus, or the like. The bus may be classified into an address bus, a data bus, a control bus, or the like. For ease of description, the bus in
The device structure shown in
With reference to
S401: The first processor generates a first receive message.
When the first processor, while running the second process, runs a first MPI_RECV function in the second process, the first receive message is generated based on the first MPI_RECV function. The first receive message includes a first initial address, a first value, a first data type, a first process identifier, a first tag, a first process group identifier, and a status value. The first initial address indicates a location at which the receiving node stores the first data. The first data type indicates a data type of data acquired through the first receive message. The data types include an integer type, a real type, and a character type. The first value indicates a quantity of data items of the first data type that are acquired through the first receive message. The first process identifier indicates the first process, and the first process is the process that the sending node runs to send data. The first tag indicates a first send message sent by the sending node, and the first send message includes the first data or information about the first data. The information about the first data may be an address at which the sending node stores the first data. It may be understood that the first tag may alternatively refer to the first data that needs to be acquired through the first receive message. The first process group identifier indicates a process group identifier of a process group to which the first process sending the first send message belongs. The first process group identifier and the first process identifier uniquely identify the first process. The status value is used to notify the sending node of whether the first receive message is correctly received.
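As an illustration only, the fields of the first receive message listed above could be modeled as follows. This is a minimal sketch: the class name, field names, and the `source_key` helper are hypothetical and are not part of the described embodiments.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class DataType(Enum):
    # The three data types named in the description.
    INTEGER = "integer"
    REAL = "real"
    CHARACTER = "character"


@dataclass
class ReceiveMessage:
    # Field names are illustrative assumptions, one per listed item.
    initial_address: int     # first initial address: where the receiving node stores the data
    value: int               # first value: the count that accompanies the data type
    data_type: DataType      # first data type
    process_id: int          # first process identifier: the source (first) process
    tag: int                 # first tag
    process_group_id: int    # first process group identifier
    status: int = 0          # status value


def source_key(msg: ReceiveMessage) -> Tuple[int, int]:
    """Return the pair that uniquely identifies the first process."""
    return (msg.process_group_id, msg.process_id)


msg = ReceiveMessage(initial_address=0x1000, value=4, data_type=DataType.INTEGER,
                     process_id=3, tag=42, process_group_id=0)
```

Here `source_key(msg)` reflects the statement that the process group identifier and the process identifier together uniquely identify the first process.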
S402: The first processor sends the first receive message to the network adapter.
The first processor may send a complete first receive message to the network adapter, or the first processor may send partial information in the first receive message to the network adapter.
For example, the first processor may send the first tag in the first receive message to the network adapter, so that the second processor of the network adapter compares the first tag with the tags included in the first information. The first tag may be a tag defined in the first MPI_RECV function, or the first tag may be a tag acquired after a hash operation is performed on a process identifier, a tag, and a process group identifier that are defined in the first MPI_RECV function. In this way, the tag acquired after the hash operation can improve security of the tag and avoid leakage of information such as the tag.
For another example, the first processor may send the first process identifier, the first tag, and the first process group identifier in the first receive message to the network adapter. The first process identifier, the first tag, and the first process group identifier in the first receive message may be the process identifier, the tag, and the process group identifier defined in the first MPI_RECV function.
S403: The second processor of the network adapter performs tag matching based on the first tag and the tags indicated by the first information.
After the network adapter receives the first tag transmitted by the first processor, the second processor of the network adapter compares the first tag with the tags included in the first information one by one, to determine whether the first information includes a tag the same as the first tag.
If the tag matching succeeds, it indicates that the first tag is stored in a storage space for storing the first information in the memory, and the network adapter has received the first send message sent by the sending node, that is, the network adapter has received the first data from the sending node. The second processor of the network adapter performs S404 to S406.
If the tag matching fails, it indicates that the first tag is not stored in the storage space for storing the first information in the memory, and the network adapter has not received the first data from the sending node. The second processor of the network adapter performs S407.
In some embodiments, the tags included in the first information may be tags that are defined in a first MPI_SEND function and sent by the sending node. The first tag received by the network adapter from the first processor may be the tag defined in the first MPI_RECV function. After the network adapter receives the first tag from the first processor, the second processor of the network adapter may compare the first tag with the tags included in the first information.
In some other embodiments, the tags included in the first information may be tags acquired after the sending node performs a hash operation based on a process identifier, a tag, and a process group identifier that are defined in the first MPI_SEND function. The first processor may acquire the first tag after a hash operation is performed on the process identifier, the tag, and the process group identifier that are defined in the first MPI_RECV function. After the network adapter receives the first tag from the first processor, the second processor of the network adapter may compare the first tag with the tags included in the first information. Alternatively, the network adapter receives the process identifier, the tag, and the process group identifier that are defined in the first MPI_RECV function from the first processor, and the second processor of the network adapter compares the first tag with the tags included in the first information, where the first tag is acquired after a hash operation is performed on the process identifier, the tag, and the process group identifier that are defined in the first MPI_RECV function.
Optionally, when the sending node and the receiving node perform a hash operation on the tag, the process identifier may be a process identifier of a source process or a process identifier of a destination process, which is not limited.
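The hash-based tag described above can be sketched as follows. The description does not name a hash function or a tag width, so SHA-256 truncated to 64 bits, the packing layout, and the function name `hashed_tag` are all assumptions made for illustration.

```python
import hashlib
import struct


def hashed_tag(process_id: int, tag: int, process_group_id: int) -> int:
    """Fold the process identifier, tag, and process group identifier into one match tag.

    Assumption: SHA-256 truncated to 64 bits; the source does not specify the hash.
    """
    packed = struct.pack("<qqq", process_id, tag, process_group_id)
    digest = hashlib.sha256(packed).digest()
    return int.from_bytes(digest[:8], "little")
```

Whichever process identifier is chosen (source or destination), the sending node and the receiving node would have to feed the same three values into the function; otherwise the two hashed tags could not compare equal.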
It should be noted that, before the second processor of the network adapter compares the first tag with the tags included in the first information, the network adapter controls execution of tag writing operations on the first information and the second information. Specifically, the second processor of the network adapter may forbid tag writing operations on the first information included in the memory of the network adapter. This prevents an abnormal matching success or matching failure that could occur if the network adapter received another tag and wrote it to the first information while a tag matching operation was in progress, thereby improving reliability of the tag matching. For the same reason, tag writing operations are also forbidden on the second information included in the memory of the network adapter while a tag matching operation is in progress.
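One way to picture the write restriction is a single lock that a tag write must take and that a match holds for its whole scan. This is a sketch under assumptions: the description does not say how the network adapter forbids writes, and the class `TagTables` and its method names are hypothetical.

```python
import threading


class TagTables:
    """Hypothetical container for the first information and the second information."""

    def __init__(self):
        self._tables = {"first": [], "second": []}
        # One gate for both tables: while a match holds it, no tag write proceeds.
        self._gate = threading.Lock()

    def write(self, table: str, tag: int) -> None:
        # A write blocks until any in-progress match has finished.
        with self._gate:
            self._tables[table].append(tag)

    def match(self, table: str, tag: int) -> bool:
        # The gate is held for the entire one-by-one comparison.
        with self._gate:
            for stored in self._tables[table]:
                if stored == tag:
                    return True
            return False
```

A software lock is only one possible realization; a hardware implementation could gate writes differently.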
Compared with a tag matching operation performed by the first processor (such as the processor 301) of the receiving node, in a method provided in this embodiment of this application, the tag matching operation is unloaded to the network adapter, and the network adapter performs the tag matching operation, so that computing resources of the first processor in the receiving node are released, and the computing resources of the first processor in the receiving node can process another task, thereby improving utilization of the computing resources of the first processor in the receiving node. In addition, the first processor of the receiving node does not perform a plurality of data exchanges with the network adapter, thereby reducing the data acquisition delay.
S404: The second processor of the network adapter acquires the first data based on the first tag.
In a possible implementation, if a data amount of the first data is small, the sending node sends the first send message in a first mode, where the first send message includes the first data. After receiving the first send message, the network adapter may store the first data included in the first send message in the memory of the network adapter, and the second processor of the network adapter may locally acquire the first data. As shown in
For example, the first information includes a correspondence between the first tag and the first send message. The second processor of the network adapter performs S4041, that is, the second processor of the network adapter acquires the first data from the first information based on the first tag.
For another example, the first information includes a correspondence between the first tag and the address associated with the first tag, and the address associated with the first tag indicates the storage space in the memory of the network adapter. The second processor of the network adapter performs S4042 and S4043, that is, the second processor of the network adapter acquires the address associated with the first tag from the first information based on the first tag, and acquires the first data based on a location that is in the memory and that is indicated by the address associated with the first tag.
Optionally, the first send message may further include a first identifier, and the first identifier indicates that the sending node sends data in the first mode, that is, the first send message includes the first data. The first information may include a correspondence between the first tag and the first identifier. The second processor of the network adapter may acquire the first identifier based on the first tag, determine, based on the first identifier, that the sending node sends the first data in the first mode, and locally acquire the first data.
In another possible implementation, if the data amount of the first data is large, the sending node sends the first send message in a second mode, where the first send message does not include the first data. The second processor of the network adapter may acquire the first data from the sending node. As shown in
For example, the first information includes the correspondence between the first tag and the first send message, and the second processor of the network adapter may acquire the first send message from the first information based on the first tag, and acquire the first data from the sending node based on the second initial address included in the first send message.
For another example, the first information includes the correspondence between the first tag and the address associated with the first tag, and the address associated with the first tag indicates the storage space in the memory of the network adapter. The second processor of the network adapter may acquire the address associated with the first tag from the first information based on the first tag, and acquire the first send message based on the location that is in the memory of the network adapter and that is indicated by the address associated with the first tag. The second processor of the network adapter acquires the first data from the sending node based on the second initial address included in the first send message.
For another example, the first information includes a correspondence between the first tag and the second initial address included in the first send message, and the second processor of the network adapter acquires, from the first information based on the first tag, the second initial address included in the first send message, and acquires the first data from the sending node based on the second initial address.
Optionally, the first send message may further include a second identifier, and the second identifier indicates that the sending node sends data in the second mode, that is, the first send message does not include the first data. The first information may include a correspondence between the first tag and the second identifier. The second processor of the network adapter may acquire the second identifier based on the first tag, determine, based on the second identifier, that the sending node sends the first data in the second mode, and acquire the first data from the sending node. It may be understood that the second identifier indicates that the receiving node is to acquire the first data from the sending node.
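The two acquisition paths of S404 can be sketched as a single lookup followed by a mode check. Everything named here is an assumption for illustration: the `SendRecord` layout, the mode constants, and the `read_remote` callable, which stands in for whatever mechanism the network adapter uses to read from the sending node. A dictionary lookup replaces the one-by-one comparison for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

FIRST_MODE = 1    # illustrative value: the send message carries the data
SECOND_MODE = 2   # illustrative value: the send message carries only an address


@dataclass
class SendRecord:
    mode: int
    data: Optional[bytes] = None          # present in the first mode
    remote_address: Optional[int] = None  # second initial address, used in the second mode


def acquire_data(first_info: Dict[int, SendRecord], tag: int,
                 read_remote: Callable[[int], bytes]) -> bytes:
    """Return the data for a matched tag, locally or from the sending node."""
    record = first_info[tag]
    if record.mode == FIRST_MODE:
        return record.data                      # already held in adapter memory
    return read_remote(record.remote_address)   # fetched from the sending node


# Usage: a dictionary stands in for the sending node's memory.
sender_memory = {0x2000: b"large payload"}
first_info = {
    1: SendRecord(FIRST_MODE, data=b"hi"),
    2: SendRecord(SECOND_MODE, remote_address=0x2000),
}
```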
S405: The second processor of the network adapter sends the first data to the first processor.
S406: The second processor of the network adapter deletes the first tag in the first information.
In this way, the tag sent by the sending node occupies less storage space of the memory of the network adapter, and storage efficiency of the memory of the network adapter is improved.
S407: The second processor of the network adapter stores the first tag in a storage space for storing the second information in the memory.
In this way, after receiving the first send message from the sending node, the network adapter matches the tag included in the first send message with the first tag in the second information, to acquire the first data.
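Taken together, S403 to S407 amount to the following control flow on the network adapter when the first processor supplies a tag. The sketch assumes the first mode only (data held locally) and uses hypothetical names; `first_info` maps tags to data and `second_info` is a list of unmatched tags.

```python
def on_receive_tag(first_info: dict, second_info: list, tag: int):
    """Handle a tag passed from the host processor to the network adapter.

    Returns the data on a match, or None when the tag is stored for later.
    """
    for stored_tag in list(first_info):       # S403: compare tags one by one
        if stored_tag == tag:
            data = first_info[stored_tag]     # S404: acquire the data (first mode assumed)
            del first_info[stored_tag]        # S406: delete the matched tag
            return data                       # S405: data goes to the host processor
    second_info.append(tag)                   # S407: store the unmatched tag
    return None
```

On a failed match the tag waits in the second information, so a send message that arrives later can be matched against it.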
The same pair of source process and destination process uses the same tag to indicate a send message including data when exchanging data. Therefore, the first tag may be generated by the receiving node or may be sent by the sending node. The foregoing S401 to S407 describe a process in which tag matching is performed after the first processor of the receiving node generates the first tag. Further, after the network adapter of the receiving node receives the send message sent by the sending node, the second processor performs tag matching on the first tag included in the send message, so that the network adapter processes the data included in the send message. As shown in
S408: The sending node generates the first send message.
When the sending node, while running the first process, runs the first MPI_SEND function, the first send message is generated based on the first MPI_SEND function. The first send message includes the second initial address, a second value, a second data type, a second process identifier, the first tag, and a second process group identifier. The second initial address indicates a location at which the sending node stores the first data. The second data type indicates a data type of the data sent through the first send message. The second value indicates a quantity of data items of the second data type that are sent through the first send message. The second process identifier indicates the second process, and the second process is the process that the receiving node runs to receive data. The first tag indicates the first send message sent by the sending node. The second process group identifier indicates a process group identifier of a process group to which the second process generating the first receive message belongs. The second process group identifier and the second process identifier uniquely identify the second process. The first process group identifier and the second process group identifier may be the same or may be different. If the first process group identifier is the same as the second process group identifier, it indicates that the first process and the second process belong to a same process group. If the first process group identifier is different from the second process group identifier, it indicates that the first process and the second process belong to different process groups.
S409: The sending node sends the first send message to the network adapter.
After generating the first send message, the sending node may encapsulate the first send message to generate a data packet suitable for network transmission. The sending node may transmit the data packet to the receiving node through a network.
For example,
When a value of the operator field is 0, it indicates that the receiving node does not need to perform tag matching. When a value of the operator field is 1, it indicates that the sending node sends data in the second mode. When a value of the operator field is 2, it indicates that the sending node ends sending data in the second mode. When a value of the operator field is 3, it indicates that the sending node sends data in the first mode. The tag field may include the tag defined in the first MPI_SEND function, or may include the tag acquired after a hash operation is performed on the process identifier, the tag, and the process group identifier defined in the first MPI_SEND function. The reserved field may be an optional field.
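The four operator field values described above map naturally onto an enumeration. The member names and the `carries_data` helper are hypothetical; only the numeric values and their meanings come from the description.

```python
from enum import IntEnum


class Operator(IntEnum):
    NO_TAG_MATCHING = 0   # the receiving node does not need to perform tag matching
    SECOND_MODE = 1       # the sending node sends data in the second mode
    SECOND_MODE_END = 2   # the sending node ends sending data in the second mode
    FIRST_MODE = 3        # the sending node sends data in the first mode


def carries_data(op: Operator) -> bool:
    """Only a first-mode send message includes the data itself."""
    return op is Operator.FIRST_MODE
```

Converting a raw field value with `Operator(value)` raises `ValueError` for any value outside 0 to 3, which gives a simple way to reject malformed packets in this sketch.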
It should be noted that, the sending node may compare a length value of the first data with a preset threshold. If the length value of the first data is greater than or equal to the preset threshold, the sending node sends the first data in the second mode, and the data part does not include the first data. If the length value of the first data is less than the preset threshold, the sending node sends the first data in the first mode, and the data part includes the first data. The preset threshold may be preconfigured based on a service requirement.
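The threshold comparison can be sketched as below. The threshold of 4096 bytes is an arbitrary illustrative value, and the function names are hypothetical; the description states only that the threshold is preconfigured based on a service requirement.

```python
PRESET_THRESHOLD = 4096  # bytes; illustrative value, not taken from the description


def select_mode(data_length: int, threshold: int = PRESET_THRESHOLD) -> str:
    """Return "second" when the length reaches the threshold, otherwise "first"."""
    return "second" if data_length >= threshold else "first"


def build_data_part(data: bytes, threshold: int = PRESET_THRESHOLD) -> bytes:
    """The data part includes the data only in the first mode."""
    return data if select_mode(len(data), threshold) == "first" else b""
```

Note that a length exactly equal to the threshold selects the second mode, matching the "greater than or equal to" condition.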
S410: The network adapter receives the first send message sent by the sending node.
S411: The second processor of the network adapter performs tag matching based on the first tag and the tags indicated by the second information.
After the network adapter receives the first tag from the sending node, the second processor of the network adapter compares the first tag with the tags included in the second information one by one, to determine whether the second information includes a tag the same as the first tag.
If the tag matching succeeds, it indicates that the first tag is stored in the storage space for storing the second information in the memory, and the second processor of the network adapter acquires the first data associated with the first tag. Specifically, the second processor of the network adapter performs S412 and S415, or performs S413, S414, and S415.
If the tag matching fails, it indicates that the first tag is not stored in the storage space for storing the second information in the memory, that is, the first processor has not yet provided a receive message carrying the first tag to the network adapter. The second processor of the network adapter therefore does not yet send the first data associated with the first tag to the first processor. Further, the second processor of the network adapter performs S416.
Compared with a tag matching operation performed by the first processor (such as the processor 301) of the receiving node, in a method provided in this embodiment of this application, the tag matching operation is unloaded to the network adapter, and the network adapter performs the tag matching operation, so that computing resources of the first processor in the receiving node are released, and the computing resources of the first processor in the receiving node can process another task, thereby improving utilization of the computing resources of the first processor in the receiving node. In addition, the first processor of the receiving node does not perform a plurality of data exchanges with the network adapter, thereby reducing the data acquisition delay.
S412: The second processor of the network adapter sends the first data included in the first send message to the first processor.
S413: The second processor of the network adapter acquires the first data from the sending node based on the address included in the first send message.
Specifically, the second processor of the network adapter acquires the first data from the sending node based on the second initial address included in the first send message.
S414: The second processor of the network adapter sends the first data to the first processor.
The address included in the first send message indicates the location at which the sending node stores the first data.
S415: The second processor of the network adapter deletes the first tag in the second information.
In this way, the tag sent by the sending node occupies less storage space of the memory of the network adapter, and storage efficiency of the memory of the network adapter is improved.
S416: The second processor of the network adapter stores the first tag in the storage space for storing the first information in the memory.
For example, the sending node sends the first send message in the first mode, and the second processor of the network adapter stores the first tag and the first data that are included in the first send message in the storage space for storing the first information in the memory; or the second processor of the network adapter stores the first data included in the first send message in the memory of the network adapter, and stores the address at which the network adapter stores the first data in the storage space for storing the first information in the memory. For another example, the sending node sends the first send message in the second mode, and the second processor of the network adapter stores the first tag and the second initial address that are included in the first send message in the storage space for storing the first information in the memory.
After the second processor of the network adapter stores the first tag in the storage space for storing the first information in the memory, and the first processor generates the first receive message, the second processor of the network adapter compares the first tag included in the first receive message with the tags included in the first information, to acquire the first data. For details, refer to the descriptions of S401 to S407.
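The send-side path S410 to S416 mirrors the receive-side path. The following sketch uses hypothetical names throughout; `data` is set for a first-mode message, `remote_address` for a second-mode message, and `read_remote` stands in for the mechanism that reads from the sending node.

```python
def on_send_message(first_info: dict, second_info: list, tag: int,
                    data=None, remote_address=None, read_remote=None):
    """Handle a send message that arrives at the network adapter.

    Returns the data to forward to the host processor, or None when the
    message is stored because no matching receive tag is present yet.
    """
    for index, posted_tag in enumerate(second_info):    # S411: compare one by one
        if posted_tag == tag:
            if data is None:                            # second mode
                data = read_remote(remote_address)      # S413: fetch from the sending node
            del second_info[index]                      # S415: delete the matched tag
            return data                                 # S412 or S414: forward the data
    # S416: keep the tag with either the data or the sender-side address.
    if data is not None:
        first_info[tag] = ("data", data)
    else:
        first_info[tag] = ("address", remote_address)
    return None
```

Storing a labeled pair in `first_info` is one illustrative way to record whether the entry holds the data itself or only the second initial address.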
In some other embodiments, the memory of the network adapter stores the first information and the second information. A second send message received by the network adapter from the sending node includes a second tag. The second processor of the network adapter compares the second tag in the second send message with the tags included in the second information one by one. If the tag matching succeeds, second data of the second send message is acquired based on the second tag, and the second data is transmitted to the first processor. If the tag matching fails, the second tag is stored in the storage space for storing the first information in the memory.
After generating a second receive message, the first processor transmits the second tag in the second receive message to the network adapter. The second processor of the network adapter compares the second tag in the second receive message with the tags included in the first information one by one. If the tag matching succeeds, the second data of the second send message is acquired based on the second tag, and the second data is transmitted to the first processor. If the tag matching fails, the second tag is stored in the storage space for storing the second information in the memory. For a specific tag matching process, refer to the descriptions in the foregoing embodiment. Details are not described again.
The following describes a data acquisition process by using an example.
As shown in (a) in
After generating the first receive message, the processor 301 transmits the first tag in the first receive message to the network adapter 303, and the network adapter 303 compares the first tag in the first receive message with the tags included in the first information one by one. If, before the comparison, the first tag has already been stored in the storage space for storing the first information in the memory, the tag matching succeeds, the first data of the first send message is acquired based on the first tag, and the first data is transmitted to the processor 301.
As shown in (b) in
The network adapter 303 receives the second send message from the sending node. The network adapter 303 compares the second tag in the second send message with the tags included in the second information one by one. If, before the comparison, the processor 301 has generated the second receive message, that is, the processor 301 needs to acquire second data, and the second tag has been stored in the storage space for storing the second information in the memory, the tag matching succeeds. The network adapter 303 then acquires the second send message based on the second tag, acquires the second data from the sending node based on an address included in the second send message, and transmits the second data to the processor 301.
In this way, computing resources of the main processor (such as the processor 301) in the receiving node are released, and the computing resources of the main processor in the receiving node can process another task, thereby improving utilization of the computing resources of the main processor in the receiving node. In addition, especially when the network adapter of the receiving node first receives data, and then the main processor of the receiving node acquires the data, a quantity of data exchanges between the main processor of the receiving node and the network adapter is effectively reduced, and a data acquisition delay is reduced.
As shown in
The network adapter 900 includes a communication interface 910, a processor 920, and a memory 930. The communication interface 910 is configured to receive a send message sent by a sending node. The send message may include a tag and data. The processor 920 is configured to compare a tag in a receive message generated by the main processor with tags included in first information. If the tag matching succeeds, data corresponding to the tag is sent to the main processor through the bus. If the tag matching fails, the tag is stored in a storage space for storing second information in the memory 930. The processor 920 is further configured to compare the tag in the send message with tags included in the second information. If the tag matching fails, the tag is stored in a storage space for storing the first information in the memory 930. If the tag matching succeeds, data corresponding to the tag is sent to the main processor through the bus. The memory 930 is configured to store the first information and the second information. Specifically, the network adapter 900 is configured to implement functions of the network adapter in the receiving node in the method embodiment shown in
It should be understood that the network adapter 900 in this embodiment of this application may be implemented by an ASIC or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), an FPGA, a generic array logic (GAL), or any combination thereof. Alternatively, when the method shown in
For more detailed descriptions of the foregoing communication interface 910, processor 920, and memory 930, directly refer to related descriptions in the method embodiment shown in
The method steps in this embodiment may be implemented in a hardware manner, or may be implemented by executing software instructions by a processor. The software instructions include corresponding software modules. The software modules may be stored in a RAM, a flash memory, a ROM, a PROM, an EPROM, an EEPROM, a register, a hard disk, a removable hard disk, a CD-ROM, or a storage medium of any other form known in the art. For example, a storage medium is coupled to a processor, so that the processor can read information from the storage medium and write information into the storage medium. Certainly, the storage medium may alternatively be a component of the processor. The processor and the storage medium may be located in an ASIC. In addition, the ASIC may be located in a computing device. Certainly, the processor and the storage medium may alternatively exist in the computing device as discrete components.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used to implement the embodiments, all or some of the embodiments may be implemented in a form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, all or some of the procedures or functions described in embodiments of this application are executed. The computer may be a general-purpose computer, a dedicated computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium, or may be transmitted from a computer-readable storage medium to another computer-readable storage medium. For example, the computer programs or instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any usable medium accessible by the computer, or a data storage device, such as a server or a data center, integrating one or more usable media. The usable medium may be a magnetic medium, for example, a floppy disk, a hard disk, or a magnetic tape, may be an optical medium, for example, a digital video disc (DVD), or may be a semiconductor medium, for example, a solid-state drive (SSD).
The foregoing descriptions are merely specific implementations of this application, but are not intended to limit the protection scope of this application. Any modification or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202110206628.7 | Feb 2021 | CN | national |
This application is a continuation of International Application No. PCT/CN2021/142612, filed on Dec. 29, 2021, which claims priority to Chinese Patent Application No. 202110206628.7, filed on Feb. 24, 2021. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.
| Relation | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2021/142612 | Dec 2021 | US |
| Child | 18453659 | | US |