This application relates to the field of computer technologies, and in particular, to a method for processing data by using an intermediate device, a computer system, and an intermediate device.
Usually, a multi-node computer system includes a plurality of computing nodes and storage nodes. In related technologies, to allow the plurality of computing nodes to concurrently access storage spaces for applications, the storage nodes manage the storage spaces for the applications. Because the storage node manages the storage space of the application, a load of the computing node and a data processing latency increase when the computing node accesses application data stored in the storage node. For example, when the computing node needs to write data of an application to a storage space corresponding to the application, the computing node first requests the storage node to allocate, for the data, a write address in the storage space corresponding to the application. After obtaining the write address, the computing node sends, to the storage node, a write request for writing the data to the write address. That is, the computing node needs to send at least two requests to the storage node for each write. This increases the load of the computing node and the data write latency.
Embodiments of this application provide a data processing method, a computer system, and an intermediate device. A storage space of an application is managed by an intermediate device connected to a computing node and a storage node, thereby effectively reducing a load of the computing node and reducing a data write latency.
To achieve the foregoing objectives, a first aspect of this application provides a computer system. The computer system includes a computing node, a storage node, and an intermediate device. A service runs in the computing node. The storage node stores data of the service. The service is provided with a storage space, and the service performs an operation on the data of the service in the storage node by accessing the storage space. For example, the storage space is a persistence LOG (PLOG) space. In this embodiment of this application, metadata of the storage space is stored in the intermediate device. The intermediate device manages the metadata of the storage space, and implements a data operation between the computing node and the storage node based on the metadata of the storage space.
The metadata of the storage space of the service is stored on the intermediate device, so that the intermediate device manages the storage space of the service based on the metadata. In this way, a load of the computing node and a latency in writing data can be reduced. For example, when the operation is a write operation, the computing node only needs to send the write request to the intermediate device. The intermediate device allocates an address in the storage space for the write request based on the metadata, and writes data to the storage node according to the allocated address. Because the computing node only needs to send one request to the intermediate device, the load of the computing node and the latency in writing data are reduced.
In an implementation of the first aspect, the metadata of the storage space includes metadata of a logical storage space, the metadata of the logical storage space includes an identifier of the logical storage space, address information of the logical storage space, and address information of a physical storage space corresponding to the logical storage space; and the physical storage space belongs to a space in the storage node.
By recording, in the metadata, the address information of the logical storage space of the application and the address information of the physical storage space in the storage node, the intermediate device may map an operation on the logical storage space to an operation on the physical storage space in the storage node. In this way, an operation on the service data stored in the physical storage space that is in the storage node is implemented.
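The mapping described above can be sketched as follows. This is a minimal illustration rather than the implementation of this application; the names `LogicalSpaceMetadata` and `to_physical` are hypothetical, and the sketch assumes that the physical storage space is one contiguous range identified by its start address.

```python
from dataclasses import dataclass


@dataclass
class LogicalSpaceMetadata:
    """Hypothetical metadata record kept by the intermediate device."""
    space_id: str        # identifier of the logical storage space
    volume: int          # size of the logical storage space, in bytes
    physical_start: int  # assumed: start of a contiguous physical space

    def to_physical(self, logical_offset: int, length: int) -> int:
        """Map a range in the logical space to an address in the storage node."""
        if logical_offset < 0 or length < 0:
            raise ValueError("offset and length must be non-negative")
        if logical_offset + length > self.volume:
            raise ValueError("range falls outside the logical storage space")
        return self.physical_start + logical_offset


# Illustrative values only.
meta = LogicalSpaceMetadata(space_id="PLOG 1",
                            volume=2 * 1024 * 1024,
                            physical_start=0x10000)
physical = meta.to_physical(logical_offset=4096, length=512)
```

Because a logical address is an offset from 0, the translation in this sketch reduces to one addition plus a bounds check.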
In an implementation of the first aspect, there are a plurality of intermediate devices between the computing node and the storage node, the computing node designates a first intermediate device in the plurality of intermediate devices for the logical storage space, and the first intermediate device is configured to store the metadata of the logical storage space. For example, the computing node may select the first intermediate device from the plurality of intermediate devices based on the identifier of the logical storage space.
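One possible way to select an intermediate device from the identifier is a hash taken modulo the number of devices, sketched below. The function name `select_intermediate_device`, the device labels, and the choice of SHA-256 are assumptions made for illustration; this application does not prescribe a particular selection function.

```python
import hashlib
from typing import Sequence


def select_intermediate_device(space_id: str, devices: Sequence[str]) -> str:
    """Pick one device deterministically from the identifier (assumed scheme)."""
    if not devices:
        raise ValueError("no intermediate devices available")
    digest = hashlib.sha256(space_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    index = int.from_bytes(digest[:8], "big") % len(devices)
    return devices[index]


switches = ["St0", "St1", "St2"]  # hypothetical device labels
chosen = select_intermediate_device("PLOG 1", switches)
```

Any computing node that evaluates the same function over the same device list reaches the same device for a given identifier, without consulting a shared directory.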
In this way, a plurality of storage spaces of the application are distributed to different intermediate devices, so that load balancing across the intermediate devices can be achieved and data processing efficiency of the application can be improved.
In an implementation of the first aspect, the logical storage space is a persistence LOG space.
In an implementation of the first aspect, the intermediate device is a network switch device.
Because communication between the computing node and the storage node is implemented through a network switch, the metadata of the storage space of the application is stored in the network switch, so that the storage space of the application can be conveniently managed.
In an implementation of the first aspect of the present application, when writing the service data to the storage node, the computing node first sends a write request to the intermediate device. The intermediate device allocates a first address in the logical storage space to the to-be-written data; determines, in the storage node, a second address that corresponds to the first address and to which the to-be-written data is written; and requests the storage node to write the to-be-written data to the second address.
In a process of writing data, because the computing node only needs to send the write request to the intermediate device, a load of the computing node and a latency of the write request are reduced.
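The single-request write path can be sketched as follows. The classes `StorageNode` and `IntermediateDevice`, the exception `InsufficientSpace`, and the in-memory byte array standing in for a storage medium are all hypothetical; the sketch only illustrates that the computing node issues one call while address allocation and translation happen in the intermediate device.

```python
class InsufficientSpace(Exception):
    """Raised when the logical storage space cannot hold the data."""


class StorageNode:
    """Stand-in for a storage node: a byte array indexed by physical address."""

    def __init__(self, size: int) -> None:
        self.media = bytearray(size)

    def write(self, address: int, data: bytes) -> None:
        if address < 0 or address + len(data) > len(self.media):
            raise ValueError("address range outside the storage medium")
        self.media[address:address + len(data)] = data


class IntermediateDevice:
    def __init__(self, node: StorageNode, volume: int, physical_start: int) -> None:
        self.node = node
        self.volume = volume                  # size of the logical space
        self.physical_start = physical_start  # assumed contiguous mapping
        self.next_free = 0                    # start of the unallocated part

    def handle_write(self, data: bytes) -> int:
        """Serve one write request and return the allocated logical address."""
        if len(data) > self.volume - self.next_free:
            raise InsufficientSpace("logical storage space is full")
        first_address = self.next_free        # address in the logical space
        self.next_free += len(data)
        second_address = self.physical_start + first_address
        self.node.write(second_address, data)  # request to the storage node
        return first_address


node = StorageNode(size=1024)
device = IntermediateDevice(node, volume=64, physical_start=128)
addr_a = device.handle_write(b"hello")
addr_b = device.handle_write(b"world")
```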
In an implementation of the first aspect, after allocating the logical storage space to the application, the computing node requests the intermediate device to allocate the physical storage space in the storage node to the logical storage space. After receiving the request, the intermediate device requests the storage node to allocate the physical storage space to the logical storage space. In this way, a correspondence between the logical storage space of the application and the physical storage space in the storage node is established, and then the application can access data in the physical storage space through the logical storage space.
In an implementation of the first aspect, the intermediate device allocates an address in the logical storage space to each write request based on a sequence of receiving at least two write requests, or the intermediate device returns, based on the sequence of receiving the at least two write requests, a message indicating completion of each write request to the computing node.
A completion message or an allocated address of a write request is returned according to a sequence of write requests, so that to-be-written data of the write requests can be sequentially stored in storage nodes. In this way, a hole in storage media of the storage nodes can be prevented.
In an implementation of the first aspect, after receiving a read-only state setting request sent by the computing node for the logical storage space, the intermediate device sets a status of the logical storage space to a read-only state when no conflicts occur in the logical storage space.
In an implementation of the first aspect, the intermediate device is further configured to: after receiving a delete request sent by the computing node for the logical storage space, when no conflicts occur in the logical storage space, set a status of the logical storage space to a delete state, and instruct the storage node to delete the physical storage space corresponding to the logical storage space.
In the foregoing two implementations, the intermediate device determines whether there is an operation conflict in the logical storage space, to implement status mutual exclusion control. This reduces a quantity of times of communication between the computing node and the storage node, and improves processing efficiency of the computer system.
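The status mutual exclusion control can be sketched as a small state holder. This summary does not define what counts as a conflict; the sketch assumes that an in-flight write conflicts with a read-only request and that any in-flight read or write conflicts with a delete request. The names `Status` and `LogicalSpaceState` and the in-flight counters are hypothetical.

```python
from enum import Enum


class Status(Enum):
    RW = "RW"          # readable and writable
    R = "R"            # read-only
    DELETE = "Delete"  # deletion in progress


class LogicalSpaceState:
    def __init__(self) -> None:
        self.status = Status.RW
        self.inflight_writes = 0  # assumed conflict indicator
        self.inflight_reads = 0   # assumed conflict indicator

    def request_read_only(self) -> bool:
        """Switch to the read-only state unless a conflict is present."""
        if self.status is Status.DELETE or self.inflight_writes > 0:
            return False
        self.status = Status.R
        return True

    def request_delete(self) -> bool:
        """Switch to the delete state unless a conflict is present."""
        if self.inflight_writes > 0 or self.inflight_reads > 0:
            return False
        self.status = Status.DELETE
        return True


space = LogicalSpaceState()
space.inflight_writes = 1
refused = space.request_read_only()   # a write is still in flight
space.inflight_writes = 0
accepted = space.request_read_only()  # no conflict remains
```

Because the check and the state change both happen in the intermediate device, the computing node does not need a separate round trip to the storage node to find out whether the transition is safe.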
In an implementation of the first aspect, after receiving the first write request sent by the computing node, the method further includes: obtaining a plurality of fragments of the to-be-written data in the first write request; and the determining to write the to-be-written data in the first write request to a first address of the storage node includes: determining to write the plurality of fragments to first addresses of a plurality of storage nodes.
In this implementation, the intermediate device stores the plurality of fragments of the to-be-written data into the plurality of storage nodes. This ensures data reliability. In addition, because the intermediate device serves as a convergence point and gives a high performance in processing packets, an amount of data that needs to be transmitted by the computing node is reduced while reliability is ensured.
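A minimal sketch of dividing the to-be-written data into fragments and assigning one fragment to each storage node is given below. The functions `split_into_fragments` and `place_fragments` are hypothetical, and the sketch performs plain splitting only: how redundancy is encoded into the fragments is not specified in this summary and is therefore omitted.

```python
from typing import Dict, List, Sequence


def split_into_fragments(data: bytes, count: int) -> List[bytes]:
    """Cut data into `count` consecutive pieces; the last may be shorter."""
    if count <= 0:
        raise ValueError("count must be positive")
    size = -(-len(data) // count)  # ceiling division
    return [data[i * size:(i + 1) * size] for i in range(count)]


def place_fragments(data: bytes, nodes: Sequence[str]) -> Dict[str, bytes]:
    """Assign one fragment to each storage node, in order."""
    fragments = split_into_fragments(data, len(nodes))
    return dict(zip(nodes, fragments))


placement = place_fragments(b"abcdefgh", ["S0", "S1", "S2"])
```

The computing node transmits the data once; the fragments are produced and distributed at the intermediate device.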
In an implementation of the first aspect, the intermediate device allocates the second address to the first write request among unallocated spaces of the logical storage space according to an ascending order of addresses.
In an implementation of the first aspect, the storage node includes a first storage node and a second storage node. The second address corresponds to a first address that is in the first storage node and into which the to-be-written data is written, and a first address that is in the second storage node and into which the to-be-written data is written. That the requesting the storage node to write the to-be-written data into the first address includes: requesting the first storage node to write the to-be-written data to the first address that is in the first storage node and to which the to-be-written data is written; and requesting the second storage node to write the to-be-written data to the first address that is in the second storage node and to which the to-be-written data is written.
In this implementation, the intermediate device writes data copies to the plurality of storage nodes, thereby ensuring data reliability. In addition, because of the position taken by the intermediate device in the network and high performance of the intermediate device in processing packets, an amount of data that needs to be transmitted by the computing node to the intermediate device is reduced while reliability is ensured.
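The fan-out of one write request into one request per storage node can be sketched as follows. The type `NodeWrite`, the function `fan_out`, and the per-node start addresses are hypothetical; the sketch assumes that each storage node holds a contiguous space for the logical storage space, so that the address in each node is that node's start address plus the logical address.

```python
from typing import Dict, List, NamedTuple


class NodeWrite(NamedTuple):
    node: str      # target storage node
    address: int   # address inside that storage node
    data: bytes    # full copy of the to-be-written data


def fan_out(logical_address: int, data: bytes,
            node_starts: Dict[str, int]) -> List[NodeWrite]:
    """Build one write request per storage node for the same data."""
    return [NodeWrite(node, start + logical_address, data)
            for node, start in node_starts.items()]


# Illustrative start addresses for a first and a second storage node.
requests = fan_out(16, b"log entry", {"S0": 0x1000, "S1": 0x8000})
```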
In an implementation of the first aspect, the intermediate device records a queue. The queue records, according to a sequence of receiving times, to-be-processed write requests that are for the logical storage space and that are received from the computing node. When it is determined that a first write request is a request received earliest in the queue, it is determined that other write requests that are for the logical storage space and received before the first write request are all completed.
In this implementation, the sequence of receiving write requests is recorded in a queue, so that the sequence of returning information indicating completion of the write requests can be conveniently and accurately controlled.
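The queue-based control can be sketched as follows. The class `CompletionOrderer` and its method names are hypothetical; the sketch assumes that each write request carries an identifier and that the storage nodes may finish requests in any order.

```python
from collections import deque


class CompletionOrderer:
    """Releases completion notices in the order the write requests arrived."""

    def __init__(self) -> None:
        self.pending = deque()  # request identifiers, earliest received first
        self.done = set()       # requests already finished by the storage nodes

    def on_receive(self, request_id: int) -> None:
        self.pending.append(request_id)

    def on_storage_done(self, request_id: int) -> list:
        """Record a finished write; return the requests that may be acknowledged."""
        self.done.add(request_id)
        released = []
        # A request is acknowledged only when it is the earliest in the queue,
        # which means every earlier request has already been acknowledged.
        while self.pending and self.pending[0] in self.done:
            head = self.pending.popleft()
            self.done.remove(head)
            released.append(head)
        return released


orderer = CompletionOrderer()
for request_id in (1, 2, 3):
    orderer.on_receive(request_id)
first = orderer.on_storage_done(2)   # request 1 is still outstanding
second = orderer.on_storage_done(1)  # requests 1 and 2 can now be acknowledged
third = orderer.on_storage_done(3)
```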
In an implementation of the first aspect, the intermediate device is further configured to establish a QP queue corresponding to the computing node and N QP queues respectively corresponding to N storage nodes, to maintain connection information. In this way, reliable one-to-N multicast communication can be implemented.
In an implementation of the first aspect, when the intermediate device receives a read request for the logical storage space, if a status of the logical storage space is a delete state, the intermediate device blocks the read request, or returns information indicating that the status of the logical storage space is the delete state to the computing node.
A second aspect of the present application provides a method applied to the computer system provided in the first aspect of the present application. Steps included in the method are performed by the computing node, the intermediate device, and the storage node of the computer system. A function corresponding to each step in the method is the same as a function performed by the computing node, the intermediate device, and the storage node in the computer system. Beneficial effects are also the same. Details are not described herein again.
A third aspect of the present application provides a data processing method performed by an intermediate device, where the intermediate device is connected to a computing node and a storage node. A service runs in the computing node. The storage node stores data of the service. The intermediate device first receives a first write request that is for the data of the service and that is sent by the computing node, determines to write to-be-written data in the first write request to a first address of the storage node, and requests the storage node to write the to-be-written data to the first address after determining the first address.
Metadata of the storage space of the service is stored on the intermediate device. In this way, the computing node only needs to send the write request to the intermediate device. The intermediate device allocates an address in the storage space for the write request based on the metadata, and writes data to the storage node according to the allocated address. Because the computing node only needs to send one request to the intermediate device, the load of the computing node and the latency in writing data are reduced.
Functions implemented in other steps performed by the intermediate device in the method are the same as functions implemented by the intermediate device in the computer system in the first aspect. Beneficial effects are the same. Details are not described herein again.
A fourth aspect of this application provides a data processing apparatus, where the apparatus is deployed in an intermediate device, and the intermediate device is connected to a computing node and a storage node. A service runs in the computing node. Data of the service is stored on the storage node. The processing apparatus includes several modules, where the modules are configured to perform the steps in the data processing method provided in the third aspect of this application, and module division is not limited herein. For specific functions performed by the modules of the data processing apparatus and beneficial effects, refer to functions of the steps in the data processing method provided in the third aspect of this application. Details are not described herein again.
A fifth aspect of this application provides another computer system. The computer system includes a computing node, a storage node, and an intermediate device. The computing node accesses the storage node through the intermediate device. An application runs in the computing node. The storage node stores data of the application. The computing node is configured to designate a storage space for the application data, and send a first allocation request, where the first allocation request is used to request a physical storage space in a storage node corresponding to the storage space. The intermediate device is configured to receive the first allocation request sent by the computing node, send a second allocation request to the storage node based on the first allocation request, obtain the physical storage space allocated by the storage node, and establish and store metadata of the application based on information of a logical storage space and information of the physical storage space.
The metadata is stored in the intermediate device, and a storage space in a storage node is allocated to the storage space of the application. As a result, a load of the storage node and a load of the computing node can be reduced, and data transmission efficiency can be improved.
In an implementation of the fifth aspect of this application, the first allocation request carries an identifier and a volume of the logical storage space, and the metadata includes: the identifier of the logical storage space, the volume of the logical storage space, an available address of the logical storage space, and address information of the physical storage space corresponding to the logical storage space.
In an implementation, the logical storage space is a persistence LOG space, and the computing node designates an identifier and a volume of the persistence LOG space.
In an implementation of the fifth aspect of this application, the computing node is further configured to send a first write request, where the first write request carries first data of the application and an identifier of a logical storage space corresponding to the first data; the intermediate device is further configured to receive the first write request, determine, in the storage node, a first physical address for the first data based on the metadata of the application, and send a second write request to the storage node based on the first physical address; and the storage node is further configured to receive the second write request, and store the first data at the first physical address.
In an implementation of the fifth aspect of this application, after receiving the first write request, the intermediate device allocates a first logical space address to the first data based on the identifier of the logical storage space and a volume of the first data; and determines, based on a correspondence that is between the logical storage space and the physical storage space and that is recorded in the metadata, a first physical address corresponding to the first logical space address.
In an implementation of the fifth aspect of this application, the intermediate device is further configured to: receive a storage completion notification sent by the storage node, and confirm, based on the notification, that execution of the first write request is completed.
In an implementation of the fifth aspect of this application, the intermediate device updates the information about the metadata of the application based on the first write request.
In an implementation of the fifth aspect of this application, the computer system includes a plurality of intermediate devices, and the computing node is further configured to select one intermediate device from the plurality of intermediate devices, to send the first allocation request or the first write request to the selected intermediate device.
A sixth aspect of this application provides a computer system. The computer system includes a computing node, a storage node, and an intermediate device. The computing node accesses the storage node through the intermediate device. The computing node is configured to send a first write request to the intermediate device, where the first write request is for a logical storage space, and the logical storage space corresponds to a storage space in the storage node. The intermediate device is configured to receive the first write request, and when determining write requests that are for the logical storage space and received before the first write request are all completed, notify the computing node that the first write request is completed.
A write completion message is returned by the intermediate device according to the sequence of receiving the write requests. In this way, a hole in storage spaces in storage nodes can be prevented while parallel processing of the write requests of different computing nodes is implemented.
For other functions implemented by the computer system, refer to the functions provided by the computer system in the first aspect. Details are not described herein again.
A seventh aspect of this application provides an intermediate device, where the intermediate device includes a processing unit and a storage unit, the storage unit stores executable code, and when the processing unit executes the executable code, any one of the foregoing data processing methods is implemented.
An eighth aspect of this application provides an intermediate device, where the intermediate device includes: a communication interface, configured to transmit data between a storage node and a computing node; and a processing unit, configured to process data received by the communication interface, to perform any one of the foregoing data processing methods.
A ninth aspect of this application provides a storage medium, where the storage medium stores executable instructions, and a processor of an intermediate device executes the executable instructions in the storage medium to implement the methods provided in the second aspect and the third aspect of this application.
A tenth aspect of this application provides a program product, where a processor of an intermediate device runs the program product, to control the processor to perform the methods provided in the second aspect and the third aspect of this application.
Embodiments of this application are described with reference to accompanying drawings, so that embodiments of this application are clearer.
The following describes technical solutions in embodiments of this application with reference to the accompanying drawings.
In the multi-node distributed computer system shown in
In related technologies, a computing node is directly connected to a storage node, and the computing node accesses a segment of a logical storage space corresponding to a storage space of the storage node, to access the storage space of the storage node. For example, the logical storage space is a Persistence LOG (PLOG) space. The PLOG is identified by a unique identifier, the PLOG ID. Data is stored in the PLOG through appending, that is, stored data is not modified by overwriting; instead, a modification is appended and stored at a new address. Usually, the PLOG corresponds to a contiguous physical storage space in a medium such as a storage class memory (SCM) or a solid state disk (SSD) in a storage node. The physical storage space is a storage space provided by the storage node for an external device, and the external device may use the storage space to access data. For media such as an SCM, the physical storage space is a physical address at which data is actually stored. However, for storage media such as an SSD, the physical storage space is not a physical address at which data is actually stored.
In the related technology, metadata of a PLOG is stored in a storage node. The metadata includes information such as an ID of the PLOG, an address of a storage space in a storage node corresponding to the PLOG, and an address of an unallocated space of the PLOG. The storage node manages the PLOG based on the metadata of the PLOG. For example, when a computing node is to write data to the storage node by using the PLOG, the computing node first requests the storage node to allocate an address in the PLOG to the data. After allocating an address in the unallocated storage space in the PLOG to the data based on the PLOG metadata, the storage node returns the allocated address to the computing node. After obtaining the allocated address, the computing node sends a write request to the storage node, to write the data to the storage node. In this related technology, when writing data to the storage node, the computing node needs to communicate with the storage node for a plurality of times. This increases a load of the computing node, and increases a latency in writing data.
In this embodiment of this application, as shown in
In addition, when the switch manages the PLOG, the switch processes write requests according to a sequence of receiving the write requests, to sequentially allocate addresses in the PLOG to the write requests. After allocating the addresses, the switch writes to-be-written data that is in the write requests to the storage nodes. After the storage nodes complete writing data, the switch returns the write request completion information according to the sequence of receiving the write requests, to prevent a hole in storage spaces of the storage nodes corresponding to the PLOG.
The following describes in detail a data processing process performed by the computer system shown in
As shown in
In flowcharts of
An example including the computing node C0, the switch St0, and the storage nodes S0, S1, and S2 is used for description.
It is assumed that the computing node in
Specifically, the application in the computing node C0 invokes the write interface, and inputs an identifier of PLOG i, a storage address of to-be-written data, and a length of the to-be-written data. The PLOG function layer invokes a data sending interface of a network interface card based on the input parameters, obtains the to-be-written data according to the storage address and the length of the to-be-written data, assembles the to-be-written data to a data packet of the write request 1 based on the to-be-written data, and sends the data packet to the switch, where the write request 1 includes the identifier of the PLOG i and the to-be-written data. In an implementation, the computing node C0 may determine, by performing hash calculation on the “PLOG i”, a switch corresponding to the PLOG i, and send the data packet to the corresponding switch. In another implementation, the computing node C0 may send the data packet to any switch connected to the computing node C0, and the switch forwards, based on a locally stored mapping relationship between a PLOG and a switch, the data packet to the switch corresponding to the PLOG i. It is assumed that the PLOG i corresponds to a switch St0. Therefore, the write request 1 for the PLOG i is sent to the switch St0.
Step S202. The switch determines whether a remaining space of the PLOG i is sufficient.
As described above, the switch St0 corresponds to the PLOG i. Therefore, metadata of the PLOG i is recorded in the switch St0, so that the PLOG i is managed based on the metadata. The metadata of the PLOG i includes, for example, information such as the identifier of the PLOG i, address information of the PLOG i (for example, a start address of an unallocated storage space of the PLOG i and a volume of the PLOG i), and address information of a storage space in a storage node corresponding to the PLOG i. Because the PLOG i is a logical storage space and an address in the PLOG i is an offset address starting from 0, the switch can obtain a volume of the remaining space of the PLOG i by subtracting the start address of the unallocated storage space of the PLOG i from the volume of the PLOG i.
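The subtraction described above, and the comparison of the result against the length of the to-be-written data, can be sketched as follows. The function names `remaining_space` and `has_room` are hypothetical, and the numeric values are illustrative only.

```python
def remaining_space(volume: int, unallocated_start: int) -> int:
    """Remaining space of a PLOG whose addresses are offsets starting from 0."""
    if not 0 <= unallocated_start <= volume:
        raise ValueError("start of unallocated space must lie within the PLOG")
    return volume - unallocated_start


def has_room(volume: int, unallocated_start: int, write_length: int) -> bool:
    """Check whether the to-be-written data fits in the remaining space."""
    return write_length <= remaining_space(volume, unallocated_start)


VOLUME = 2 * 1024 * 1024  # illustrative: a 2-megabyte PLOG
left = remaining_space(VOLUME, unallocated_start=1536 * 1024)
```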
After receiving the write request 1, the switch compares the length of the to-be-written data with the volume of the remaining space of the PLOG i, to determine whether the remaining space is sufficient for writing the to-be-written data. If the remaining space is insufficient, the switch performs step S301 in a process A shown in
Step S301. After determining that the remaining space of the PLOG i is insufficient in step S202, the switch sends, to the computing node, information indicating that the space of the PLOG i is insufficient.
Step S302. The computing node generates the PLOG j.
For example, in the computing node C0, after receiving the information indicating that the space of the PLOG i is insufficient, the application may invoke a PLOG application interface at the PLOG function layer, to send a request for applying for a new PLOG to the PLOG function layer. In the application request, information such as a storage node corresponding to the newly applied PLOG, a volume of the PLOG, and an initial status of the PLOG is specified. The initial status information indicates, for example, a readable and writable state. In the computer system, to ensure reliability of data storage, same data may be stored in a plurality of storage nodes, that is, each storage node stores one copy of the data. In this way, when a storage node is abnormal, data may be obtained from another storage node. In this case, a plurality of storage nodes corresponding to the newly applied PLOG are specified in the foregoing application request. For example, the foregoing application request may specify that the newly applied PLOG corresponds to the storage nodes S0, S1, and S2, that is, data written to the PLOG is stored in the storage nodes S0, S1, and S2.
After receiving the request for a new PLOG from the application, the PLOG function layer in the computing node C0 generates a new PLOG ID, and returns the ID to the application. The newly generated PLOG ID may be represented as “PLOG j”. For example, the ID is generated according to a sequence of PLOG numbers. For example, if the PLOG i is a PLOG 1, it is determined that the “PLOG j” is a PLOG 2. It should be understood that, in this embodiment of this application, determining the PLOG ID is not necessarily based on the sequence of PLOG numbers, as long as the PLOG ID can uniquely identify the newly generated PLOG. For example, an ID of a deleted PLOG may be reused as an ID of the new PLOG.
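The generation of a PLOG ID, including reuse of the ID of a deleted PLOG, can be sketched as follows. The class `PlogIdAllocator` is hypothetical, and preferring the smallest released number over a fresh one is an assumption of this sketch; the only requirement stated above is that the ID uniquely identifies the new PLOG.

```python
import heapq


class PlogIdAllocator:
    """Hands out PLOG IDs in numeric order and reuses released ones."""

    def __init__(self) -> None:
        self._next_number = 1
        self._released = []  # min-heap of numbers freed by deletion

    def new_id(self) -> str:
        if self._released:
            number = heapq.heappop(self._released)
        else:
            number = self._next_number
            self._next_number += 1
        return f"PLOG {number}"

    def release(self, plog_id: str) -> None:
        """Make the ID of a deleted PLOG available again."""
        heapq.heappush(self._released, int(plog_id.split()[1]))


ids = PlogIdAllocator()
plog_i = ids.new_id()
plog_j = ids.new_id()
ids.release(plog_i)    # the PLOG with the first ID has been deleted
reused = ids.new_id()
```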
Step S303. The computing node sends metadata of the PLOG j to the switch.
After the PLOG function layer of the computing node C0 generates the PLOG j as described above, in an implementation, the computing node C0 may determine, according to a predetermined rule, a switch corresponding to the PLOG j, and send the metadata of the PLOG j to the switch corresponding to the PLOG j; or in another implementation, the computing node C0 may send the metadata information of the PLOG j to any switch connected to the computing node C0, and the switch enables a plurality of switches to determine, through negotiation, a switch corresponding to the PLOG j, and send the metadata of the PLOG j to the switch corresponding to the PLOG j. It is assumed that the switch corresponding to the PLOG j is a switch St0, that is, the metadata of the PLOG j is sent to the switch St0. The sent metadata includes information such as an identifier of the PLOG j, storage nodes (that is, S0, S1, and S2) corresponding to the PLOG j, a volume of the PLOG j, and an initial status of the PLOG j.
Step S304. The switch sends a storage space allocation request to the storage node.
After receiving the metadata of the PLOG j, the switch St0 locally stores the metadata of the PLOG j, and adds address information of an unallocated storage space of the PLOG j to the metadata, for example, a start address of the unallocated storage space. In this case, because the PLOG j is not used yet, the start address of the unallocated storage space of the PLOG j is a default offset address 0.
The switch St0 may record metadata of each PLOG in a form of a table. Table 1 shows metadata of PLOGs managed by the switch St0.
As shown in Table 1, “Status” indicates a status of the PLOG, and the status of the PLOG may include a readable and writable (RW) state, a read-only (R) state, and a delete (Delete) state. The RW state indicates that a storage space in a storage node corresponding to a PLOG is readable and writable. The R state indicates that a storage space in a storage node corresponding to a PLOG is readable but not writable. The delete state indicates that deletion is being performed on a storage space corresponding to a PLOG in a storage node corresponding to the PLOG. In Table 1, “Offset” indicates a start address of an unallocated storage space of a PLOG, and “Volume” indicates a volume of a PLOG. In addition, in the metadata of the PLOG j, “S0: Addr=/S1: Addr=/S2: Addr=” indicates that the PLOG j corresponds to the storage nodes S0, S1, and S2, and an address (Addr) corresponding to the PLOG j in each of the storage nodes S0, S1, and S2 is not determined yet, and therefore is empty.
Then, based on the storage nodes recorded in the metadata of the PLOG j, the switch St0 sends a storage space allocation request to those storage nodes, where the storage space allocation request includes the volume of the PLOG j. For example, the volume of the PLOG j is 2 megabytes (M). Therefore, the storage space allocation request is used to request to allocate a storage space of 2 M in the storage nodes. For multi-copy storage, for example, when the storage nodes S0, S1, and S2 are specified in the information of the PLOG j sent by the computing node C0 to the switch St0, the switch St0 sends the storage space allocation request to each of the storage nodes S0, S1, and S2.
Step S305. The storage node allocates a storage space according to the storage space allocation request, and returns storage space information.
As described above, for multi-copy storage, after receiving the storage space allocation request, the storage nodes S0, S1, and S2 respectively allocate local storage spaces to the PLOG j, and return storage space information to the switch St0. Only operations by the storage node S0 are described herein as an example. After receiving the storage space allocation request, the storage node S0 obtains an unused contiguous storage space of 2 M from a local storage medium (for example, an SCM), marks the storage space as allocated, and returns address information (for example, a start address or an address range) of the storage space to the switch St0.
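The allocation performed by the storage node S0 may be sketched as follows. This is a minimal illustration that assumes a first-fit search over a list of free extents. The class name `StorageNode` and the first-fit policy are assumptions, because this embodiment only requires that an unused contiguous space be obtained and marked as allocated.

```python
class StorageNode:
    """Hypothetical model of a storage node's local medium."""

    def __init__(self, capacity: int):
        # Free extents as (start, length); initially one extent covers the medium.
        self.free = [(0, capacity)]
        # Allocated extents: start address -> length.
        self.allocated = {}

    def allocate(self, size: int):
        """Return the start address of a contiguous space of `size`, or None."""
        for i, (start, length) in enumerate(self.free):
            if length >= size:
                if length == size:
                    self.free.pop(i)
                else:
                    # Shrink the free extent from the front.
                    self.free[i] = (start + size, length - size)
                self.allocated[start] = size  # mark the space as allocated
                return start
        return None  # no contiguous space is large enough
```

The returned start address corresponds to the address information that the storage node returns to the switch St0.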
Step S306. The switch records a relationship between the PLOG j and the storage space information.
Specifically, after receiving the information of the storage space, for example, the start address of the storage space, from each storage node corresponding to the PLOG j, the switch St0 records the information in the metadata of the PLOG j. Specifically, the switch St0 may record, in corresponding “Addr” fields in Table 1, start addresses of storage spaces allocated to the PLOG j in the storage nodes S0, S1, and S2.
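The metadata record that the switch St0 keeps for a PLOG may be sketched as follows. The field names mirror the “Status”, “Offset”, “Volume”, and “Addr” columns described for Table 1. The class and function names are hypothetical, and the layout is an assumption for illustration rather than the actual format used by the switch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class PlogStatus(Enum):
    RW = "RW"          # readable and writable
    R = "R"            # read-only
    DELETE = "Delete"  # deletion is being performed


@dataclass
class PlogMetadata:
    plog_id: str
    volume: int                          # volume of the PLOG, in bytes
    status: PlogStatus = PlogStatus.RW
    offset: int = 0                      # start of the unallocated space
    # Storage node -> start address; None until the node returns its address.
    addrs: Dict[str, Optional[int]] = field(default_factory=dict)


def create_plog(plog_id: str, volume: int, nodes) -> PlogMetadata:
    """Store metadata received from the computing node (Addr fields empty)."""
    return PlogMetadata(plog_id, volume, addrs={n: None for n in nodes})


def record_allocation(meta: PlogMetadata, node: str, start_addr: int) -> None:
    """Step S306: record the start address returned by one storage node."""
    if node not in meta.addrs:
        raise KeyError(f"{node} does not correspond to {meta.plog_id}")
    meta.addrs[node] = start_addr


def is_created(meta: PlogMetadata) -> bool:
    """Creation is complete once every storage node has returned an address."""
    return all(addr is not None for addr in meta.addrs.values())
```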
After the foregoing steps are completed, the process of creating the PLOG j is completed. After the creation process is completed, the switch may notify each computing node of the newly generated PLOG j, so that each computing node may write data to the PLOG j in parallel.
In addition, when an application in the computing node is used for the first time, the computing node allocates an initial PLOG to the application, and allocates a storage space to the initial PLOG. A process of allocating the storage space to the initial PLOG is the same as the process of allocating the storage space to the PLOG j. For details, refer to descriptions of steps S303 to S306. Details are not described herein again.
Step S307. The computing node sends a write request 1′ for the PLOG j to the switch.
As described above, after obtaining the newly generated PLOG j, the application in the computing node C0 may re-initiate the write request 1′ for the PLOG j, to write the to-be-written data that is not successfully written. The write request 1′ is used to distinguish from the write request 1 for the PLOG i. Similarly, the write request 1′ includes the identifier of the PLOG j and the to-be-written data.
Step S308. The switch allocates an address space 1′ to the write request 1′ in the PLOG j.
When the address space 1′ is allocated to the write request by the switch, concurrency control may be performed on write requests from the plurality of computing nodes. For details, refer to
In this embodiment of this application, the switch allocates a write address space to each write request based on a sequence of receiving write requests, so that the write address spaces allocated to the write requests are mutually exclusive. Therefore, a mutex does not need to be set for concurrent write requests of different computing nodes, and concurrency control can be performed on access by different computing nodes to a same PLOG with higher processing efficiency.
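The sequential allocation described above may be sketched as follows. The sketch assumes that the switch handles the write requests for one PLOG one at a time in the order of receipt, so that advancing a single offset is enough to keep the allocated address spaces disjoint. The class name `PlogAllocator` is hypothetical.

```python
class PlogAllocator:
    """Hands out non-overlapping address spaces in one PLOG, in arrival order."""

    def __init__(self, volume: int):
        self.volume = volume
        self.offset = 0  # start address of the unallocated space

    def allocate(self, length: int):
        """Return the start address for a write of `length`, or None if full."""
        if self.offset + length > self.volume:
            return None  # remaining space is insufficient
        start = self.offset
        self.offset += length  # the next request starts where this one ends
        return start
```

For example, two write requests of 10 and 8 bytes received in that order are given the start addresses 0 and 10, without any lock being held by the computing nodes.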
Step S309. The switch determines an address space 2′ corresponding to the address space 1′ in the storage node.
In an implementation, the switch may calculate, based on the start address offset1 of the address space 1′ and the start address of the storage space in the storage node corresponding to the PLOG j, a start address of the address space 2′ corresponding to the address space 1′ in the storage node. In addition, it can be determined that the address space 2′ is of the length of the to-be-written data of the write request 1′. For multi-copy write, the switch St0 may determine address spaces 2′ corresponding to address space 1′ in the storage nodes S0, S1, and S2. For example, it is assumed that the start address of the storage space corresponding to the PLOG j in the storage node S0 is 100. As described above, the start address of the address space 1′ is 10. In this case, the start address of the address space 2′ is 100+10=110. Similarly, the switch St0 may calculate start addresses of the address spaces 2′ corresponding to the address space 1′ in the storage nodes S1 and S2.
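The calculation in this example may be sketched as follows. The start address 100 for the storage node S0 and the offset 10 are taken from the example above. The start addresses used for S1 and S2 in the usage note, and the function name, are assumed values for illustration only.

```python
def map_to_storage_nodes(plog_bases: dict, offset1: int, length: int) -> dict:
    """Map address space 1' (offset1, length) in the PLOG to an address
    space 2' (start, length) in each storage node holding a copy."""
    return {
        node: (base + offset1, length)
        for node, base in plog_bases.items()
    }
```

With `plog_bases = {"S0": 100, "S1": 200, "S2": 300}`, an offset of 10, and a data length of 8, the address space 2′ in S0 starts at 110 and has a length of 8.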
It may be understood that, in the foregoing implementation, the address space 2′ is determined by determining the start address and the address space length of the address space 2′. This embodiment of this application is not limited thereto. Another manner in which the address space 2′ can be determined also falls within the protection scope provided in this embodiment of the present application.
Step S310. The switch sends a write request 2′ for the address space 2′ to the storage node.
After determining the start address of the address space 2′ corresponding to the address space 1′ in the corresponding storage node, the switch generates the write request 2′ used for writing to the corresponding storage node, where the write request 2′ includes information (for example, the start address) of the address space 2′ and the to-be-written data in the write request 1′. Refer to
It can be learned from steps S308 to S310 that, because the switch stores the metadata of the PLOG j, the computing node only needs to send the write request 1′ to the switch St0. The switch St0 allocates the address space 1′ to the write request 1′ based on the metadata of the PLOG j. After allocating the address space 1′ in the PLOG j, the switch St0 does not need to return the allocated address to the computing node C0. Therefore, load of the computing node C0 is effectively reduced, and a latency in writing data is reduced. As a result, processing efficiency of the computer system is improved.
After sending the write request 2′ corresponding to the write request 1′, the switch St0 may record the write request 1′ in a local write request queue corresponding to the PLOG j. As a result, after the data is written to the storage node, the write request can be returned according to a sequence of receiving write requests. A specific return manner is explained in the following descriptions.
After step S310 is performed, the switch St0 may start to process the write request 1″, that is, to allocate an address space 1″ in the PLOG j to the write request 1″, determine a start address of an address space 2″ corresponding to the address space 1″ in the storage node, and send a write request 2″ for the address space 2″ to the storage node. As shown in operations {circle around (6)}, {circle around (7)} and {circle around (8)} in
It may be understood that a form of the write request queue is not limited to that shown in
Step S311. The storage node writes the to-be-written data to the address space 2′ in the storage node.
After receiving the write request 2′, in response to the write request 2′, the storage node writes the to-be-written data to the address space 2′ in the storage space corresponding to the PLOG j. Refer to
Step S312. The storage node returns information indicating completion of the write request 2′ to the switch.
After writing the to-be-written data in the write request 1′ to the address spaces 2′ in the storage spaces of the storage nodes S0, S1, and S2, the storage nodes S0, S1, and S2 respectively return information indicating completion of the write request 2′ to the switch St0.
Step S313. The switch determines that all write requests that are for the PLOG j and received before the write request 1′ are completed.
When the switch St0 receives the information indicating completion of the write request 2′ from the storage nodes S0, S1, and S2, the to-be-written data in the write request 1′ has been written to the three storage nodes corresponding to the PLOG j. In this embodiment of the present application, the write requests are returned to the computing node according to the sequence in which the switch receives the write requests for the PLOG j. Because the switch St0 allocates, according to the sequence of receiving the write requests, the address space in the PLOG j to each write request for the PLOG j, an address in an address space allocated to a write request received earlier is smaller. Returning the write requests according to the sequence of receiving them is therefore equivalent to returning them in ascending order of their write addresses. In this way, order-preserving writing in the address spaces of the PLOG j is implemented, and a hole in the storage spaces corresponding to the PLOG j in the storage nodes is prevented.
In this embodiment of this application, it may be determined, based on the write request queue shown in
Step S314. The switch returns related information of the write request 1′ to the computing node.
After determining that all the write requests that are for the PLOG j and received before the write request 1′ are completed, the switch St0 returns the related information of the write request 1′ to the computing node C0, where the related information includes information indicating completion of the write request 1′, and a write address allocated to the write request 1′ in the PLOG j. After the related information of the write request 1′ is returned to the computing node, the switch St0 deletes the ID of the write request 1′ from the write request queue.
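The order-preserving return in steps S313 and S314 may be sketched as follows. The sketch assumes that the write request queue holds, for each request, its allocated write address and the number of storage nodes that have not yet reported completion. The class name `WriteRequestQueue` and this queue layout are assumptions, because the form of the write request queue is not limited in this embodiment.

```python
from collections import OrderedDict


class WriteRequestQueue:
    """Returns completed write requests in the order they were received."""

    def __init__(self, copies: int):
        self.copies = copies          # number of storage nodes per PLOG
        self.pending = OrderedDict()  # request ID -> [write address, acks left]

    def enqueue(self, req_id: str, address: int) -> None:
        self.pending[req_id] = [address, self.copies]

    def on_storage_ack(self, req_id: str) -> list:
        """Record one storage-node completion and return the (request ID,
        write address) pairs that can now be returned, in receive order."""
        self.pending[req_id][1] -= 1
        returned = []
        while self.pending:
            head_id, (address, acks_left) = next(iter(self.pending.items()))
            if acks_left > 0:
                break  # an earlier request is still being written
            self.pending.popitem(last=False)  # delete the ID from the queue
            returned.append((head_id, address))
        return returned
```

A request that finishes early is held in the queue until every request received before it has finished, so requests are returned in ascending order of write address.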
Specifically, in an implementation, after determining that all the write requests before the write request 1′ are completed, the switch St0 returns confirmation information to the computing node C0, where the confirmation information indicates that the write request 1′ is completed. After receiving the confirmation information, the computing node C0 sends, to the switch St0, a read request for reading the related information of the write request 1′. After receiving the read request, the switch St0 returns the completion information and the write address of the write request 1′ to the computing node C0. In another implementation, after determining that all the write requests before the write request 1′ are completed, the switch St0 may directly return the completion information and the write address of the write request 1′ to the computing node C0.
After obtaining the information indicating completion of the write request 1′, the application in the computing node C0 records the write address of the write request 1′ (namely, PLOG j, address space 1′) in a service that generates the write request 1′, to facilitate a subsequent service to read the data written to the write address. When the application in the computing node C0 needs to read the data written by using the write request 1′, the application may send the read request for the address space 1′ of the PLOG j to the switch St0. The switch St0 may read, based on the metadata of the PLOG j, the data from the address space 2′ corresponding to the address space 1′ in any of the storage nodes S0, S1, and S2, and return the data to the computing node C0.
Step S601. After determining, in step S202 in
Similarly, after obtaining the information indicating completion of the write request 1, the application in the computing node records the write address of the write request 1 (namely, PLOG i, address space 1) in a service that generates the write request 1, to facilitate a subsequent service to read the data written to the write address.
The foregoing describes the method for writing data provided in embodiments of this application mainly by using the example in which data reliability is ensured by multi-copy storage. Embodiments of this application may be further applied to a scenario in which data reliability is ensured by storing data as fragments, for example, by using a redundant array of independent disks (RAID) algorithm or erasure coding (EC) to ensure the reliability of written data. Details are shown in a flowchart of
As shown in
For example, the write request 1 includes an identifier of the PLOG i and the to-be-written data 1.
Step S702. After receiving the write request 1, the switch obtains a plurality of fragments of the to-be-written data of the write request 1.
After receiving the write request 1, the switch may divide the to-be-written data 1 in the write request 1 into the plurality of data fragments according to a preset EC algorithm or a RAID algorithm, and calculate parity fragments of the plurality of data fragments. Alternatively, before sending the write request 1, the computing node may divide the data 1 into the plurality of data fragments, and calculate the parity fragments of the plurality of data fragments, where the data fragments and the parity fragments of the data 1 are included in the sent write request 1. Therefore, the switch may directly obtain the plurality of data fragments and the parity fragments of the data 1 from the write request 1. Refer to
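The fragmenting step may be sketched as follows. This embodiment only refers to a preset EC algorithm or RAID algorithm and does not specify one. The sketch therefore substitutes the simplest case, a single XOR parity fragment in the style of RAID 5, as an assumption. The function names and the zero padding are likewise illustrative.

```python
def split_with_parity(data: bytes, k: int):
    """Divide data into k equal-length data fragments plus one XOR parity
    fragment. The last fragment is zero-padded if needed."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\x00")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = bytearray(frag_len)
    for frag in frags:
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return frags, bytes(parity)


def recover_fragment(frags, parity: bytes, lost: int) -> bytes:
    """Rebuild the data fragment at index `lost` from the others and parity."""
    rebuilt = bytearray(parity)
    for j, frag in enumerate(frags):
        if j == lost:
            continue  # the lost fragment is not available
        for i, byte in enumerate(frag):
            rebuilt[i] ^= byte
    return bytes(rebuilt)
```

Each data fragment and the parity fragment would then be written to a different storage node, so that the loss of any one fragment can be tolerated.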
Step S703. The switch determines whether a remaining space of the PLOG i is sufficient.
In this embodiment, each storage node stores one fragment of the data 1 rather than the entire data 1. Therefore, in this step, it is determined whether the remaining space of the PLOG i is sufficient to store one fragment of the data 1. If the space is insufficient, the process A shown in
A difference between the methods shown in
Similarly, after the fragments of the data 1 are written to the storage nodes, the switch St0 returns information indicating completion of the write request 1′ and the write address to the computing node C0. In this case, after obtaining the information indicating completion of the write request 1′, an application in the computing node C0 records the write address of the write request 1′ (namely, PLOG j, address space 1′) in a service that generates the write request 1′, to facilitate a subsequent service to read the data written to the write address.
Metadata of a storage space of the service is stored on an intermediate device, so that the intermediate device manages the storage space of the service based on the metadata. In this way, when writing data, the computing node only needs to send a write request to the intermediate device. The intermediate device allocates an address in the storage space for the write request based on the metadata, and writes data to a storage device according to the allocated address. Because the computing node only needs to send the request to the intermediate device, load of the computing node and a latency in writing data are reduced.
In embodiments of this application, in addition to running, by using the switch, the control logic of concurrent write, another PLOG control flow can also be run by using the switch. The following describes a control flow for avoiding a concurrency conflict between a read request and a delete request with reference to
As shown in
Refer to
Step S1002. The computing node sends the read request 1 to the switch.
As described above, for example, the PLOG i corresponds to the switch St0. Therefore, the computing node C0 sends the read request 1 to the switch St0.
Step S1003. The switch determines that the PLOG i is readable.
According to the foregoing descriptions of Table 1, the switch St0 records metadata of each corresponding PLOG. The metadata includes a status of the PLOG, and the status includes, for example, a readable and writable state, a read-only state, and a delete state. In other words, the switch St0 may query the status of the PLOG i in the locally recorded metadata of the PLOG i, to determine whether the PLOG i is readable. If the status of the PLOG i is the readable and writable state or the read-only state, the PLOG i is readable, and subsequent steps S1004 to S1010 in
Step S1004. The switch updates, based on the read request 1, information about read requests that are for the PLOG i and that are being executed.
The switch records the information about read requests that are for the PLOG i and that are being executed. When receiving a read request and when completing a read request, the switch updates the information about read requests that are for the PLOG i and that are being executed. Because the switch records this information, if the switch receives a delete request for the PLOG i sent by another computing node before the switch completes processing a read request for the PLOG i, the switch may block the delete request based on the information about read requests that are for the PLOG i and that are being executed.
Specifically, the information about read requests that are for the PLOG i and that are being executed is a quantity of read requests that are for the PLOG i and that are being executed by the switch, and the information is represented by a variable a. After receiving the read request 1, the switch adds 1 to a value of the variable a. When a read request is completed, the value of variable a is decreased by 1.
Step S1005. The switch determines an address space 2 corresponding to the address space 1 in the storage node. For this step, refer to the foregoing descriptions of step S309. Details are not described herein again.
Step S1006. The switch sends a read request 2 for the address space 2 to the storage node. The read request 2 includes information about the address space 2, for example, includes a start address of the address space 2 and a length of the address space 2.
Step S1007. After receiving the read request 2, the storage node reads the address space 2 according to the read request 2.
Step S1008. The storage node returns data obtained through reading (namely, the foregoing data 1) to the switch.
Step S1009. The switch updates the information about read requests that are for the PLOG i and that are being executed.
After receiving the data returned by the storage node in response to the read request 2, the switch St0 determines that the read operation on the address space 2 in the storage node is completed, and updates the information about read requests that are for the PLOG i and that are being executed. In other words, the switch St0 subtracts 1 from the value of the variable a.
Step S1010. The switch returns the data obtained through reading to the computing node.
The switch St0 may return the data 1 obtained through reading to the computing node C0, and the computing node C0 may return the data 1 to the user terminal.
As shown in
In an implementation, a plurality of upper-layer applications run in a computing node C0, and each application applies for a PLOG to store data generated in the application. In other words, in this case, the PLOG corresponds to only one application. Life cycles of PLOGs of the applications are different due to settings in the applications and types of data recorded in the PLOGs. For example, the PLOG i records a browsing history of a user, and it is set in the application that the browsing history is kept for one week. In other words, a life cycle of the PLOG i is one week, and the PLOG i is deleted in one week. Therefore, in the computing node C0, after the life cycle of the PLOG i ends, the computing node C0 may send a delete request for the PLOG i to the switch St0.
In addition, the computing node may also generate a delete request for the PLOG i according to a deletion operation of the user.
Step S1102. After receiving the delete request for the PLOG i, the switch determines that there are no read requests for the PLOG i being executed.
After receiving the delete request for the PLOG i, the switch St0 needs to determine whether there is a read request for the PLOG i being executed. If there is a read request for the PLOG i being executed, execution of the delete request is suspended. For example, when the execution of the delete request is suspended, whether there is a read request being executed is determined through polling, in other words, polling is performed to read a value of the variable a to determine whether the value of the variable a becomes 0, until it is determined that the value of the variable a is 0 (that is, it is determined that there are no read requests being executed). Then subsequent steps in
Step S1103. The switch records that the PLOG i is in a delete state.
After determining that there are no read requests for the PLOG i being executed, the switch St0 may record that, in the metadata of the PLOG i shown in Table 1, a status of the PLOG i is the delete (Delete) state, to block a subsequent read request for the PLOG i as described above.
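The interaction between read requests and a delete request may be sketched as follows. The attribute `a` corresponds to the variable a described above. The class and method names are hypothetical, and the sketch assumes that the switch processes one request at a time, so that no lock is needed around the counter.

```python
class ReadDeleteControl:
    """Tracks reads in flight for one PLOG and gates its deletion."""

    def __init__(self):
        self.status = "RW"
        self.a = 0  # quantity of read requests being executed

    def begin_read(self) -> bool:
        """Steps S1003 and S1004: admit the read only if the PLOG is readable."""
        if self.status == "Delete":
            return False  # block reads once deletion has started
        self.a += 1
        return True

    def end_read(self) -> None:
        """Step S1009: a read request has completed."""
        if self.a == 0:
            raise RuntimeError("no read request is being executed")
        self.a -= 1

    def try_delete(self) -> bool:
        """Steps S1102 and S1103: one polling attempt of a delete request."""
        if self.a != 0:
            return False  # suspended; the caller polls again later
        self.status = "Delete"
        return True
```

A delete request that arrives while a read is in flight keeps calling `try_delete` until the counter returns to 0, after which any later read request is rejected.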
Step S1104. The switch determines a storage space corresponding to the PLOG i.
The switch St0 may determine, based on the metadata of the PLOG i, a storage node corresponding to the PLOG i and a storage space corresponding to the PLOG i in each storage node. Specifically, a start address of the storage space corresponding to the PLOG i in each storage node and a volume of the storage space may be determined, to determine the storage space corresponding to the PLOG i in each storage node.
Step S1105. The switch sends a delete request for the storage space to the storage node.
The switch St0 sends a delete request, to storage nodes S0, S1, and S2 corresponding to the PLOG i, for the storage space corresponding to the PLOG i. For example, the delete request includes the start address and the volume of each corresponding storage space in the storage nodes S0, S1, and S2.
Step S1106. The storage node deletes the storage space.
After receiving the delete request, the storage nodes S0, S1, and S2 delete respective storage spaces. For example, in the storage node S0, the storage space is determined based on the start address and the volume of the storage space in the delete request. Data stored in the storage space is deleted, and a record indicating that the storage space is allocated is canceled, so that the storage space can be reallocated to another PLOG.
Step S1107. The storage node returns information indicating successful deletion.
After the storage spaces are deleted, the storage nodes S0, S1, and S2 respectively return the information indicating successful deletion to the switch St0.
Step S1108. The switch deletes information of the PLOG i.
After determining, based on information that is about the storage nodes corresponding to the PLOG i and that is recorded in the metadata of the PLOG i, that the storage nodes S0, S1, and S2 all return the information indicating successful deletion, the switch St0 may delete the record of the metadata of the PLOG i in Table 1. In this way, the PLOG i is deleted.
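Step S1108 may be sketched as follows. The sketch assumes that the metadata table is a dictionary keyed by the identifier of the PLOG and that the switch keeps a set of storage nodes that have not yet reported successful deletion. Both are assumptions for illustration, and the class name `PlogDeletion` is hypothetical.

```python
class PlogDeletion:
    """Deletes a PLOG's metadata once every storage node has confirmed."""

    def __init__(self, metadata_table: dict, plog_id: str, nodes):
        self.metadata_table = metadata_table
        self.plog_id = plog_id
        self.waiting = set(nodes)  # nodes that have not confirmed yet

    def on_node_deleted(self, node: str) -> bool:
        """Record one confirmation; return True when the PLOG is deleted."""
        self.waiting.discard(node)
        if self.waiting:
            return False  # at least one storage node is still deleting
        self.metadata_table.pop(self.plog_id, None)
        return True
```

Only after `on_node_deleted` returns True would the switch return the information indicating successful deletion to the computing node.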
Step S1109. The switch returns the information indicating successful deletion to the computing node.
After deleting the metadata of the PLOG i, the switch St0 returns the information indicating successful deletion to the computing node C0, so that the computing node C0 can delete the stored information related to the PLOG i.
In this way, after receiving the delete request for the PLOG i sent by the computing node, when no conflicts occur in the PLOG i, for example, when there are no read requests for the PLOG i, the intermediate device sets the status of the PLOG i to the delete state and indicates the storage node to delete a physical storage space corresponding to the logical storage space.
As shown in
For example, when a write operation is performed on the PLOG i as shown in FIG. 6, it is assumed that after the switch St0 sends the write request 2 for the address space 2 in step S603, the switch St0 does not receive the information indicating completion of the write request 2 from the storage node due to reasons such as an exception of the storage node. Therefore, information indicating completion of the write request 1 is not returned to the computing node C0. In this case, if computing node C0 does not receive, within a preset period of time, the information indicating completion of the write request 1, the computing node C0 can determine that execution of the write request 1 fails, and send a read-only state setting request for the PLOG i to the switch St0.
Step S1202. The switch determines that there are no write requests for the PLOG i being executed.
After receiving the read-only state setting request for the PLOG i, the switch St0 needs to first determine whether a write request for the PLOG i is being executed currently. If the write request exists, the read-only state setting request conflicts with the write request, and the switch St0 suspends execution of the read-only state setting request. In addition, after determining that there is a write request for the PLOG i being executed, the switch St0 may perform polling to determine whether there is a write request being executed, and execute the read-only state setting request after determining that there are no write requests for the PLOG i being executed.
To be specific, similarly, every time the switch St0 starts to execute a write request for the PLOG i (for example, the write request 1 in
Step S1203. The switch sets the status of the PLOG i to a read-only state.
After determining that there are no write requests for the PLOG i currently being executed, the switch St0 sets the status of the PLOG i to the read-only state in metadata of the PLOG i. Therefore, when subsequently receiving a write request for the PLOG i, the switch St0 may block the write request based on the read-only state of the PLOG i.
Step S1204. The switch returns, to the computing node, information indicating that the read-only state is successfully set.
In this way, after receiving the read-only state setting request for the PLOG i sent by the computing node, the intermediate device sets the status of the PLOG i to the read-only state when no conflicts occur in the PLOG i, for example, when there are no write requests for the PLOG i.
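The handling of the read-only state setting request may be sketched as follows. The counter name `writes_in_flight` and the class and method names are hypothetical. The sketch assumes that the switch counts the write requests being executed for the PLOG in the same manner as it counts read requests, and that it processes one request at a time.

```python
class WriteReadOnlyControl:
    """Tracks writes in flight for one PLOG and gates the read-only switch."""

    def __init__(self):
        self.status = "RW"
        self.writes_in_flight = 0  # hypothetical counter of executing writes

    def begin_write(self) -> bool:
        if self.status != "RW":
            return False  # block writes once the PLOG is not writable
        self.writes_in_flight += 1
        return True

    def end_write(self) -> None:
        if self.writes_in_flight == 0:
            raise RuntimeError("no write request is being executed")
        self.writes_in_flight -= 1

    def try_set_read_only(self) -> bool:
        """Steps S1202 and S1203: one polling attempt of the setting request."""
        if self.writes_in_flight != 0:
            return False  # conflict with a write; poll again later
        self.status = "R"
        return True
```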
According to the foregoing descriptions of the methods shown in
In embodiments of this application, control logics run by a computing node and a storage node are offloaded to a programmable intermediate device serving as a convergence point in a network. In a process of writing data, the computing node only needs to communicate with a switch once to write data to the storage node. In this way, a quantity of times of communication between the computing node and the switch, a load on the computing node, and a latency in writing data are reduced, and access efficiency is improved.
a receiving unit 131, configured to receive a first write request that is for the data of the service and that is sent by the computing node;
a determining unit 132, configured to determine to write to-be-written data that is in the first write request to a first address of the storage node; and
a write unit 133, configured to request the storage node to write the to-be-written data to the first address.
In an implementation, the first write request is for a logical storage space corresponding to the data of the service, the intermediate device stores metadata of the logical storage space, the metadata of the logical storage space includes an identifier of the logical storage space, address information of the logical storage space, and address information of a physical storage space corresponding to the logical storage space, the physical storage space belongs to a space in the storage node, and the first address is an address in the physical storage space.
In an implementation, the determining unit 132 is further configured to: allocate a second address in the logical storage space to the first write request; and determine, based on the metadata of the logical storage space, the first address to which the to-be-written data is written in the physical storage space corresponding to the logical storage space.
In an implementation, the receiving unit 131 is further configured to receive a notification message that is sent by the storage node and that is used to notify that writing of the to-be-written data is completed; and the apparatus 1300 further includes a notification unit 134, configured to: when it is determined that other write requests that are for the logical storage space and received before the first write request are completed, notify the computing node that the first write request is completed.
In an implementation, the apparatus 1300 further includes an allocation unit 135, configured to request, based on information that is about the logical storage space and that is received from the computing node, the storage node to allocate a physical storage space to the logical storage space.
In an implementation, the determining unit 132 is configured to allocate the second address to the first write request among unallocated spaces of the logical storage space according to an ascending order of addresses.
In an implementation, the storage node includes a first storage node and a second storage node, and the second address corresponds to a first address that is in the first storage node and to which the to-be-written data is written and a first address that is in the second storage node and to which the to-be-written data is written; and the write unit 133 includes: a first write subunit 1331, configured to request the first storage node to write the to-be-written data to the first address that is in the first storage node and to which the to-be-written data is written; and a second write subunit 1332, configured to request the second storage node to write the to-be-written data to the first address that is in the second storage node and to which the to-be-written data is written.
In an implementation, the apparatus 1300 further includes a conflict handling unit 136, configured to: after receiving a read-only state setting request for the storage space from the computing node, set a status of the storage space to a read-only state when it is determined that no conflicts occur in the logical storage space.
In an implementation, the apparatus 1300 further includes a conflict handling unit 136, configured to: after receiving a delete request for the storage space from the computing node, when it is determined that no conflicts occur in the logical storage space, set a status of the logical storage space to a delete state and request the storage node to delete the physical storage space corresponding to the logical storage space.
In an implementation, the apparatus 1300 further includes: an obtaining unit 137, configured to obtain N fragments of the to-be-written data in the first write request after receiving the first write request sent by the computing node; and the write unit 133 is configured to request N storage nodes to write the N fragments to first addresses of the N storage nodes.
It may be understood that the computing node and the storage node described above may be physical servers, or may be cloud servers (for example, virtual servers).
Specifically, the virtual machine 16012 is a virtual computer (server) simulated on a public hardware resource by using virtual machine software. An operating system and an application may be installed on the virtual machine, and the virtual machine may access a network resource. For an application running in a virtual machine, the virtual machine works like a real computer.
The hardware layer 16016 is a hardware platform on which the virtual environment runs, and may be abstracted from hardware resources of one or more physical hosts. The hardware layer may include various types of hardware. For example, the hardware layer 16016 includes a processor 16014 (for example, a CPU) and a memory 16015, and may further include a network interface card (NIC) 16013, a high-speed/low-speed input/output (I/O) device, and other devices with specific processing functions. The memory 16015 may be a volatile memory such as a random-access memory (RAM) or a dynamic random-access memory (DRAM); or the memory 16015 may be a non-volatile memory such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or a storage class memory (SCM); or the memory 16015 may include a combination of the foregoing types of memories. The virtual machine 16012 runs an executable program based on the VMM 16011 and a hardware resource provided by the hardware layer 16016, to implement the steps performed by the computing node in the methods of the foregoing embodiments. For brevity, details are not described herein again.
It should be understood that terms such as “first” and “second” in this specification are used merely to distinguish between similar concepts, and do not constitute any limitation.
A person of ordinary skill in the art should be aware that units and algorithm steps in the examples described with reference to embodiments disclosed in this specification can be implemented by electronic hardware, computer software, or a combination of computer software and electronic hardware. To clearly illustrate interchangeability of hardware and software, various illustrative components and steps have been described above generally in terms of functions. Whether the functions are implemented by hardware or software depends on particular applications and design constraints of the technical solutions. A person skilled in the art may use different methods to implement the described functions for each particular application, but it should not be considered that the implementation goes beyond the scope of the present application.
The foregoing descriptions are merely examples of embodiments of the present application, but are not intended to limit the protection scope of the present application. Any modification or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Number | Date | Country | Kind |
---|---|---|---|
202010627388.3 | Jul 2020 | CN | national |
202011269034.2 | Nov 2020 | CN | national |
This application is a continuation of International Application No. PCT/CN2021/102948, filed on Jun. 29, 2021, which claims priority to Chinese Patent Application No. 202011269034.2, filed on Nov. 13, 2020, and Chinese Patent Application No. 202010627388.3, filed on Jul. 2, 2020. All of the aforementioned patent applications are hereby incorporated by reference in their entireties.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/CN2021/102948 | Jun 2021 | US |
Child | 18148962 | | US |