Fabric attached memory refers to memory that is accessible over a fabric by any of multiple clients. A “fabric” can refer to a network that allows for communications between computing nodes connected to the network. The fabric attached memory can be implemented using memory devices, such as flash memory devices or other types of persistent memory devices.
Some implementations of the present disclosure are described with respect to the following figures.
Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements. The figures are not necessarily to scale, and the size of some parts may be exaggerated to more clearly illustrate the example shown. Moreover, the drawings provide examples and/or implementations consistent with the description; however, the description is not limited to the examples and/or implementations provided in the drawings.
Clients are able to perform remote access (remote read or remote write) of fabric attached memory. A “client” can refer to any entity (a machine or a program) that is able to issue requests to access data in the fabric attached memory. Examples of networks over which clients are able to access fabric attached memory include any or some combination of the following: a COMPUTE EXPRESS LINK (CXL) interconnect, a Slingshot interconnect, an InfiniBand interconnect, and so forth.
A fabric attached memory can include a distributed arrangement of memory servers, where each memory server can include a persistent memory or may be coupled to a persistent memory. In some examples, clients can access the fabric attached memory using Remote Direct Memory Access (RDMA) over a network.
An RDMA data transfer from a client to a memory server involves a transfer of data from the client over the network to a persistent memory of the memory server, where the data transfer does not involve work performed by a main processor of the memory server. A “main processor” of a memory server can refer to the processor that executes the operating system (OS) and other machine-readable instructions (including firmware such as a Basic Input/Output System or BIOS and an application program) of the memory server.
In an RDMA data transfer, data is passed to or from the client over the network through a network interface of the memory server from or to the persistent memory of the memory server. A “network interface” can refer to a network communication component that transmits and receives signals, messages, packets, etc. (more generally, “information”), over a network. The network interface can include a network transceiver that transmits and receives information.
A memory server can include main memory including a collection of memory devices (a single memory device or multiple memory devices), such as dynamic random access memory (DRAM) devices, static random access memory (SRAM) devices, and so forth. The memory server can also include cache memory, which can be part of the main processor or associated with the main processor. The cache memory is associated with the main processor if the main processor uses the cache memory to store data used by the main processor.
Note that since the data of the RDMA data transfer passes through the network interface of the memory server, the data may be stored in a memory of the network interface, and may be processed by a processor of the network interface. However, the processor and the memory of the network interface are separate and distinct from the main processor and the cache memory that is part of or associated with the main processor.
A data backup can be performed from a fabric attached memory. A data backup involves copying data at a collection of memory servers (a single memory server or multiple memory servers) to a backup storage system. A backup storage system can include a collection of storage devices (a single storage device or multiple storage devices), where a storage device can be implemented using a disk-based storage device, a solid-state drive, and so forth. If data at the fabric attached memory were to be lost or exhibit unrecoverable errors, a restore operation can be performed from the backup storage system to recover the lost data or to restore to a prior version of the data without errors.
In some examples, a full data backup can copy the entirety of the data of the fabric attached memory to the backup storage system. However, if there is a large amount of data stored at the fabric attached memory, then the full data backup can be expensive in terms of resource usage (e.g., usage of processing resources, usage of communication resources, etc.), and the full data backup can take a relatively long period of time to complete (e.g., many hours or days).
In other examples, incremental data backup can be performed, in which modified data portions are copied to the backup storage system, without copying unmodified data portions to the backup storage system. A data portion being “modified” refers to the data portion having been changed relative to a copy of the data portion stored at the backup storage system 130.
An incremental data backup can reduce the backup time and resource usage. Also, performing incremental data backup can reduce wear on hardware, such as storage media, since the quantity of writes to the storage media can be reduced as compared to a full data backup.
In examples where RDMA is used, because the main processors of memory servers are not involved in RDMA data transfers, the main processors of the memory servers do not track modified data portions (also referred to as “dirty” data portions). If the memory servers of the fabric attached memory do not track modified data portions, then an incremental data backup would not be possible, since the memory servers do not have information indicating which data portions stored at the memory servers are modified (or dirty) and which are unmodified (or clean).
In accordance with some implementations of the present disclosure, a client-triggered update of a data modification tracking structure at a memory server of a fabric attached memory that is part of a remote access arrangement is performed in response to updates of data pages (or more simply “pages”) by a client over a network to the memory server. A “remote access arrangement” can refer to an arrangement in which a client accesses data stored at a remote memory server that is coupled to the client over a network. A “page” can refer to any specified amount of data (the term “page” is used interchangeably with “data portion”). A page is the data unit at which a data modification tracking structure tracks data modification.
The data modification tracking structure tracks pages that have been modified. The data modification tracking structure is used to perform an incremental backup operation to a backup storage system in which modified pages are copied from the fabric attached memory to the backup storage system from the memory server, but unmodified pages are not copied from the fabric attached memory to the backup storage system.
By using the client-triggered update of a data modification tracking structure, memory servers would not have to be configured with logic to track modified pages. This can reduce the complexity and cost of memory servers of a fabric attached memory.
Examples of client systems include any or some combination of the following: a desktop computer, a notebook computer, a tablet computer, a server computer, a smartphone, a game appliance, an Internet-of-things (IoT) device, and so forth.
A client system is an example of a “client” that can issue a request to perform a remote access of the fabric attached memory 102. A program (including machine-readable instructions) that executes in the client system 108 can be another example of a “client.”
In accordance with some implementations of the present disclosure, the client system 108-1 includes an incremental backup management engine 116. As used here, an “engine” can refer to one or more hardware processing circuits, which can include any or some combination of a microprocessor, a core of a multi-core microprocessor, a microcontroller, a programmable integrated circuit, a programmable gate array, or another hardware processing circuit. Alternatively, an “engine” can refer to a combination of one or more hardware processing circuits and machine-readable instructions (software and/or firmware) executable on the one or more hardware processing circuits.
The incremental backup management engine 116 in the client system 108-1 can perform client-triggered management of modified pages tracking by updating a modified pages tracking structure (discussed further below), and the incremental backup management engine 116 can initiate an incremental data backup from the fabric attached memory 102 over the network 106 to a backup storage system 130 connected to the network 106 using the modified pages tracking structure.
An incremental backup operation from a collection of memory servers 104 can be performed over the network 106 to the backup storage system 130. The backup storage system 130 may be located at a location that is remote from the fabric attached memory 102. The backup storage system 130 includes a backup storage medium 136 to store instances of incremental backup data 138. The backup storage medium 136 can include a collection of storage devices (a single storage device or multiple storage devices).
The instances of incremental backup data 138 can represent copies of backup data obtained at different points in time. For example, a first instance of incremental backup data 138 can include a copy of modified pages at a first time point, a second instance of incremental backup data 138 can include a copy of modified pages at a second time point, and so forth.
The backup storage system 130 includes a storage controller 140 to control access (read access and write access) of the backup storage medium 136. For example, the storage controller 140 can interact with a collection of memory servers 104 to perform an incremental backup operation to produce an instance of incremental backup data 138 that is stored in the backup storage medium 136. The storage controller 140 can also interact with the collection of memory servers 104 to perform a restore operation, to restore data to the collection of memory servers 104 from an instance of the incremental backup data 138.
Each memory server 104 includes a main processor 118 to execute an OS 120 of the memory server 104, as well as to execute other machine-readable instructions, such as firmware (e.g., BIOS code), application programs, and so forth. It is noted that each memory server 104 can include multiple different types of processors (including the main processor 118) to perform respective different functions. The “main processor” can include a single processor or multiple processors.
Each memory server 104 also includes a persistent memory 122 (which is the main memory of the memory server 104) that can store data on behalf of the client systems 108-1 to 108-N. A “persistent” memory refers to a memory that can maintain data stored in the memory even if power is removed from the memory. For example, persistent memory can include nonvolatile memory in which stored data is maintained (i.e., not lost) even when powered off. In another example, persistent memory can refer to an arrangement in which data of a memory is flushed to a backup storage (e.g., disk-based storage, etc.) when power is lost, followed by restoring the data from the backup storage to the memory after power is recovered.
The memory server 104 also includes a network interface 124 that enables the memory server 104 to communicate over the network 106.
In examples where a client system, such as the client system 108-1, is able to perform an RDMA access of the fabric attached memory 102, an RDMA data transfer would cause data to be transmitted from the client system over the network 106 to selected memory servers 104 of the fabric attached memory 102. The data of the RDMA data transfer would pass through the network interface 124 of each of the selected memory servers 104, and would be written to the persistent memory 122 of each of the selected memory servers 104, without involving the main processor 118 of each of the selected memory servers 104. In the RDMA data transfer, processing cycles of the main processor 118 of the memory server 104 are not employed for placing the data into the persistent memory 122.
In the example of FIG. 1, each memory server 104 stores a modified pages tracking bitmap 132 that includes bits corresponding to respective pages 134 of data stored in the persistent memory 122 of the memory server 104.
Any given bit of the modified pages tracking bitmap 132 can have a first value or a second value different from the first value. For example, the first value can be 1 or 0, while the second value can be 0 or 1 (i.e., the opposite of the first value). If the given bit of the modified pages tracking bitmap 132 has the first value, then that indicates that the corresponding page 134 is modified. On the other hand, if the given bit of the modified pages tracking bitmap 132 has the second value, then that indicates that the corresponding page 134 is unmodified. A page being “modified” refers to the page having been changed relative to a copy of the page stored at the backup storage system 130.
In other examples, other forms of modified pages tracking structures can be employed instead of the modified pages tracking bitmaps 132. More generally, a modified pages tracking structure includes a collection of entries, where each entry can have a first value to indicate that a corresponding page is modified, and a second value to indicate that the corresponding page is unmodified. Thus, a first entry of the modified pages tracking structure can indicate whether or not a first page is modified, a second entry of the modified pages tracking structure can indicate whether or not a second page is modified, and so forth.
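The following is a minimal sketch, in Python, of how a modified pages tracking structure of this kind could be maintained as a bitmap; the class and method names are illustrative assumptions rather than part of the present disclosure.

```python
# Minimal sketch of a modified pages tracking bitmap (one bit per page).
# The names and the fixed page count are assumptions for illustration only.

class ModifiedPagesBitmap:
    def __init__(self, num_pages: int):
        # One bit per page; every page starts as unmodified (second value, 0).
        self.bits = bytearray((num_pages + 7) // 8)

    def mark_modified(self, page_number: int) -> None:
        # Set the corresponding bit to the first value (1) to mark the page modified.
        self.bits[page_number // 8] |= 1 << (page_number % 8)

    def mark_unmodified(self, page_number: int) -> None:
        # Reset the corresponding bit to the second value (0), e.g., after a backup.
        self.bits[page_number // 8] &= ~(1 << (page_number % 8)) & 0xFF

    def is_modified(self, page_number: int) -> bool:
        return bool(self.bits[page_number // 8] & (1 << (page_number % 8)))


bitmap = ModifiedPagesBitmap(num_pages=1024)
bitmap.mark_modified(42)
print(bitmap.is_modified(42))   # True
bitmap.mark_unmodified(42)
print(bitmap.is_modified(42))   # False
```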
The client system 108-1 receives (at 202) a write request to write data to the fabric attached memory 102. In the example of FIG. 2, the write request is a Put( ) request that includes a parameter FAM_Location specifying a memory address of the fabric attached memory 102 to which the write data is to be written, the write data, and a parameter Data_Size specifying the size of the write data.
In response to the Put( ) request, the client system 108-1, and more specifically the incremental backup management engine 116, determines (at 204) a location of the modified pages tracking bitmap 132 in the memory server 104. The location of the modified pages tracking bitmap 132 can be indicated by a memory address of a location in the persistent memory 122, or a location in another memory in the memory server 104. In some examples, the incremental backup management engine 116 can determine the location of the modified pages tracking bitmap 132 based on configuration information stored in the client system 108-1, or based on information retrieved from a source external of the client system 108-1.
The client system 108-1 determines (at 206) which page(s) in the persistent memory 122 of the memory server 104 is (are) involved in the write operation specified by the Put( ) request.
In some examples, the client system 108-1 can map memory addresses of the persistent memory 122 of the memory server 104 to respective page numbers. For example, since the size of each page is predefined, the page contains data in a respective range of memory addresses. Given the memory address specified by FAM_Location, the client system 108-1 can identify which page includes the memory address specified by FAM_Location. The identified page has a page number that refers to the identified page. In this manner, the client system 108-1 can identify the starting page number of the starting page that contains the memory address specified by FAM_Location.
Also, once the starting page is identified, the client system 108-1 can determine, based on the size of the write data specified by Data_Size, whether the write data would be written to just the starting page or to a collection of the starting page and other page(s). If the size of the write data is such that the write data is to be written to the starting page plus other page(s), then the pages involved in the write operation would include the starting page and the successive page(s) following the starting page.
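As an illustration of the page determination described above, the following Python sketch maps a write specified by FAM_Location and Data_Size to the page numbers involved; the page size and the helper name are assumptions for illustration and are not defined by the present disclosure.

```python
# Sketch of determining which pages are involved in a write, given a starting
# memory address (FAM_Location) and a data size (Data_Size). The page size and
# the function name are illustrative assumptions.

PAGE_SIZE = 4096  # assumed, predefined page size in bytes

def pages_for_write(fam_location: int, data_size: int) -> list[int]:
    starting_page = fam_location // PAGE_SIZE            # page containing FAM_Location
    ending_page = (fam_location + data_size - 1) // PAGE_SIZE
    return list(range(starting_page, ending_page + 1))

# A 10,000-byte write starting at address 8,000 spans pages 1 through 4.
print(pages_for_write(8000, 10000))   # [1, 2, 3, 4]
```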
The incremental backup management engine 116 in the client system 108-1 then issues (at 208) a remote memory access (RMA) scatter operation to the fabric attached memory 102. The RMA scatter operation performs both a data update for the write request (to update the page(s) determined at 206 as being involved in the write) and the bitmap update of the modified pages tracking bitmap 132 in the memory server 104. An RMA scatter operation refers to one operation in which updates of different data structures at a fabric attached memory are initiated by a remote entity, in this case the client system 108-1.
In the example of FIG. 2, the RMA scatter operation includes sending a scatter request that specifies the page number(s) of the page(s) determined (at 206) to be involved in the write operation, the write data, and the location of the modified pages tracking bitmap 132 determined at 204.
In some examples, the RMA scatter operation is according to RDMA, in which an update of data (including the page(s) and the modified pages tracking bitmap 132) in the persistent memory 122 of the memory server 104 is performed without involving the main processor 118 of the memory server 104.
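The following Python sketch illustrates one way a client could assemble a single scatter request that carries both the page update and the corresponding bitmap update; the data structures, field names, and one-byte-per-page tracking layout are assumptions for illustration and do not represent an actual RDMA or RMA API.

```python
# Sketch of composing one scatter request that updates both the written pages
# and the modified pages tracking structure. All names are illustrative; a real
# implementation would issue this through an RDMA/RMA library.

from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size

@dataclass
class ScatterElement:
    remote_address: int   # address in the memory server's persistent memory
    payload: bytes        # bytes to place at that address

def build_scatter_request(fam_location: int, data: bytes,
                          tracking_structure_address: int) -> list[ScatterElement]:
    elements = [ScatterElement(fam_location, data)]       # the data update
    first_page = fam_location // PAGE_SIZE
    last_page = (fam_location + len(data) - 1) // PAGE_SIZE
    for page_number in range(first_page, last_page + 1):
        # Mark each written page as modified; for simplicity this sketch uses
        # one byte per page rather than one bit per page.
        elements.append(ScatterElement(tracking_structure_address + page_number, b"\x01"))
    return elements

request = build_scatter_request(fam_location=8192, data=b"x" * 5000,
                                tracking_structure_address=0x10000)
for element in request:
    print(hex(element.remote_address), len(element.payload))
```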
In response to the RMA scatter operation, the memory server 104 updates (at 210) the page(s) identified by the page number(s) specified by the RMA scatter request, and the memory server 104 updates (at 212) corresponding bit(s) of the modified pages tracking bitmap 132. The updated bit(s) of the modified pages tracking bitmap 132 reflect the modified state of the page(s) identified by the page number(s) specified by the RMA scatter request.
In an example, if a single page is updated in response to the RMA scatter operation, then a single bit corresponding to the single page in the modified pages tracking bitmap 132 is changed from the second value (e.g., 0 indicating unmodified) to the first value (e.g., 1 indicating modified). On the other hand, if multiple pages are modified in response to the RMA scatter operation, then multiple bits corresponding to the multiple pages in the modified pages tracking bitmap 132 are changed from the second value to the first value.
The client system waits (at 214) for completion of the RMA scatter operation. In some examples, the memory server 104 can return a completion indication to the client system 108-1 when the page(s) and the modified pages tracking bitmap 132 have been updated according to the RMA scatter operation.
In response to receiving the completion indication from the memory server 104, the client system 108-1 returns (at 216) a completion indication to the requester that sent the Put( ) request. The requester can be in the client system 108-1 or external of the client system 108-1.
The client system 108-1 receives (at 302) a data backup request to perform a data backup from the fabric attached memory 102 to the backup storage system 130. In the example of FIG. 3, the data backup request is a Backup(FAM_Region) request.
The parameter FAM_Region in the Backup(FAM_Region) request specifies a memory region of the fabric attached memory 102. A memory region can be a portion of the fabric attached memory 102 that is allocated to store a data item (or multiple data items). A “data item” can refer to a file, an object, a collection of files or objects, or any other identifiable unit of data. The parameter FAM_Region can have an identification value that identifies the memory region. Different memory regions of the fabric attached memory 102 are assigned different identification values (e.g., different numbers, different character strings, etc.).
A memory region can be distributed across multiple memory servers 104 (i.e., data of the memory region is stored in multiple persistent memories 122 of the memory servers 104). Note that the parameter FAM_Location in the Put( ) request (received at 202 in FIG. 2) specifies a location that is within a memory region of the fabric attached memory 102.
In response to the Backup(FAM_Region) request, the incremental backup management engine 116 of the client system 108-1 sends (at 304) a backup request (e.g., a FAM_Backup(FAM_Region) request) to the memory server 104.
In response to the FAM_Backup(FAM_Region) request, the memory server 104 locks (at 306) the memory region identified by FAM_Region. Locking the memory region refers to preventing any further data updates in the memory region, i.e., new data cannot be written to the memory region, and no data in the memory region can be modified.
In some examples, locking the memory region identified by FAM_Region includes unregistering (at 308) the memory region identified by FAM_Region. The memory server 104 may maintain a registered memory region data structure (e.g., a list or other type of data structure) storing memory regions that have been allocated to store data on behalf of client systems. Once a new memory region is allocated and ready to accept data updates, the new memory region is “registered” with the memory server 104 by adding an identification value that identifies the new memory region to the registered memory region data structure. Unregistering a given memory region can refer to removing the identification value for the given memory region from the registered memory region data structure. An unregistered memory region of the fabric attached memory 102 is inaccessible over the network 106 for updates.
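The following Python sketch illustrates how a registered memory region data structure could behave, with unregistration serving as the lock; the class and identifiers are assumptions for illustration only.

```python
# Sketch of a registered memory region data structure in which unregistering
# a region makes it unreachable for updates (i.e., locks it). Illustrative only.

class RegionRegistry:
    def __init__(self):
        self.registered_regions: set[str] = set()

    def register(self, region_id: str) -> None:
        # A registered region is ready to accept data updates.
        self.registered_regions.add(region_id)

    def unregister(self, region_id: str) -> None:
        # An unregistered region is inaccessible for updates (locked).
        self.registered_regions.discard(region_id)

    def is_writable(self, region_id: str) -> bool:
        return region_id in self.registered_regions


registry = RegionRegistry()
registry.register("region-7")
print(registry.is_writable("region-7"))   # True
registry.unregister("region-7")           # lock the region for backup
print(registry.is_writable("region-7"))   # False
```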
While the memory region identified by FAM_Region is locked, any writes to the memory region will be redirected (at 310) to a journal 142 in the memory server 104. In other words, in response to a write request to the locked memory region, the write request is redirected to the journal 142, and no write is performed to the locked memory region.
In some examples, as shown in FIG. 1, the journal 142 is maintained in the memory server 104.
A “journal” can refer to any data structure that is to hold write requests that cannot be processed with respect to a memory region due to the memory region being locked.
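A minimal Python sketch of redirecting writes to a journal while a region is locked follows; the in-memory structures and names are assumptions for illustration only.

```python
# Sketch of write redirection: while a region is locked, incoming writes are
# appended to a journal instead of being applied. Names are illustrative.

class MemoryRegion:
    def __init__(self, size: int):
        self.data = bytearray(size)
        self.locked = False
        self.journal: list[tuple[int, bytes]] = []   # held-back write requests

    def write(self, offset: int, payload: bytes) -> None:
        if self.locked:
            # The write request is redirected to the journal; the region itself
            # is left unchanged until the journal is played back.
            self.journal.append((offset, payload))
        else:
            self.data[offset:offset + len(payload)] = payload


region = MemoryRegion(size=16)
region.locked = True
region.write(0, b"hello")
print(len(region.journal), bytes(region.data[:5]))   # 1 b'\x00\x00\x00\x00\x00'
```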
In other examples, instead of unregistering a memory region of the fabric attached memory 102 to lock the memory region, a different locking mechanism can be used. For example, each memory region may be associated with a lock indicator (e.g., a flag) that, if set to a first value, indicates that the memory region is locked, and, if set to a different second value, indicates that the memory region is not locked.
After locking (at 306) the memory region, the memory server 104 returns (at 312) a backup in progress indication to the client system 108-1, to indicate that the data backup is in progress.
The memory server 104 performs (at 314) an incremental backup loop in which the memory server 104 iteratively copies modified pages of the memory server 104 to the backup storage system 130. The memory server 104 identifies (at 316), based on the modified pages tracking bitmap 132, a page that has been modified (i.e., the corresponding bit for the page has the first value). The memory server 104 then copies (at 318) the identified modified page to the backup storage system 130. After the page is copied to the backup storage system 130, the memory server 104 resets (at 320) the corresponding bit in the modified pages tracking bitmap 132, which changes the corresponding bit from the first value to the second value to indicate that the page is unmodified.
The memory server 104 then performs another iteration to identify another modified page to back up to the backup storage system 130, and continues the incremental backup loop 314 until all modified pages identified by the modified pages tracking bitmap 132 have been backed up.
In some examples, it is noted that the modified pages are not individually written in distinct I/O operations to the backup storage system 130; rather, one I/O operation can write the modified pages identified in the incremental backup loop 314 to the backup storage system 130.
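The following Python sketch illustrates the shape of such an incremental backup loop, staging the modified pages and writing them in one batch before resetting the tracking entries; the dictionary-based page store and names are assumptions for illustration only.

```python
# Sketch of the incremental backup loop: identify modified pages from the
# tracking structure, copy them in one batch, then reset the tracking entries.
# The in-memory stores and names are illustrative assumptions.

def incremental_backup(pages: dict[int, bytes],
                       modified_pages: set[int],
                       backup_store: dict[int, bytes]) -> None:
    staged = {}
    for page_number in sorted(modified_pages):
        # Identify a page marked as modified and stage it for copying.
        staged[page_number] = pages[page_number]
    # Write the staged pages together rather than one I/O operation per page.
    backup_store.update(staged)
    # Reset the tracking entries so the pages read as unmodified.
    modified_pages.clear()


pages = {0: b"page-zero", 1: b"page-one", 2: b"page-two"}
modified_pages = {1, 2}
backup_store: dict[int, bytes] = {}
incremental_backup(pages, modified_pages, backup_store)
print(sorted(backup_store), modified_pages)   # [1, 2] set()
```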
Upon completion of the incremental backup loop 314 (i.e., after copying all modified pages identified by the modified pages tracking bitmap 132 to the backup storage system 130), the memory server 104 accesses (at 322) the journal 142, and plays back the write requests in the journal 142 to complete the I/O operations specified by the write requests. The write requests played back from the journal 142 cause writes of corresponding data to the memory region (that still remains locked for write requests received from client systems).
After all the write requests in the journal 142 have been played back, the memory server 104 unlocks (at 324) the locked memory region (identified by FAM_Region), to enable updates of the unlocked memory region in response to write requests from client systems. Unlocking the memory region can include registering the memory region with the registered memory region data structure.
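A minimal Python sketch of the journal playback and unlock steps follows; the structures mirror the earlier locking sketch and the names are assumptions for illustration only.

```python
# Sketch of playing back journaled writes after the backup completes, then
# re-registering (unlocking) the region. Names are illustrative assumptions.

def play_back_and_unlock(region_data: bytearray,
                         journal: list[tuple[int, bytes]],
                         registered_regions: set[str],
                         region_id: str) -> None:
    # Apply the held-back write requests to the region in arrival order.
    for offset, payload in journal:
        region_data[offset:offset + len(payload)] = payload
    journal.clear()
    # Registering the region again makes it accessible for client updates.
    registered_regions.add(region_id)


region_data = bytearray(16)
journal = [(0, b"hello"), (5, b"world")]
registered_regions: set[str] = set()
play_back_and_unlock(region_data, journal, registered_regions, "region-7")
print(bytes(region_data[:10]), "region-7" in registered_regions)   # b'helloworld' True
```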
In some examples, a client system or a requester that is part of or is coupled to the client system can check the status information of all previously issued backups using a backup status application programming interface (API). For example, the client system or requester can send a status inquiry including the identification value of a memory region to the backup status API, which returns information pertaining to when incremental backups were performed for the memory region, and information identifying each incremental backup instance. Such information can be used to perform a restore operation, for example, in case of data loss or data corruption at the client system or requester.
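The following Python sketch shows one possible shape for such a backup status query; the record layout and names are assumptions for illustration and do not correspond to an actual API of the present disclosure.

```python
# Sketch of a backup status query keyed by memory region identifier; the
# history records and function name are illustrative assumptions.

from datetime import date

backup_history: dict[str, list[dict]] = {
    "region-7": [
        {"backup_id": "incremental-0001", "completed": date(2023, 7, 1)},
        {"backup_id": "incremental-0002", "completed": date(2023, 7, 8)},
    ],
}

def backup_status(region_id: str) -> list[dict]:
    # Returns when incremental backups were performed for the region and an
    # identifier for each backup instance (usable to select a restore point).
    return backup_history.get(region_id, [])

for record in backup_status("region-7"):
    print(record["backup_id"], record["completed"])
```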
The machine-readable instructions include data modification tracking structure update instructions 402 to, in response to a request to modify a first data page at a memory server in a remote access by a client over a network, send, from the client system to the memory server, a request to update a data modification tracking structure stored by the memory server to indicate that the first data page is modified. In some examples, the data modification tracking structure includes the modified pages tracking bitmap 132 of FIG. 1.
In some examples, the machine-readable instructions receive a write request to write data to a fabric attached memory including the memory server, and in response to the write request, the machine-readable instructions determine a collection of data pages (one data page or multiple data pages) involved in a write operation specified by the write request, and issue an I/O scatter operation (e.g., the RMA scatter operation issued at 208 in FIG. 2) to update both the collection of data pages and the data modification tracking structure.
In some examples, the issuance of the I/O scatter operation includes sending a scatter request that includes information identifying the collection of data pages (e.g., page numbers and/or memory addresses of pages) and a memory address of the data modification tracking structure.
The machine-readable instructions include incremental data backup initiation instructions 404 to initiate, by the client system, an incremental data backup from the memory server to a backup storage system of data pages indicated as modified by the data modification tracking structure stored at the memory server.
In some examples, the memory server is part of a fabric attached memory including a plurality of memory servers, where the incremental data backup is of data pages from multiple memory servers of the plurality of memory servers to the backup storage system.
In some examples, each memory server of the multiple memory servers includes a respective data modification tracking structure indicating which data pages are modified, where the incremental data backup is of the data pages from the multiple memory servers to the backup storage system identified as modified by the data modification tracking structures in the multiple memory servers.
In some examples, the request to modify is to perform a modification of the first data page according to an RDMA write by the client system of a memory of the memory server over the network.
In some examples, the update of the data modification tracking structure is according to an RDMA write by the client system of a memory of the memory server over the network.
The fabric attached memory system 502 includes a memory server 504 including a memory 506 to store a data modification tracking structure 508.
The fabric attached memory system 502 includes a network interface 510 to communicate over a network with a client system. The network interface 510 can receive, from the client system, an update request to update the data modification tracking structure 508. The update request can be an RDMA write of the data modification tracking structure 508 in the memory 506. The update request is responsive to a write request to write data from the client system to the memory server 504, where the update request is to cause a modification of the data modification tracking structure 508 to indicate that a collection of data pages is modified responsive to the write request.
The memory server 504 includes a processor 512 (e.g., the main processor 118 of FIG. 1) to perform various tasks.
The tasks include a backup request reception task 514 to receive a backup request from the client system. The tasks include an incremental data backup task 516 to, in response to the backup request, identify, based on the data modification tracking structure 508, the collection of data pages that has been modified, and perform an incremental data backup of the collection of data pages from the memory server 504 to a backup storage system.
In some examples, the update request includes an RDMA write of the memory 506 of the memory server 504 that updates the data modification tracking structure 508.
In some examples, the update request is part of an I/O scatter operation that is to update the data modification tracking structure 508 and the collection of data pages.
In some examples, the processor 512 is to, in response to the backup request from the client system, lock a memory region in the memory 506 of the memory server 504 to prevent updates of the memory region by further write requests from the client system, and redirect the further write requests to a journal in the memory server 504.
In some examples, the processor 512 is to, after a completion of the incremental data backup, play back the further write requests from the journal to update the memory region according to the further write requests.
In some examples, the processor 512 is to, after playing back the further write requests from the journal, unlock the memory region.
The process 600 includes receiving (at 602), at a client system, a write request to write data to a fabric attached memory. The process 600 includes, in response to the write request, sending (at 604), from the client system to a memory server in the fabric attached memory, a request to update a data modification tracking structure stored by the memory server to indicate that a collection of data pages is modified by the write request. In some examples, the request to update the data modification tracking structure is part of a scatter request that identifies both data pages to be updated and the data modification tracking structure to be updated.
The process 600 includes initiating (at 606), by the client system, an incremental data backup from the memory server to a backup storage system of the collection of data pages indicated as modified by the data modification tracking structure stored at the memory server.
In the present disclosure, use of the term “a,” “an,” or “the” is intended to include the plural forms as well, unless the context clearly indicates otherwise. Also, the term “includes,” “including,” “comprises,” “comprising,” “have,” or “having” when used in this disclosure specifies the presence of the stated elements, but does not preclude the presence or addition of other elements.
A storage medium (e.g., 400 in FIG. 4) can include any of various types of memory devices or storage devices to store machine-readable instructions, such as the instructions 402 and 404.
In the foregoing description, numerous details are set forth to provide an understanding of the subject disclosed herein. However, implementations may be practiced without some of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.