A computer system may sometimes need to evacuate the content of a block of memory. For example, when it is suspected that a block of memory may be defective, it is desirable to remove that block of memory from use. As another example, in a computer with multiple partitions, it may be useful to reassign a block of memory from one partition to another partition (for load balancing, for example). Since the block of memory to be removed or reassigned may be accessed by executing processes and/or devices, it is necessary to properly evacuate the content of the memory block so that such executing processes and/or devices can continue with minimal disruption vis-à-vis a new memory block. Proper evacuation is also important to avoid conflicts between the evacuation operation and any pending or new direct memory access (DMA) operation involving the memory block. Indeed, one of the most challenging use scenarios involves evacuating a memory block that is currently in use for DMA accesses by I/O devices.
To facilitate discussion,
The outstanding I/O DMA request is then queued in queue 130 to be serviced when DMA resources become available. When DMA resources become available, the DMA request made by I/O driver 110 is serviced, resulting in an access to memory block 124.
Thus, the copy_page( ) operation first determines whether there exists another memory page into which the content of memory page 242 may be evacuated. In the present example, memory page 252 is selected to be the memory page into which the content of memory page 242 is copied.
Suppose the copy_page( ) operation next begins to copy data from the source memory page (e.g., memory page 242) to the destination memory page (e.g., memory page 252). Shortly thereafter, I/O device 210 happens to want to write to memory page 242 using DMA. If the write operation is performed while the content of memory page 242 is in the process of being copied to target memory page 252, it is possible that the content transferred to target memory page 252 does not contain the most up-to-date data written to memory page 242 during DMA accesses on behalf of I/O device 210. This may happen if, for example, the DMA write operation targets a part of memory page 242 that has already been copied to target memory page 252.
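The hazard described above can be illustrated with a small deterministic simulation. This is only a sketch: the page is modeled as a short Python list, the copy proceeds one word at a time, and the function name and its parameters are hypothetical rather than part of any actual copy_page( ) implementation.

```python
def copy_page_with_interleaved_dma(source, dma_index, dma_value, dma_after_words):
    """Copy `source` word by word into a new page.

    A simulated DMA write lands in `source[dma_index]` once
    `dma_after_words` words have already been copied.
    """
    destination = [None] * len(source)
    for i in range(len(source)):
        if i == dma_after_words:
            # The device writes to the source page in the middle of the copy.
            source[dma_index] = dma_value
        destination[i] = source[i]
    return destination


# Hypothetical stand-ins for memory pages 242 (source) and 252 (target).
page_242 = [0, 0, 0, 0]
page_252 = copy_page_with_interleaved_dma(
    page_242, dma_index=1, dma_value=99, dma_after_words=2
)

# Word 1 was copied before the DMA write arrived, so the target is stale.
target_is_stale = page_252 != page_242
```

Because the write hits a word that has already been copied, the target page ends up without the newest data, which is the situation the synchronization described below is meant to prevent.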
Because of the potential for data corruption and other issues, there has been a reluctance to permit memory evacuation, particularly kernel memory evacuation, while DMA is enabled. One way of synchronizing memory evacuation and DMA involves suspending all DMA activities until evacuation is completed. However, this approach is disruptive and is not desirable from a performance standpoint.
The invention relates, in an embodiment, to a computer-implemented method for performing an evacuation request pertaining to a set of memory pages. The method includes inhibiting new DMA operations on a range of memory, the range of memory overlapping with at least a first portion of the set of memory pages associated with the evacuation request. The method further includes deferring evacuating the set of memory pages pursuant to the evacuation request until all existing DMA requests that pertain to at least a second portion of the set of memory pages are drained. The method additionally includes performing the evacuating after the draining is completed for all the existing DMA requests. The method also includes enabling the new DMA operations after the evacuating is completed.
In another embodiment, the invention relates to a computer-implemented method in a computer system for synchronizing memory evacuation requests and direct memory access (DMA) requests with respect to a block of physical memory. The method includes receiving an evacuation request for evacuating a set of memory pages, the set of memory pages including at least a page of memory in the block of physical memory. The method also includes inhibiting new DMA operations on a range of physical memory, the range of physical memory overlapping with at least a portion of the set of memory pages associated with the evacuation request. The method additionally includes draining existing DMA requests that pertain to the set of memory pages. The method also includes performing the evacuating after the draining is completed, and enabling the new DMA operations after the evacuating is completed.
In yet another embodiment, the invention relates to an article of manufacture comprising a program storage medium having computer readable code embodied therein. The computer readable code is configured to perform an evacuation request pertaining to a set of memory pages. The article of manufacture includes computer readable code for inhibiting new DMA operations on a range of memory, the range of memory overlapping with at least a first portion of the set of memory pages associated with the evacuation request. The article of manufacture also includes computer readable code for deferring evacuating the set of memory pages pursuant to the evacuation request until all existing DMA requests that pertain to at least a second portion of the set of memory pages are drained. The article of manufacture further includes computer readable code for performing the evacuating after the draining is completed for all the existing DMA requests. The article of manufacture additionally includes computer readable code for enabling the new DMA operations after the evacuating is completed.
These and other features of the present invention will be described in more detail below in the detailed description of the invention and in conjunction with the following figures.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
The present invention will now be described in detail with reference to a few embodiments thereof as illustrated in the accompanying drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process steps and/or structures have not been described in detail in order to not unnecessarily obscure the present invention.
Various embodiments are described hereinbelow, including methods and techniques. It should be kept in mind that the invention might also cover articles of manufacture that include a computer readable medium on which computer-readable instructions for carrying out embodiments of the inventive technique are stored. The computer readable medium may include, for example, semiconductor, magnetic, opto-magnetic, optical, or other forms of computer readable medium for storing computer readable code. Further, the invention may also cover apparatuses for practicing embodiments of the invention. Such apparatus may include circuits, dedicated and/or programmable, to carry out tasks pertaining to embodiments of the invention. Examples of such apparatus include a general-purpose computer and/or a dedicated computing device when appropriately programmed and may include a combination of a computer/computing device and dedicated/programmable circuits adapted for the various tasks pertaining to embodiments of the invention.
The invention relates, in an embodiment, to techniques and arrangements for synchronizing evacuation requests and DMA operations pertaining to a block of physical memory. In an embodiment, synchronization is performed in a manner that is substantially transparent to (i.e., does not substantially impact) DMA operations and/or evacuation requests involving other blocks of memory in the system. Furthermore, embodiments of the invention enable such synchronization without requiring substantial changes to existing I/O architecture and/or kernel modifications and/or driver modifications.
In an embodiment, an evacuation request pertaining to a block of memory causes the kernel (e.g., the I/O subsystem) to inhibit new DMA operations on a range of memory that at least includes the block of memory associated with the evacuation request. For example, an evacuation request pertaining to a particular page of memory will inhibit new DMA operations at least to that page of memory.
Furthermore, existing DMA operations that involve the block of memory associated with the evacuation request are drained, i.e., all are allowed to complete. After all DMA operations that involve the block of memory associated with the evacuation request are drained, the block of memory is evacuated pursuant to the evacuation request.
If new DMA requests pertaining to the block of memory associated with the evacuation request are received before evacuation is completed, these DMA requests are queued in the I/O request queue, waiting to be serviced. Once the evacuation is complete, the queued DMA requests are allowed to execute with respect to the block of memory that was formerly the subject of the evacuation request.
Since only DMA requests targeting the same block of memory as that associated with the evacuation request are inhibited, other DMA requests may proceed normally and are thus substantially unaffected. Further, while existing DMA requests pertaining to the same block of memory as that associated with the evacuation request are drained, other DMA requests are also allowed to proceed substantially unaffected. Thus the number of DMA requests that are potentially affected is limited, thereby limiting impact on system performance.
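The selectivity described above comes down to a range-overlap test between a DMA request and the memory being evacuated. A minimal sketch follows, assuming byte-addressed half-open ranges and a 4096-byte page; the function names, the page size, and the example addresses are illustrative assumptions and do not come from the described embodiments.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


def ranges_overlap(start_a, length_a, start_b, length_b):
    """Return True if [start_a, start_a+length_a) and
    [start_b, start_b+length_b) share at least one address."""
    return start_a < start_b + length_b and start_b < start_a + length_a


# Hypothetical range currently subject to an evacuation request.
EVACUATING_RANGE = (0x10000, PAGE_SIZE)


def is_inhibited(dma_start, dma_length):
    """A new DMA request is inhibited only if it touches the evacuating range."""
    return ranges_overlap(dma_start, dma_length, *EVACUATING_RANGE)


inside = is_inhibited(0x10800, 512)     # lies within the evacuating page
straddles = is_inhibited(0xFF00, 0x200)  # begins before it, ends inside it
adjacent = is_inhibited(0x11000, 512)   # starts exactly where the page ends
```

Requests that merely sit next to the evacuating range are not inhibited, which is what keeps the number of affected DMA requests small.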
In accordance with embodiments of the invention, the mechanism for synchronization employs existing virtual memory, I/O, and driver arrangements with only minor modifications. In an embodiment, the modifications involve including in the definition of DMA resources the availability status of a range of memory, and having the virtual memory subsystem inform the I/O subsystem of the identity of the physical memory blocks that are currently affected by an evacuation request.
In an embodiment, after the virtual memory subsystem receives an evacuation request that maps to a particular physical memory block, the virtual memory subsystem may pass this information, which at least includes the identity or address of the affected physical memory block, to the I/O subsystem. If a new DMA request queries the I/O subsystem as to whether DMA resources are available, the I/O subsystem would respond negatively if the new DMA request involves a memory block that has been noted by the virtual memory subsystem (and communicated to the I/O subsystem) as being concurrently involved with an evacuation request. Accordingly, the new DMA request is deferred, or queued, waiting for DMA resources to become available.
Once the evacuation request pertaining to that memory block is completed, the virtual memory subsystem may again inform the I/O subsystem, this time of the completion of the evacuation operation. The I/O subsystem may then deem the DMA resources requested by the deferred DMA requests as “available.” The availability of the required DMA resources (as notified by the I/O subsystem) enables the deferred DMA requests to be performed.
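One way to picture this exchange between the virtual memory subsystem and the I/O subsystem is sketched below. The class, its method names, and the use of integer page identifiers are all hypothetical; the single `other_resources_free` flag merely stands in for whatever other DMA resources an actual I/O subsystem tracks.

```python
class IOSubsystem:
    """Toy I/O subsystem that folds memory availability into DMA resources."""

    def __init__(self):
        self.evacuating_pages = set()     # pages reported by the VM subsystem
        self.other_resources_free = True  # placeholder for all other DMA resources

    def note_evacuation_started(self, pages):
        # Called by the (hypothetical) virtual memory subsystem.
        self.evacuating_pages.update(pages)

    def note_evacuation_completed(self, pages):
        # Called by the (hypothetical) virtual memory subsystem.
        self.evacuating_pages.difference_update(pages)

    def dma_resources_available(self, pages):
        # Resources are unavailable if any requested page is being evacuated.
        return self.other_resources_free and self.evacuating_pages.isdisjoint(pages)


io = IOSubsystem()
io.note_evacuation_started({242})  # 242 is used here only as a sample page id

during_same_page = io.dma_resources_available({242})
during_other_page = io.dma_resources_available({300})

io.note_evacuation_completed({242})
after_same_page = io.dma_resources_available({242})
```

Treating the evacuating pages as one more resource lets the existing "are DMA resources available?" query carry the new check without a separate code path.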
In this manner, embodiments of the invention synchronize evacuation requests for a block of memory with DMA operations. Since DMA operations are inhibited only with respect to a limited range of physical memory and only until the evacuation operation is completed, the impact on system performance is limited. Furthermore, since the mechanism involved with synchronization in accordance with embodiments of the present invention employs existing I/O, virtual memory, and driver architectures of the operating system with only minor modifications, migration to the features offered by embodiments of the invention is simplified.
The features and advantages of the present invention may be better understood with reference to the figures and discussions that follow.
In step 304, new DMA operations involving a range of physical memory that includes at least the set of memory pages associated with the evacuation request are inhibited. As mentioned, the inhibiting mechanism may involve including the availability status of the set of memory pages as part of the availability status of DMA resources required to service the DMA request. Memory pages associated with a pending evacuation request are deemed “unavailable” until the pending evacuation request is completed.
In an embodiment, the I/O subsystem is responsible for determining whether DMA resources are available to service a given DMA request. By having the virtual memory subsystem inform the I/O subsystem of the existence of a pending evacuation request, along with the memory pages affected, the I/O subsystem may deem DMA resources (which now include the availability status of a range of memory) available or unavailable for new DMA requests. Note that DMA requests involving DMA operations that do not involve the memory pages that are DMA-inhibited may continue to occur substantially unaffected.
In step 306, existing DMA requests that pertain to the set of memory pages involved in the evacuation request are drained. As mentioned, with reference to the example of
In step 308, the evacuation operation is allowed to take place after the existing DMA requests are drained. In an embodiment, once the evacuation operation is complete, the set of memory pages associated with the now-completed evacuation operation is deemed available by the I/O subsystem (with notification from the virtual memory subsystem), which in turn causes the I/O subsystem to deem the DMA resources available for any pending DMA requests that involve the same set of memory pages. Accordingly, DMA requests pertaining to the same set of memory pages may proceed (step 310).
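The ordering of steps 304 through 310 can be summarized in a short simulation. This is a sketch under simplifying assumptions: each in-flight DMA request is modeled as a set of page identifiers, draining is modeled as simply letting an overlapping request finish, and `run_evacuation` and `copy_fn` are hypothetical names.

```python
def run_evacuation(pages, in_flight, copy_fn):
    """Simulate the inhibit, drain, evacuate, enable sequence and return a log."""
    log = []

    inhibited = set(pages)  # step 304: inhibit new DMA on these pages
    log.append("inhibit")

    # Step 306: wait for every in-flight request touching the pages to finish.
    while any(inhibited & request for request in in_flight):
        finished = next(request for request in in_flight if inhibited & request)
        in_flight.remove(finished)  # the request completes and leaves the system
        log.append("drained")

    copy_fn(pages)  # step 308: evacuation runs only after the drain
    log.append("evacuate")

    inhibited.clear()  # step 310: DMA on these pages may proceed again
    log.append("enable")
    return log


evacuated = []
requests_in_flight = [{1, 2}, {7}, {2}]  # sample page sets, purely illustrative
event_log = run_evacuation({2}, requests_in_flight, evacuated.append)
```

Note that the request touching only page 7 is never waited on; it remains in flight throughout, reflecting that unrelated DMA activity is left alone.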
If the DMA resources are unavailable (as determined in step 404), the driver places the DMA request into an I/O queue to wait until such time that the DMA resources become available (step 406).
On the other hand, if the DMA resources are available (as determined in step 404), the DMA operation associated therewith is allowed to occur (step 408).
Suppose the DMA resources are unavailable and the DMA request becomes a deferred DMA request that is pending in the I/O queue in accordance with step 406. At some point in time, the evacuation operation is completed, and the I/O subsystem determines that the DMA resources are now available (step 420). In step 422, the kernel employs the driver callback function to invoke the driver associated with the deferred DMA request, causing the DMA request to be de-queued from the I/O queue for execution (step 424). Thereafter, the DMA operation associated with the previously deferred I/O request is permitted to occur (arrow 426 to step 408).
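The deferral and callback path just described might be sketched as follows. The `Kernel` class, its methods, and the callback signature are assumptions made for illustration; availability is reduced to a single flag here so that the queueing and callback behavior stays in focus.

```python
from collections import deque


class Kernel:
    """Toy kernel holding an I/O queue of deferred DMA requests."""

    def __init__(self):
        self.resources_available = False
        self.io_queue = deque()
        self.performed = []

    def request_dma(self, name, driver_callback):
        if self.resources_available:   # step 404: check DMA resources
            self.performed.append(name)  # step 408: DMA proceeds
        else:
            # Step 406: defer the request until resources become available.
            self.io_queue.append((name, driver_callback))

    def resources_became_available(self):  # step 420
        self.resources_available = True
        while self.io_queue:
            name, driver_callback = self.io_queue.popleft()  # step 424
            driver_callback(self, name)                      # step 422


def driver_callback(kernel, name):
    # The driver re-issues the DMA, which can now be performed (step 408).
    kernel.performed.append(name)


kernel = Kernel()
kernel.request_dma("write-A", driver_callback)   # deferred: resources unavailable
queued_before = len(kernel.io_queue)
kernel.resources_became_available()              # evacuation has completed
kernel.request_dma("write-B", driver_callback)   # proceeds immediately
```

Deferred requests are released in arrival order in this sketch; the source text does not specify an ordering policy, so that choice is an assumption.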
In step 434, a notification of the completion of the evacuation operation is received. In step 436, the range of memory associated with the formerly pending evacuation request is now deemed available for DMA operation, which may cause the DMA resources to be deemed available to a pending DMA request if other aspects of the DMA resources are also available.
While the I/O requests whose range of memory overlaps the range of memory of the evacuation request are waiting to be drained, other I/O requests may continue as normal (step 506). In step 508, it is ascertained whether all I/O requests whose range of memory overlaps the range of memory associated with the evacuation request have been drained. If they all have been drained, the evacuation operation may begin. In an embodiment, a drain complete event is generated and sent to the virtual memory subsystem to enable the evacuation operation to begin.
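The check in step 508 can be modeled by tracking which in-flight requests overlap the evacuation range and signalling once none remain. The sketch below uses hypothetical names (`DrainTracker`, `request_completed`) and represents each request by an identifier mapped to the set of pages it touches.

```python
class DrainTracker:
    """Tracks in-flight requests that overlap the pages being evacuated."""

    def __init__(self, evacuation_pages, in_flight):
        self.evacuation_pages = set(evacuation_pages)
        # Only requests overlapping the evacuation range need to drain.
        self.pending = {
            request_id
            for request_id, pages in in_flight.items()
            if self.evacuation_pages & set(pages)
        }
        self.drain_complete = not self.pending

    def request_completed(self, request_id):
        """Record a completion; return True once the drain is complete."""
        self.pending.discard(request_id)
        if not self.pending:
            # This is the point at which a drain complete event would be
            # sent to the virtual memory subsystem.
            self.drain_complete = True
        return self.drain_complete


in_flight = {"r1": {242}, "r2": {300}, "r3": {242, 243}}  # sample data
tracker = DrainTracker({242}, in_flight)
initially_pending = set(tracker.pending)

after_r2 = tracker.request_completed("r2")  # unrelated request, no effect
after_r1 = tracker.request_completed("r1")
after_r3 = tracker.request_completed("r3")  # last overlapping request
```

Completion of the unrelated request does not move the drain forward, since it was never part of the set being waited on.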
As can be appreciated from the foregoing, embodiments of the invention enable DMA operations and evacuations to be synchronized with respect to a block of physical memory in a manner that causes little impact to system performance. Since DMA operations are inhibited only with respect to a limited range of physical memory and only until the evacuation operation is completed, the impact on system performance is limited. Furthermore, since the synchronization mechanism in accordance with embodiments of the present invention employs the existing I/O, virtual memory, and driver architectures of the operating system with only minor modifications, the synchronization capability may be provided without requiring complex I/O hardware-specific solutions or substantial changes to the current hardware and/or software of existing computer systems.
While this invention has been described in terms of several embodiments, there are alterations, permutations, and equivalents, which fall within the scope of this invention. It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, permutations, and equivalents as fall within the true spirit and scope of the present invention.