The present invention relates generally to storage systems and more particularly to catering to I/O operations accessing storage systems.
When reconstructing or building a storage unit, intensive I/O activity can occur concurrently. Typically, a predefined scheme for the building process is imposed, so that the building process can be controlled, monitored and run efficiently. A common approach is to retrieve data segments of a predefined size (or number of blocks) and in a predetermined sequence, so that the size of each segment is known and the sequence of retrieving the segments is also known.
The disclosures of all publications and patent documents mentioned in the specification, and of the publications and patent documents cited therein directly or indirectly, are hereby incorporated by reference.
Certain embodiments of the present invention seek to provide copying, e.g. for reconstruction, with concurrent support of ongoing I/O operations during the building process. If, while the sequential segment retrieval process is being carried out during the build, an I/O request is received for one or more data blocks which are mapped to (or otherwise associated with) the storage unit being reconstructed, and which are yet to be stored on the storage unit, the sequential scheme is overridden: the segment which contains the requested blocks is promoted to the head of the queue or otherwise, e.g. in pointer-based implementations, given top priority, and is thus retrieved and stored on the storage unit ahead of segments located before it according to the original scheme. If the I/O involves blocks located in two or more different segments, the override may be implemented for each of the two or more segments, e.g. according to their internal order.
Certain embodiments of the present invention include copying a sequence of data from an intact storage module to an additional storage module which is being used not only after copying has been completed but also as copying proceeds, before it has been completed. The data is served up from the intact storage module, subdivided into “chunks”. Typically, it is more efficient for the chunks to be served up and received in sequence, i.e. in a sequence that preserves the sequence of the data. Copying may occur in order to recover data in a damaged destination device, by copying the same data, or data which enables computation of the same data, from an intact source device. Copying typically preserves the sequence of the data, which may be the physical sequence in which the data is stored or a logical sequence which differs from the physical sequence. If data is stored in a physical sequence which differs from the logical sequence, then typically the relationship between the physical and logical orders is stored, e.g. in a suitable controller.
Certain embodiments of the present invention describe a method for reading and writing data in order to rebuild an image on a new host, e.g. to recover a failed solid state based storage system onto a spare solid state based storage system. When recovering data to a spare solid state based storage system, the data may be read from the secondary solid state based storage system, which in turn reads the data from the non-volatile memory. The data might already reside in the secondary non-volatile memory, but this is rare.
When rebuilding the new (spare) solid state based storage system, the substantially permanent data that is to be stored in the new solid state based storage system in normal operation is typically regenerated. Thus, as part of the recovery, the spare solid state based storage system goes from one end of its storage space to the other and reads the data from the secondary system, which in turn reads that data from the non-volatile storage. Though the process may read the data block by block, for optimal utilization of the bandwidth of the network the read operations are done in larger chunks, each comprising many blocks, where the term “block” refers to a basic unit of storage which may be employed by the storage device itself and its handlers. For example, data may be written into certain storage devices only block by block and not, for example, bit by bit.
While the data is being read, I/O operations continue to be accommodated by the system. Some of these operations (reads and writes) address data that should reside in the spare system being rebuilt. If the data is not already in the spare system being rebuilt, the spare system may do the following. If the command is a read, the system reads the entire chunk around the requested location, i.e. the same chunk that would have contained the requested data had the chunks been read in the sequential order. If the data spans more than one chunk, the spare system typically reads all the spanned chunks prior to replying. In the case of a write command, the spare system first reads the relevant chunk or chunks spanning the same locations, and only then writes the data as requested.
Certain embodiments of the present invention include a method of reading and writing data from one storage source while that storage source is being loaded with the data to be read. Typically, there is a plurality of storage sources S1 to Sn. Assume that one of the sources, say S1, is being built or restored from one or more of the other storage sources S2 through Sn. During the copying process from the plurality of sources to S1, S1 is being accessed for READ and/or WRITE purposes.
The process of copying the data to a storage resource S1, in accordance with certain embodiments of the present invention, is as follows. S1 is divided into sequential chunks of memory M1 through Mk. These chunks may be of the same or different sizes. The chunks are then ordered, e.g. as per their physical order in S1 or as per their logical order for the host/s. A table S is provided where, for each chunk Mi, S[i] points to the location in some Sj where Mi resides. In further embodiments one chunk Mi may reside in a plurality of storage resources Sj. A person of ordinary skill in the art can readily transform a read from a single Sj into a read operation from a plurality of Sj's in which the chunk Mi resides.
The copying process C may be as follows. A table T of size k is provided where each entry T[i] corresponds to one of the memory chunks M1 through Mk. All entries are initially marked as “not copied”. The process goes over the chunks, in the defined order, using an index INDX to denote the chunk MINDX currently being copied. Initially, INDX=1 points to chunk M1. C checks the entry T[1] in table T. If it is marked as “copied”, C advances the value of INDX by 1. If the entry T[1] is marked “not copied”, C identifies M1's location in the plurality of resources S2 through Sn using entry S[1] in table S, reads M1 and writes it to the storage entity S1. C then marks the chunk M1 as “copied” in T[1] and advances the value of INDX by 1. C then turns to the next chunk pointed at by INDX, namely M2, and repeats the process. This continues until all entries in table T have been marked as “copied”.
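By way of illustration only, the following is a minimal Python sketch of the orderly copying process C, under the assumption of 0-based chunk indices; the helper names (read_chunk, write_chunk) and the representation of tables S and T as Python lists are assumptions made for the sketch, not elements of the embodiments themselves.

```python
# Minimal sketch (assumed names) of the orderly copying process C.
NOT_COPIED, COPIED = 0, 1

def orderly_copy(k, S, T, read_chunk, write_chunk):
    """Copy chunks 0..k-1 into S1 in order, skipping chunks already marked copied.

    S[i] -- location of chunk i in one of the source resources Sj
    T[i] -- NOT_COPIED / COPIED marker for chunk i
    """
    indx = 0                              # 0-based counterpart of INDX
    while indx < k:
        if T[indx] == NOT_COPIED:
            data = read_chunk(S[indx])    # read chunk Mi from its source Sj
            write_chunk(indx, data)       # write chunk Mi into S1
            T[indx] = COPIED
        indx += 1                         # advance whether copied now or earlier
```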
In some embodiments of the invention there may be a plurality of typically concurrent copying processes C1 to Cj, each responsible for a subset of the memory chunks M1 through Mk. During the copy process there may be READ and WRITE operations targeted at the storage resource S1. There may be an I/O process handling these requests, one of which may pertain to a segment of memory Q. Segment Q may be located in one of the chunks Mi or it may span several chunks. If the segment Q is located in a set of chunks which has already been copied to S1, the I/O operation becomes a regular operation which requires no special attention. However, if the segment Q is located, partially or entirely, in a set of chunks which has not yet been copied, then the I/O process requests the process C to copy the set of chunks that are related to the segment Q. Responsively, process C finishes copying the current chunk at INDX, creates a new temporary index INDX′ and sets it to the first chunk to be read to cater for the I/O related to segment Q. Process C then reads the sequence of memory chunks pertaining to Q using INDX′ in the same manner that it uses INDX, and marks the table T accordingly. Once the copy of the subset is done, the I/O process can continue the I/O operation (READ or WRITE) and the copy process goes back to the location denoted by INDX and continues until the end. In some embodiments INDX′ can be initialized as the first chunk not yet read. In the event that the process C reaches a location which has already been copied—as is evident from the table T—C typically continues to the next not-yet-copied location, without attempting to re-copy data already copied for I/O purposes.
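Similarly, the hedged Python sketch below extends the loop above with the I/O-triggered override: when a pending request is found to span segment Q, the chunks covering Q are copied out of order using a temporary index (the counterpart of INDX′), and the orderly scan then resumes at INDX. The pending_request callback, assumed to dequeue and return the inclusive chunk range of at most one outstanding request (or None), is hypothetical.

```python
NOT_COPIED, COPIED = 0, 1

def copy_with_io_override(k, S, T, read_chunk, write_chunk, pending_request):
    """Orderly copy of chunks 0..k-1 into S1, overridden by pending I/O requests."""
    indx = 0
    while indx < k:
        q = pending_request()              # dequeues (first_chunk, last_chunk) or None
        if q is not None:
            first, last = q
            indx2 = first                  # temporary index INDX' for segment Q
            while indx2 <= last:
                if T[indx2] == NOT_COPIED: # never re-copy an already copied chunk
                    write_chunk(indx2, read_chunk(S[indx2]))
                    T[indx2] = COPIED
                indx2 += 1
            # the I/O process may now complete its READ or WRITE against S1
        elif T[indx] == NOT_COPIED:
            write_chunk(indx, read_chunk(S[indx]))
            T[indx] = COPIED
            indx += 1
        else:
            indx += 1                      # already copied, e.g. for an earlier I/O
```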
In some embodiments of the invention, in order to ensure that the recovery process ends, the priority of the copying process C over that of the I/O may be increased, e.g. for some predetermined duration, so that a predetermined amount or proportion of the remaining copying gets done.
There is thus provided, in accordance with at least one embodiment of the present invention, a method for copying data as stored in at least one source storage entity, the method comprising copying data from a source storage entity into a destination storage entity and catering to at least one I/O operation directed toward the source storage entity during copying, the copying including reading at least one chunk of data in a predetermined order; and reading, responsive to a request, at least one relevant chunk containing data related to at least one I/O operation out of the predetermined order.
Further in accordance with at least one embodiment of the present invention, the method also comprises returning to the predetermined order after reading, responsive to a request, the relevant chunks containing the data related to the operation.
Further in accordance with at least one embodiment of the present invention, the method also comprises prioritizing of catering to I/O operations vis a vis orderly copying of the storage entity and performing the copying and catering step accordingly.
Still further in accordance with at least one embodiment of the present invention, the prioritizing is determined based at least partly on I/O rate.
Additionally in accordance with at least one embodiment of the present invention, the prioritizing includes copying of the storage entity in the predetermined order if the I/O rate is lower than a threshold value.
Further in accordance with at least one embodiment of the present invention, the storage entity to be copied is volatile.
Still further in accordance with at least one embodiment of the present invention, the storage entity to be copied is non-volatile.
Further in accordance with at least one embodiment of the present invention, the source storage entity is volatile.
Additionally in accordance with at least one embodiment of the present invention, the source storage entity is non-volatile.
Further in accordance with at least one embodiment of the present invention, the chunks of data are of equal size.
Additionally in accordance with at least one embodiment of the present invention, the chunks of data each comprise at least one data block.
Still further in accordance with at least one embodiment of the present invention, the chunks of data each comprise at least one hard disk drive track.
Also provided, in accordance with at least one embodiment of the present invention, is a system for copying a storage entity from at least one source storage entity, the system comprising orderly copying apparatus for copying data from a source storage entity including reading chunks of data from the at least one source storage entity in a predetermined order; and I/O request catering apparatus for overriding the orderly copying apparatus, responsive to at least one I/O request, the overriding including reading at least one relevant chunk containing data related to the at least one I/O request, out of the predetermined order.
Further in accordance with at least one embodiment of the present invention, the I/O request catering apparatus is activated to override the orderly copying apparatus when at least one activating criterion holds and also comprising an on-line copying mode indicator operative to select one of a plurality of copying modes defining a plurality of activating criteria respectively according to which the I/O request catering apparatus is activated to override the orderly copying apparatus responsive to the plurality of copying modes having been selected respectively.
Also provided, in accordance with at least one embodiment of the present invention, is a method for managing data copying in a population of storage systems, the method comprising copying at least one first chunk from at least one source storage entity including giving a first priority to orderly copying of data vis a vis out-of-order copying of data responsive to incoming I/O requests; and copying at least one second chunk from at least one source storage entity including giving a second priority, differing from the first priority, to orderly copying of data vis a vis out-of-order copying of data responsive to incoming I/O requests.
Further in accordance with at least one embodiment of the present invention, the first priority comprises zero priority to orderly copying of data such that all copying of data is performed in an order which is determined by data spanned by incoming I/O requests rather than in a predetermined order.
Still further in accordance with at least one embodiment of the present invention, at least one individual I/O request does not result in reading at least one relevant chunk containing data related to the I/O operation out of the predetermined order, if an ongoing criterion for an adequate level of orderly copying of the storage entity is not currently met.
Additionally in accordance with at least one embodiment of the present invention, the overriding including reading less than all relevant chunks not yet copied which contain data related to received I/O requests, out of the predetermined order, wherein the less than all relevant chunks are selected using a logical combination of at least one of the following criteria:
a. chunks containing data related to I/O requests are read out of order only for high priority I/Os as defined by external inputs,
b. chunks containing data related to I/O requests are read out of order only in situations in which a predetermined criterion for background copying has already been accomplished,
c. chunks containing data related to I/O requests are read out of order only for I/O requests which span less than a single chunk,
d. chunks containing data related to I/O requests are read out of order only for I/O requests occurring at least a predetermined time interval after a previous I/O for which I/O requests were read out of order, and
e. chunks containing data related to I/O requests are read out of order only for I/O requests which have accumulated into a “queue” of at least a predetermined number of I/O requests.
Further in accordance with at least one embodiment of the present invention, the overriding including reading all relevant chunks not yet copied which contain data related to all I/O requests, out of the predetermined order.
Still further in accordance with at least one embodiment of the present invention, the reading of at least one chunk does not initiate before the reading responsive to a request.
Additionally in accordance with at least one embodiment of the present invention, the copying comprises recovering lost data.
Further in accordance with at least one embodiment of the present invention, the predetermined order comprises a physical order in which a logical stream of data is stored within the source storage entity.
Also provided is a computer program product, comprising a computer usable medium or computer readable storage medium, typically tangible, having a computer readable program code embodied therein, the computer readable program code adapted to be executed to implement any or all of the methods shown and described herein. It is appreciated that any or all of the computational steps shown and described herein may be computer-implemented. The operations in accordance with the teachings herein may be performed by a computer specially constructed for the desired purposes or by a general purpose computer specially configured for the desired purpose by a computer program stored in a computer readable storage medium.
Any suitable processor, display and input means may be used to process, display, store and accept information, including computer programs, in accordance with some or all of the teachings of the present invention, such as but not limited to a conventional personal computer processor, workstation or other programmable device or computer or electronic computing device, either general-purpose or specifically constructed, for processing; a display screen and/or printer and/or speaker for displaying; machine-readable memory such as optical disks, CD-ROMs, DVDs, Blu-ray Discs, magneto-optical discs or other discs, RAMs, ROMs, EPROMs, EEPROMs, magnetic or optical or other cards, for storing; and keyboard or mouse for accepting. The term “process” as used above is intended to include any type of computation or manipulation or transformation of data represented as physical, e.g. electronic, phenomena which may occur or reside e.g. within registers and/or memories of a computer.
The above devices may communicate via any conventional wired or wireless digital communication means, e.g. via a wired or cellular telephone network or a computer network such as the Internet.
The apparatus of the present invention may include, according to certain embodiments of the invention, machine readable memory containing or otherwise storing a program of instructions which, when executed by the machine, implements some or all of the apparatus, methods, features and functionalities of the invention shown and described herein. Alternatively or in addition, the apparatus of the present invention may include, according to certain embodiments of the invention, a program as above which may be written in any conventional programming language, and optionally a machine for executing the program such as but not limited to a general purpose computer which may optionally be configured or activated in accordance with the teachings of the present invention. Any of the teachings incorporated herein may wherever suitable operate on signals representative of physical objects or substances.
The embodiments referred to above, and other embodiments, are described in detail in the next section.
Any trademark occurring in the text or drawings is the property of its owner and occurs herein merely to explain or illustrate one example of how an embodiment of the invention may be implemented.
Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification, discussions utilizing terms such as “processing”, “computing”, “estimating”, “selecting”, “ranking”, “grading”, “calculating”, “determining”, “generating”, “reassessing”, “classifying”, “producing”, “stereo-matching”, “registering”, “detecting”, “associating”, “superimposing”, “obtaining” or the like, refer to the action and/or processes of a computer or computing system, or processor or similar electronic computing device, that manipulate and/or transform data represented as physical, such as electronic, quantities within the computing system's registers and/or memories, into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices. The term “computer” should be broadly construed to cover any kind of electronic device with data processing capabilities, including, by way of non-limiting example, personal computers, servers, computing systems, communication devices, processors (e.g. digital signal processor (DSP), microcontrollers, field programmable gate array (FPGA), application specific integrated circuit (ASIC), etc.) and other electronic computing devices.
The present invention may be described, merely for clarity, in terms of terminology specific to particular programming languages, operating systems, browsers, system versions, individual products, and the like. It will be appreciated that this terminology is intended to convey general principles of operation clearly and briefly, by way of example, and is not intended to limit the scope of the invention to any particular programming language, operating system, browser, system version, or individual product.
Certain embodiments of the present invention are illustrated in the accompanying drawings and described below.
The size of segment S is such that the data therewithin is read chunk by chunk, in K chunks. The chunk size may be based on the media and/or network characteristics and is typically selected to be large enough to make reading out of order worthwhile, given the overhead associated with recovering data in general and out of order in particular, to the extent possible given the application-specific level of service which needs to be provided to I/O requests. In one embodiment, the chunk size is equal to or greater than a predefined threshold. The threshold may be fixed or dynamic. For example, the threshold may correspond to an average idle time of the storage system or of any one of the underlying storage units or any subset of the underlying storage units. In another embodiment, an initial chunk size is set and is then modified by a predefined optimization scheme. Each time the chunk size is modified, certain parameters of the system's performance are measured. The best or optimal chunk size is selected and is used during at least a predefined number of chunk reads. Various optimization methods are well known and may be implemented as part of the present invention; for example, a convergence criterion may be used in the selection of an optimal chunk size.
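As an illustration only, the short Python sketch below shows one simple way such an optimization scheme might be realized under assumed names: a handful of candidate chunk sizes are tried, a hypothetical measure_throughput callback reports the measured performance for each, and the best candidate is retained for the following batch of chunk reads.

```python
def pick_chunk_size(candidate_sizes, measure_throughput, reads_per_trial=32):
    """Return the candidate chunk size (in bytes) that measured best in a trial run."""
    best_size, best_rate = candidate_sizes[0], float("-inf")
    for size in candidate_sizes:
        rate = measure_throughput(size, reads_per_trial)  # e.g. MB/s over a short trial
        if rate > best_rate:
            best_size, best_rate = size, rate
    return best_size

# Hypothetical usage:
# chunk_size = pick_chunk_size([256 * 1024, 1024 * 1024, 4 * 1024 * 1024], measure)
```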
The reading process is such that it is advantageous to read the chunks in their natural order within the n data sources, i.e. first chunk 1, then chunk 2, . . . and finally chunk K. However, if an I/O request requires chunk 17, say, to be serviced, then even if the chunks are being read in order and the next chunk in line is, say, chunk 5, the method may skip to chunk 17 in order to accommodate the I/O request and only subsequently return to its background work of restoring chunks 5, 6, 7, . . . 16, and then chunks 18, 19, . . . , again unless an additional I/O request is made and it is policy to accommodate it.
A ChunkCopied table T is provided which is initially empty and eventually indicates which chunks have or have not already been copied; this ensures that a next-in-line-to-be-copied chunk, in the background restoration process, is in fact only copied once it has been determined that that very chunk was not copied in the past in order to accommodate a past I/O request.
An index, INDX, running over the entries in the table, is initially 1. In step 120, the method checks whether any I/O request is pending which is to be accommodated even if the reconstruction process needs to be interrupted; either all I/O requests or only some may be accommodated despite the need for interruption of reconstruction, using any suitable rule. If no I/O request is waiting for accommodation, the method checks whether the currently indexed chunk has been copied, by checking whether the INDX-th entry in the table stores the value “copied” or “not copied”. If the currently indexed, i.e. INDX-th, chunk has not yet been copied, the INDX-th chunk, MINDX, is read from the appropriate one (or more than one) of sources S1 to Sn and the INDX-th entry in the table is set to “copied”. Unless the table index INDX has exceeded K, the method then returns to step 120.
If step 120 detects that an I/O request Q, which is to be accommodated, is waiting (yes branch of step 120), chunks Mx to My which are required to fulfill request Q are identified (step 150). The I/O requests a portion of storage and an inclusive set of chunks is identified. For example, if the I/O spans addresses 330 to 720 and the chunks each include 100 addresses, chunks 3 to 7 are identified as being required to fulfill request Q. The identified chunks are copied one after the other (steps 155-185), typically unless they have previously been copied, and typically without interruption, e.g. without consideration of other I/O requests that may have accumulated and without concern for the neglected background reconstruction process. Out-of-order copying takes place as per an index INDX′ which is initialized to at least x, as described in further detail below. To ensure that a previously copied requested chunk is not re-copied, the T table is checked (step 160) before copying chunk INDX′. Optionally, the need to access the T table for each candidate chunk to be copied is significantly reduced by initially setting INDX′, for each I/O request, to the maximum of x, the index of the first (lowest) requested chunk, and INDX, the index of the next chunk to be copied in the ordered, background copying, thereby obviating the need to check the T table for all chunks copied in the course of ordered, background copying.
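The following is a small, hedged Python sketch of the chunk identification of step 150 and of the optional initialization of INDX′ just described; 0-based chunk indices and the helper names are assumptions, and the final assertion reproduces the worked example above (addresses 330 to 720 with 100 addresses per chunk spanning chunks 3 to 7).

```python
def chunks_spanning(first_addr, last_addr, chunk_size):
    """Inclusive (x, y) range of 0-based chunk indices spanned by an I/O request."""
    return first_addr // chunk_size, last_addr // chunk_size

def initial_out_of_order_index(x, indx):
    """Optional INDX' initialization: skip chunks the ordered pass has already copied."""
    return max(x, indx)

# Worked example from the text: addresses 330..720, 100 addresses per chunk.
assert chunks_spanning(330, 720, 100) == (3, 7)
```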
The method then returns to step 120. Once all K chunks have been copied (step 145), the method terminates. It is appreciated that due to provision of table T and step 125, a chunk which is next in line in the background restoration process is not necessarily copied since it may be found, in step 125, to have previously been copied, presumably because it was required to accommodate a previous I/O request.
A memory segment S being recovered may, for example, be 100 GB (gigabytes) in size. The chunk size may be 1 MB (megabyte). In this example, K = S / chunk size, i.e. approximately 100K chunks. A segment being read could be of any size, such as 10 megabytes, which might span 10 or 11 chunks depending on alignment.
The term “chunk” as used herein refers to an amount of memory read from any type of storage. In HDD (hard disk drive) applications, each chunk might comprise one or more blocks or tracks, each block usually comprising 512 bytes. In RAM applications, each chunk comprises a multiplicity of bytes; since the data travels over a network, the bytes may be expressed in blocks, each of which comprises a fixed number of bytes.
The time required to read a chunk depends on the structure, characteristics and medium of the network interconnecting the storage unit being copied from and the storage unit, e.g. memory, being copied to, and on whether the data is being read from solid state storage or from an HDD (hard disk drive). For example, for an HDD, reading 10 megabytes might require between 10 and 80 seconds. For a solid state device, the same reading could require only about 1 msec.
Typically, once a chunk has been requested for background copying purposes, it is processed without interruption even if an I/O request arrives as it is being processed, even if the memory source is technically capable of receiving a cancellation of the request for the chunk. However, alternatively, an I/O request may be accommodated immediately even if a chunk, to be used for background copying purposes, is en route, and the remaining processing of the en route chunk (such as but not limited to requesting anew if the request is cancelled) is taken up only after accommodating the I/O request by requesting all chunks spanned thereby.
Typically, as shown, a CopiedChunks counter is provided. The three policies illustrated include reverting periodically to orderly background copying (step 560), preferring accommodation of accumulated I/O requests, if any (step 510, the “on demand” policy), and selecting between these two policies depending on the I/O rate (step 530).
It is advantageous to provide several policies both because different clients require different policies and because a single client may require different policies at different times. For example, an e-shopping Internet site (or computerized retail outlet management system) may hold periodic “sales” such as a Christmas sale, an Easter sale and a back-to-school sale, which are normally preceded by slow periods in which there are relatively few transactions between the site and its customers e.g. e-shoppers. Just before a sale, the e-shopping site (or retail outlet) may wish to create one or more “mirrors” (copies) of data required to effect a sale, such as price data and inventory data. Therefore, enforced background policy may be appropriate, in order to ensure that the mirrors are finished by the time the sale starts, and if necessary sacrificing quality of service to the relatively few clients active prior to the sale so as to achieve quality of service to the large number of clients expected to visit the site during the sale. During each sale, I/O rate-dependent or even on-demand policy may be appropriate for restoring lost data or for completing mirrors not completed prior to the sale. Between sales, other than just before each sale, I/O rate-dependent policy may be used, however the threshold I/O rate used at these times would typically be much higher than the threshold I/O rate used for I/O rate-dependent copying occurring during a sale.
More generally, the same considerations may apply to any data-driven system which has critical periods, sometimes preceded by slow periods, and normal periods, such as (a) businesses which perform book-keeping routines including a large number of I/O requests, at the end of each financial period or (b) data driven systems having a scheduled maintenance period prior to which relevant data is copied e.g. mirrored. During critical periods, on-demand policy or I/O rate-dependent policy with a low I/O rate threshold may be suitable. In between critical periods, enforced background policy or I/O rate-dependent policy with a high I/O rate threshold may be suitable.
A particular advantage of I/O rate dependent operation is that the usefulness, or lack thereof, of short periods of time for background work vs. the distribution of intervals between I/Os, may be taken into account. It is appreciated that the I/O rate is only a rough estimate of this tradeoff and other embodiments taking this tradeoff more accurately into account are also within the scope of the present invention. For example, a learning phase may be provided in which data is collected and the distribution of intervals between I/Os is determined, in order to identify the distribution and/or frequency of intervals which are long enough to read a single block. This interval depends on the media type and/or network.
If the policy is to prefer accommodation of accumulated I/O requests, if any (step 510), the method then determines whether any I/O requests are actually pending (step 515), and the output of the method is determined accordingly.
If the periodic chunks counter is greater than zero (“yes” option of step 565), indicating that an orderly background copying session is currently in process, the counter is decremented, the time of the most recent orderly background copying session is set to be the current time (step 590), and the output of the method is “background”.
If the policy is to prefer one or the other of the first two policies depending on the I/O rate (step 530), the I/O rate is read (step 535) and compared to a ceiling value RLimit (step 540). If I/O requests are pending, or if the I/O rate exceeds the ceiling even if no I/O requests are pending, the “on demand” (only I/O) policy is used (steps 515 and 520), the rationale being that with such a high rate of I/O, background copying is not efficient because it is likely to be interrupted too often to allow efficiency to be achieved. Otherwise, i.e. if the I/O rate does not exceed the ceiling and there are no I/O requests, the method returns a “background” output.
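Purely as an illustration under assumed names and return values, the Python sketch below combines the three policies just described into a single selection routine; "io", "background" and "idle" stand for catering to a pending request, copying the next chunk in order, and doing neither, and the state dictionary, t_limit, r_limit and burst parameters are assumptions rather than claimed elements.

```python
def select_work(policy, io_pending, io_rate, state, r_limit, t_limit, burst=3):
    """Return "io", "background" or "idle" for the next unit of work (a sketch)."""
    if policy == "on_demand":
        return "io" if io_pending else "idle"
    if policy == "io_rate_dependent":
        if io_pending or io_rate > r_limit:        # heavy I/O: behave as "on demand"
            return "io" if io_pending else "idle"
        return "background"
    if policy == "enforce_background":
        if state["burst_left"] > 0:                # already inside a forced burst
            state["burst_left"] -= 1
            state["last_bg"] = state["now"]
            return "background"
        if state["now"] - state["last_bg"] >= t_limit:
            state["burst_left"] = burst - 1        # this call is the 1st of the burst
            state["last_bg"] = state["now"]
            return "background"
        return "io" if io_pending else "background"
    return "background"
```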
It is appreciated that any suitable control parameter can be used to adjust the tradeoff between orderly background copying and I/O request motivated, out of order copying, such as but not limited to the following:
a. I/O rate: the rate at which write I/O requests, or all I/O requests, come in. The system may for example be programmed such that, from a certain rate and upward, the system focuses on catering to the requests rather than on orderly background copying. In the present specification, the term “catering to” an I/O request for data made by a requesting entity refers to supplying the data to that entity.
b. Time since last chunk recovered: if a long time period has elapsed since orderly background copying was last performed, the priority of background copying may be increased by a predetermined step or proportion, to ensure advancement of background copying. The priority of background copying may be decreased by the same or another predetermined step or proportion if a large amount or proportion of background copying appears to have already occurred and/or if indications of distress stemming from inadequate servicing of I/O requests are received.
c. External request.
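The short Python sketch below is one hypothetical way of combining control parameters such as those listed above into a single background-priority value between 0 and 1; all thresholds, step sizes and parameter names are assumptions made for illustration only.

```python
def adjust_background_priority(priority, io_rate, time_since_last_chunk,
                               fraction_copied, external_boost,
                               rate_ceiling=500, stall_limit=2.0, step=0.1):
    """Nudge a [0, 1] background-copying priority knob from the control parameters."""
    if io_rate > rate_ceiling:                  # (a) heavy I/O: favor request handling
        priority = max(0.0, priority - step)
    if time_since_last_chunk > stall_limit:     # (b) background copying has stalled
        priority = min(1.0, priority + step)
    if fraction_copied > 0.9:                   # (b) most copying already done
        priority = max(0.0, priority - step)
    if external_boost:                          # (c) explicit external request
        priority = min(1.0, priority + step)
    return priority
```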
Certain of the illustrated embodiments include steps such as step 120 described above.
It is appreciated that many variations are possible in implementing the “enforce background” policy described above.
It is appreciated that the CopiedChunks counter is advantageously provided in embodiments in which no orderly (background) copying is performed or in embodiments in which, for significant periods of time, no orderly (background) copying is performed. Alternatively, the CopiedChunks counter may be provided in all embodiments.
Still with reference to the illustrated embodiments, during a certain period many I/O operations were received one after the other; these are represented in the drawing, for simplicity, by two I/O operations 930, 937. After this period, an I/O request 952 is received but, unlike previous I/O requests, is not attended to immediately, because the TNoBG vs. TLimit check 955 determines that TLimit has been reached; therefore, a predetermined number of chunks (3, in the illustrated example) are read, or at least dealt with (read or skipped), in order, before any additional I/O requests are catered to.
According to certain embodiments of the present invention, I/O requests are always catered to as soon as the chunk currently being reconstructed has been completed. However, it is appreciated that this need not be the case; alternatively, I/O requests may be accommodated (catered to) only under predetermined circumstances, such as but not limited to only for high priority I/O requests as defined by external inputs, only in situations in which most of the background copying has already been accomplished, only for I/O requests which span less than a single chunk, only for I/O requests occurring at least a predetermined time interval after the previously accommodated I/O, only for I/O requests which have accumulated into a “queue” of at least a predetermined number of I/O requests, and so forth. Optionally, if a queue of I/O requests has accumulated, the I/O requests are examined to identify therewithin “runs” of consecutive chunks, and these chunks may be copied consecutively. For example, if 3 I/O requests have accumulated, the first and earliest received spanning chunks 2-5 (the order being determined by the physical order of chunks in the storage medium), the second spanning chunks 18-19, and the third and most recently received spanning chunks 6-7, then, if retrieval in accordance with the physical order in the storage medium is more cost effective than retrieval which is not, chunks 2-7 may be retrieved first, followed by chunks 18-19.
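As an illustration only, the Python sketch below coalesces a queue of pending I/O requests, each given as an inclusive (first_chunk, last_chunk) pair, into runs of consecutive chunks; the representation of requests as index pairs is an assumption, and the final assertion reproduces the example above.

```python
def coalesce_requests(requests):
    """Merge pending requests into runs of consecutive chunks for sequential retrieval."""
    runs = []
    for first, last in sorted(requests):
        if runs and first <= runs[-1][1] + 1:            # touches or overlaps previous run
            runs[-1] = (runs[-1][0], max(runs[-1][1], last))
        else:
            runs.append((first, last))
    return runs

# Example from the text: requests spanning chunks 2-5, 18-19 and 6-7.
assert coalesce_requests([(2, 5), (18, 19), (6, 7)]) == [(2, 7), (18, 19)]
```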
Reference is now made to the following example priority schemes:
A: 1st priority=on demand (priority of orderly copying is zero, copying occurs only responsive to I/O requests).
B: 1st/2nd priorities for 1st/2nd storage entities respectively.
C: apply 1st/2nd priorities to high/low criticality I/O requests respectively. The first priority scheme includes preferring I/O requests to orderly copying always or when I/O rate is high. The second priority scheme includes preferring orderly copying to catering to I/O requests always or when I/O rate is low.
D: apply 1st/2nd priorities during seasons with high/low densities of I/O requests. The 2nd priority comprises use of “ensure copy” (e.g. “enforce copy” or “enforce background”) policies as described above.
E: apply 1st/2nd priorities during seasons with high/low densities of I/O requests; 1st/2nd priorities use I/O rate based policies with 1st low and 2nd high I/O rate thresholds respectively.
A suitable method for managing data copying in a population of storage systems, using a system as described above, may include copying at least one first chunk while giving a first priority to orderly copying of data vis a vis out-of-order copying of data responsive to incoming I/O requests, and copying at least one second chunk while giving a second priority, differing from the first, to orderly copying of data vis a vis out-of-order copying of data responsive to incoming I/O requests.
Giving a first priority may, for example, comprise giving a first priority to orderly copying of data vis a vis out-of-order copying of data responsive to incoming high-criticality I/O requests, and giving a second priority may comprise giving a second priority to orderly copying of data vis a vis out-of-order copying of data responsive to incoming low-criticality I/O requests, where the first priority is higher than the second priority.
Giving a first priority may also comprise catering to high-criticality I/O requests in preference over background copying in high I/O rate periods, or always.
Giving a second priority may comprise preferring background copying over catering to low-criticality I/O requests, at least in low I/O rate periods, or always.
Giving first priority may occur during a high-I/O-request-density season and giving second priority may occur during a low-I/O-request-density season. Giving a first priority may comprise using an on-demand policy which prioritizes out-of-order copying exclusively. Giving second priority may comprise using an “ensure copying” policy such as an “enforce copy” policy or an “enforce background” policy.
Giving first priority may occur during a high-I/O-request-density season and may use an I/O rate based policy with a first I/O rate threshold and giving second priority may occur during a low-I/O-request-density season and may use an I/O rate based policy with a second I/O rate threshold higher than the first I/O rate threshold.
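Merely as an illustration of schemes D and E above and of the seasonal e-shopping example given earlier, the Python fragment below shows one hypothetical mapping from an operating “season” to a copying policy and an I/O-rate threshold; the season names, policy identifiers and threshold values are assumptions made for the sketch.

```python
def policy_for_season(season):
    """Return (policy, io_rate_threshold) for a given operating season (a sketch)."""
    if season == "sale":                     # high I/O-request density
        return ("io_rate_dependent", 100)    # low threshold: requests usually win
    if season == "pre_sale":                 # mirrors must finish before the sale starts
        return ("enforce_background", None)
    return ("io_rate_dependent", 1000)       # quiet season: high threshold (assumed
                                             # units of I/O requests per second)
```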
Applications of some or all of the embodiments of the present invention include but are not limited to:
a. restoring volatile memory from non-volatile memory, in which case, typically, each chunk comprises a single track in a hard disk;
b. restoring volatile memory from volatile memory; and
c. RAIDed and not RAIDed configurations of memory.
Each of the embodiments shown and described herein may be considered and termed a Solid State Storage module which may, for example, comprise a volatile memory unit combined with other functional units, such as a UPS. The term Solid State Storage module is not intended to be limited to a memory module. It is appreciated that any suitable one of the Solid State Storage modules shown and described herein may be implemented in conjunction with a wide variety of applications including but not limited to applications within the realm of Flash storage technology and applications within the realm of Volatile Memory based storage.
In addition to all aspects of the invention shown and described herein, any conventional improvement of any of the performance, cost and fault tolerance of the solid state storage modules shown and described herein, and/or of the balance between them, may be utilized.
The terms “rebuild”, “reconstruct” and “recover” are used herein generally interchangeably.
It is appreciated that software components of the present invention including programs and data may, if desired, be implemented in ROM (read only memory) form including CD-ROMs, DVDs, BluRay Disks, EPROMs and EEPROMs, or may be stored in any other suitable computer-readable medium such as but not limited to disks of various kinds, cards of various kinds and RAMs. Components described herein as software may, alternatively, be implemented wholly or partly in hardware, if desired, using conventional techniques.
Included in the scope of the present invention, inter alia, are electromagnetic signals carrying computer-readable instructions for performing any or all of the steps of any of the methods shown and described herein, in any suitable order; machine-readable instructions for performing any or all of the steps of any of the methods shown and described herein, in any suitable order; program storage devices readable by machine, tangibly embodying a program of instructions executable by the machine to perform any or all of the steps of any of the methods shown and described herein, in any suitable order; a computer program product comprising a computer useable medium having computer readable program code having embodied therein, and/or including computer readable program code for performing, any or all of the steps of any of the methods shown and described herein, in any suitable order; any technical effects brought about by any or all of the steps of any of the methods shown and described herein, when performed in any suitable order; any suitable apparatus or device or combination of such, programmed to perform, alone or in combination, any or all of the steps of any of the methods shown and described herein, in any suitable order; information storage devices or physical records, such as disks or hard drives, causing a computer or other device to be configured so as to carry out any or all of the steps of any of the methods shown and described herein, in any suitable order; a program pre-stored e.g. in memory or on an information network such as the Internet, before or after being downloaded, which embodies any or all of the steps of any of the methods shown and described herein, in any suitable order, and the method of uploading or downloading such, and a system including server/s and/or client/s for using such; and hardware which performs any or all of the steps of any of the methods shown and described herein, in any suitable order, either alone or in conjunction with software.
Features of the present invention which are described in the context of separate embodiments may also be provided in combination in a single embodiment. Conversely, features of the invention, including method steps, which are described for brevity in the context of a single embodiment or in a certain order may be provided separately or in any suitable subcombination or in a different order. “e.g.” is used herein in the sense of a specific example which is not intended to be limiting. Devices, apparatus or systems shown coupled in any of the drawings may in fact be integrated into a single platform in certain embodiments or may be coupled via any appropriate wired or wireless coupling such as but not limited to optical fiber, Ethernet, Wireless LAN, HomePNA, power line communication, cell phone, PDA, Blackberry GPRS, Satellite including GPS, or other mobile delivery.
Priority is claimed from U.S. provisional application No. 61/193,079, entitled “A Mass-Storage System Utilizing Volatile Memory Storage and Non-Volatile Storage” and filed Oct. 27, 2008.
PCT information: Filing Document PCT/IL10/00290, Filing Date 4/6/2010, Country WO, 371(c) Date 5/17/2012.
Related U.S. application data: Provisional Application No. 61/165,597, filed April 2009, US.