Dynamically configuring a storage system to facilitate independent scaling of resources

Description

BACKGROUND
Technical Field

The field of the invention is data processing, or, more specifically, methods, apparatus, and products for dynamically configuring a storage system to facilitate independent scaling of resources.

Background Art

Modern storage systems can include a variety of resources such as computing resources and storage resources. Such resources typically must be scaled together. For example, additional processing resources must be added in order to add more storage resources and additional storage resources must be added in order to add more processing resources. The inability to scale one type of resource without scaling another type of resource inhibits the ability to create a storage system that is tailored to the specific needs of the storage system users.

SUMMARY OF INVENTION

Methods, apparatuses, and products for dynamically configuring a storage system to facilitate independent scaling of resources, including: detecting a change to a topology of the storage system consisting of different sets of blades configured within one of a plurality of chassis; and reconfiguring the storage system to change an allocation of resources to one or more authorities responsive to detecting the change to the topology of the storage system.

The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular descriptions of example embodiments of the invention as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts of example embodiments of the invention.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 sets forth a diagram of a storage system in which resources may be independently scaled according to embodiments of the present disclosure.

FIG. 2 sets forth a diagram of a set of blades useful in dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure.

FIG. 3 sets forth a diagram of a blade useful in dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure.

FIG. 4 sets forth a flowchart illustrating an example method of dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure.

FIG. 5 sets forth a flowchart illustrating an additional example method of dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure.

FIG. 6 sets forth a flowchart illustrating an additional example method of dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure.

FIG. 7 sets forth a flowchart illustrating an additional example method of dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure.

FIG. 8 sets forth a flowchart illustrating an additional example method of dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure.

FIG. 9 sets forth a block diagram of automated computing machinery comprising an example computer useful in dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure.

DESCRIPTION OF EMBODIMENTS

Example methods, apparatus, and products for dynamically configuring a storage system to facilitate independent scaling of resources in accordance with the present disclosure are described with reference to the accompanying drawings, beginning with FIG. 1. FIG. 1 sets forth a diagram of a storage system in which resources may be independently scaled according to embodiments of the present disclosure. The storage system of FIG. 1 includes a plurality of chassis (102, 106, 110, 114) mounted within a rack (100). The rack (100) depicted in FIG. 1 may be embodied as a standardized frame or enclosure for mounting multiple equipment modules, such as each of the chassis (102, 106, 110, 114) depicted in FIG. 1. The rack (100) may be embodied, for example, as a 19-inch rack that includes edges or ears that protrude on each side, thereby enabling a chassis (102, 106, 110, 114) or other module to be fastened to the rack (100) with screws or some other form of fastener.

The chassis (102, 106, 110, 114) depicted in FIG. 1 may be embodied, for example, as passive elements that include no logic. Each chassis (102, 106, 110, 114) may include a plurality of slots, where each slot is configured to receive a blade. Each chassis (102, 106, 110, 114) may also include a mechanism, such as a power distribution bus, that is utilized to provide power to each blade that is mounted within the chassis (102, 106, 110, 114). Each chassis (102, 106, 110, 114) may further include a communication mechanism, such as a communication bus, that enables communication between each blade that is mounted within the chassis (102, 106, 110, 114). The communication mechanism may be embodied, for example, as an Ethernet bus, Peripheral Component Interconnect Express (‘PCIe’) bus, InfiniBand bus, and so on. In some embodiments, each chassis (102, 106, 110, 114) may include at least two instances of both the power distribution mechanism and the communication mechanism, where each instance of the power distribution mechanism and each instance of the communication mechanism may be enabled or disabled independently.

Each chassis (102, 106, 110, 114) depicted in FIG. 1 may also include one or more ports for receiving an external communication bus that enables communication between multiple chassis (102, 106, 110, 114), directly or through a switch, as well as communications between a chassis (102, 106, 110, 114) and an external client system. The external communication bus may use a technology such as Ethernet, InfiniBand, Fibre Channel, and so on. In some embodiments, the external communication bus may use different communication bus technologies for inter-chassis communication than are used for communication with an external client system. In embodiments where one or more switches are deployed, each switch may act as a translator between multiple protocols or technologies. When multiple chassis (102, 106, 110, 114) are connected to define a storage cluster, the storage cluster may be accessed by a client using either proprietary interfaces or standard interfaces such as network file system (‘NFS’), common internet file system (‘CIFS’), small computer system interface (‘SCSI’), hypertext transfer protocol (‘HTTP’), and so on. Translation from the client protocol may occur at the switch, external communication bus, or within each blade.

Each chassis (102, 106, 110, 114) depicted in FIG. 1 houses fifteen blades (104, 108, 112, 116), although in other embodiments each chassis (102, 106, 110, 114) may house more or fewer blades. Each of the blades (104, 108, 112, 116) depicted in FIG. 1 may be embodied, for example, as a computing device that includes one or more computer processors, dynamic random access memory (‘DRAM’), flash memory, interfaces for one or more communication busses, interfaces for one or more power distribution busses, cooling components, and so on. Although the blades (104, 108, 112, 116) will be described in more detail below, readers will appreciate that the blades (104, 108, 112, 116) depicted in FIG. 1 may be embodied as different types of blades, such that the collective set of blades (104, 108, 112, 116) includes heterogeneous members. Blades may be of different types as some blades (104, 108, 112, 116) may only provide processing resources to the overall storage system, some blades (104, 108, 112, 116) may only provide storage resources to the overall storage system, and some blades (104, 108, 112, 116) may provide both processing resources and storage resources to the overall storage system. Furthermore, even the blades (104, 108, 112, 116) that are identical in type may be different in terms of the amount of storage resources that the blades (104, 108, 112, 116) provide to the overall storage system. For example, a first blade that only provides storage resources to the overall storage system may provide 8 TB of storage while a second blade that only provides storage resources to the overall storage system may provide 16 TB of storage. The blades (104, 108, 112, 116) that are identical in type may also be different in terms of the amount of processing resources that the blades (104, 108, 112, 116) provide to the overall storage system. For example, a first blade that only provides processing resources to the overall storage system may include more processors or more powerful processors than a second blade that only provides processing resources to the overall storage system. Readers will appreciate that other differences may also exist between two individual blades and that blade uniformity is not required according to embodiments described herein.

Although not explicitly depicted in FIG. 1, each chassis (102, 106, 110, 114) may include one or more modules, data communications busses, or other apparatus that is used to identify which type of blade is inserted into a particular slot of the chassis (102, 106, 110, 114). In such an example, a management module may be configured to request information from each blade in each chassis (102, 106, 110, 114) when each blade is powered on, when the blade is inserted into a chassis (102, 106, 110, 114), or at some other time. The information received by the management module can include, for example, a special purpose identifier maintained by the blade that identifies the type (e.g., storage blade, compute blade, hybrid blade) of blade that has been inserted into the chassis (102, 106, 110, 114). In an alternative embodiment, each blade (104, 108, 112, 116) may be configured to automatically provide such information to a management module as part of a registration process.
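For further explanation, the registration process described above may be sketched as follows. The sketch is illustrative only: the names BladeType, BladeInfo, and ManagementModule, and the choice to key the registry by chassis and slot, are assumptions introduced here and are not taken from the embodiments described above.

```python
from dataclasses import dataclass
from enum import Enum


class BladeType(Enum):
    """The three blade types named in the description."""
    STORAGE = "storage"
    COMPUTE = "compute"
    HYBRID = "hybrid"


@dataclass(frozen=True)
class BladeInfo:
    """Hypothetical record a blade reports when it registers."""
    blade_id: str
    chassis_id: str
    slot: int
    blade_type: BladeType


class ManagementModule:
    """Hypothetical registry of blades, keyed by (chassis, slot)."""

    def __init__(self):
        self._blades = {}

    def register(self, info):
        # Invoked when a blade announces itself or answers a query
        # issued at power-on or insertion time.
        self._blades[(info.chassis_id, info.slot)] = info

    def blade_type_in_slot(self, chassis_id, slot):
        # Returns None when the slot is empty or not yet registered.
        info = self._blades.get((chassis_id, slot))
        return info.blade_type if info else None

    def count_by_type(self):
        counts = {blade_type: 0 for blade_type in BladeType}
        for info in self._blades.values():
            counts[info.blade_type] += 1
        return counts
```

In such a sketch, the same register call may serve both the embodiment in which the management module requests the information and the embodiment in which each blade provides the information automatically.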

In the example depicted in FIG. 1, the storage system may be initially configured by a management module that is executing remotely. The management module may be executing, for example, in a network switch control processor. Readers will appreciate that such a management module may be executing on any remote CPU and may be coupled to the storage system via one or more data communication networks. Alternatively, the management module may be executing locally as the management module may be executing on one or more of the blades (104, 108, 112, 116) in the storage system.

In the example depicted in FIG. 1, one or more of the blades (104, 108, 112, 116) may be used for dynamically configuring the storage system to facilitate independent scaling of resources by: detecting a change to a topology of the storage system; reconfiguring the storage system to change an allocation of resources to one or more authorities responsive to detecting the change to the topology of the storage system; executing an authority on a first set of blades and reconfiguring the storage system by executing the authority on a second set of blades; associating storage on a first set of blades with a write group and reconfiguring the storage system by associating, in dependence upon a write group formation policy, storage on a second set of blades with the write group; detecting that an amount of processing resources within the storage system has changed and reconfiguring the storage system by increasing or decreasing an amount of processing resources allocated to one or more authorities; detecting that an amount of storage resources within the storage system has changed and reconfiguring the storage system by increasing or decreasing an amount of storage associated with one or more write groups; and detecting that a utilization of a particular resource has reached a utilization threshold, as will be described in greater detail below. Readers will appreciate that while in some embodiments one or more of the blades (104, 108, 112, 116) may be used for dynamically configuring the storage system to facilitate independent scaling of resources by carrying out the steps listed above, in alternative embodiments, another apparatus that includes at least computer memory and a computer processor may be used for dynamically configuring the storage system to facilitate independent scaling of resources by carrying out the steps listed above.

For further explanation, FIG. 2 sets forth a diagram of a set of blades (202, 204, 206, 208) useful in dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure. Although blades will be described in greater detail below, the blades (202, 204, 206, 208) depicted in FIG. 2 may include compute resources (210, 212, 214), storage resources in the form of flash memory (230, 232, 234), storage resources in the form of non-volatile random access memory (‘NVRAM’) (236, 238, 240), or any combination thereof. In the example depicted in FIG. 2, the blades (202, 204, 206, 208) are of differing types. For example, one blade (206) includes only compute resources (214), another blade (208) includes only storage resources, depicted here as flash (234) memory and NVRAM (240), and two of the blades (202, 204) include compute resources (210, 212) as well as storage resources in the form of flash (230, 232) memory and NVRAM (236, 238). In such an example, the blade (206) that includes only compute resources (214) may be referred to as a compute blade, the blade (208) that includes only storage resources may be referred to as a storage blade, and the blades (202, 204) that include both compute resources (210, 212) and storage resources may be referred to as hybrid blades.

The compute resources (210, 212, 214) depicted in FIG. 2 may be embodied, for example, as one or more computer processors, as well as memory that is utilized by the computer processor but not included as part of general storage within the storage system. The compute resources (210, 212, 214) may be coupled for data communication with other blades and with external client systems, for example, via one or more data communication busses that are coupled to the compute resources (210, 212, 214) via one or more data communication adapters.

The flash memory (230, 232, 234) depicted in FIG. 2 may be embodied, for example, as multiple flash dies which may be referred to as packages of flash dies or an array of flash dies. Such flash dies may be packaged in any number of ways, with a single die per package, multiple dies per package, in hybrid packages, as bare dies on a printed circuit board or other substrate, as encapsulated dies, and so on. Although not illustrated in FIG. 2, an input output (I/O) port may be coupled to the flash dies and a direct memory access (‘DMA’) unit may also be coupled directly or indirectly to the flash dies. Such components may be implemented, for example, on a programmable logic device (‘PLD’) such as a field programmable gate array (‘FPGA’). The flash memory (230, 232, 234) depicted in FIG. 2 may be organized as pages of a predetermined size, blocks that include a predetermined number of pages, and so on.

The NVRAM (236, 238, 240) depicted in FIG. 2 may be embodied, for example, as one or more non-volatile dual in-line memory modules (‘NVDIMMs’), as one or more DRAM dual in-line memory modules (‘DIMMs’) that receive primary power through a DIMM slot but are also attached to a backup power source such as a supercapacitor, and so on. The NVRAM (236, 238, 240) depicted in FIG. 2 may be utilized as a memory buffer for temporarily storing data that will be written to flash memory (230, 232, 234), as writing data to the NVRAM (236, 238, 240) may be carried out more quickly than writing data to flash memory (230, 232, 234). In this way, the latency of write requests may be significantly improved relative to a system in which data is written directly to the flash memory (230, 232, 234).
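For further explanation, the use of NVRAM as a write buffer in front of flash memory may be sketched as follows. The sketch is a simplified model under stated assumptions: the class name BufferedStore, the flush_threshold parameter, and the policy of flushing once a fixed number of writes has been staged are all hypothetical, and in-memory dictionaries stand in for the NVRAM and the flash memory.

```python
class BufferedStore:
    """Hypothetical model of NVRAM staging writes ahead of flash."""

    def __init__(self, flush_threshold=4):
        self.nvram = {}  # staged writes, assumed to survive power loss
        self.flash = {}  # slower, larger backing store
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        # The write is acknowledged as soon as it is staged in NVRAM,
        # which is what improves write latency in this model.
        self.nvram[key] = value
        if len(self.nvram) >= self.flush_threshold:
            self.flush()
        return True

    def flush(self):
        # Move all staged writes to flash and empty the buffer.
        self.flash.update(self.nvram)
        self.nvram.clear()

    def read(self, key):
        # The newest copy of the data may still be in the buffer.
        if key in self.nvram:
            return self.nvram[key]
        return self.flash.get(key)
```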

In the example depicted in FIG. 2, a first blade (202) includes a first authority (216) that is executing on the compute resources (210) within the first blade (202) and a third blade (206) includes a second authority (218) that is executing on the compute resources (214) within the third blade (206). Each authority (216, 218) represents a logical partition of control and may be embodied as a module of software executing on the compute resources (210, 212, 214) of a particular blade (202, 204, 206). Each authority (216, 218) may be configured to control how and where data is stored in the storage system. For example, authorities (216, 218) may assist in determining which type of erasure coding scheme is applied to the data, authorities (216, 218) may assist in determining where one or more portions of the data may be stored in the storage system, and so on. Each authority (216, 218) may control a range of inode numbers, segment numbers, or other data identifiers which are assigned to data by a file system or some other entity.

Readers will appreciate that every piece of data and every piece of metadata stored in the storage system is owned by a particular authority (216, 218). Each authority (216, 218) may cause data that is owned by the authority (216, 218) to be stored within storage that is located within the same blade whose computing resources are supporting the authority (216, 218) or within storage that is located on some other blade. For example, the authority (216) that is executing on the compute resources (210) within a first blade (202) has caused data to be stored within a portion (220) of flash (230) and a portion (242) of NVRAM (236) that is physically located within the first blade (202). The authority (216) that is executing on the compute resources (210) within the first blade (202) has also caused data to be stored within a portion (222) of flash (232) on the second blade (204) in the storage system as well as a portion (226) of flash (234) on the fourth blade (208) in the storage system. Likewise, the authority (218) that is executing on the compute resources (214) within the third blade (206) has caused data to be stored within a portion (244) of NVRAM (236) that is physically located within the first blade (202), within a portion (224) of flash (232) within the second blade (204), within a portion (228) of flash (234) within the fourth blade (208), and within a portion (246) of NVRAM (240) within the fourth blade (208).

Readers will appreciate that many embodiments other than the embodiment depicted in FIG. 2 are contemplated as it relates to the relationship between data, authorities, and system components. In some embodiments, every piece of data and every piece of metadata has redundancy in the storage system. In some embodiments, the owner of a particular piece of data or a particular piece of metadata may be a ward, with an authority being a group or set of wards. Likewise, in some embodiments there are redundant copies of authorities. In some embodiments, authorities have a relationship to blades and the storage resources contained therein. For example, each authority may cover a range of data segment numbers or other identifiers of the data and each authority may be assigned to a specific storage resource. Data may be stored in a segment according to some embodiments of the present disclosure, and such segments may be associated with a segment number which serves as indirection for a configuration of a RAID stripe. A segment may identify a set of storage resources and a local identifier into the set of storage resources that may contain data. In some embodiments, the local identifier may be an offset into a storage device and may be reused sequentially by multiple segments. In other embodiments the local identifier may be unique for a specific segment and never reused. The offsets in the storage device may be applied to locating data for writing to or reading from the storage device.

Readers will appreciate that if there is a change in where a particular segment of data is located (e.g., during a data move or a data reconstruction), the authority for that data segment should be consulted. In order to locate a particular piece of data, a hash value for a data segment may be calculated, an inode number may be applied, a data segment number may be applied, and so on. The output of such an operation can point to a storage resource for the particular piece of data. In some embodiments the operation described above may be carried out in two stages. The first stage maps an entity identifier (ID) such as a segment number, an inode number, or directory number to an authority identifier. This mapping may include a calculation such as a hash or a bit mask. The second stage maps the authority identifier to a particular storage resource, which may be done through an explicit mapping. The operation may be repeatable, so that when the calculation is performed, the result of the calculation reliably points to a particular storage resource. The operation may take the set of reachable storage resources as input, and if the set of reachable storage resources changes, the optimal set changes. In some embodiments, a persisted value represents the current assignment and the calculated value represents the target assignment the cluster will attempt to reconfigure towards.
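For further explanation, the two-stage lookup described above may be sketched as follows. The sketch is illustrative only and rests on several assumptions not found in the description: a fixed count of eight authorities, a SHA-256 hash reduced modulo that count for the first stage, a dictionary as the explicit mapping for the second stage, and a round-robin rule for computing the target assignment from the set of reachable blades. All function and variable names are hypothetical.

```python
import hashlib

NUM_AUTHORITIES = 8  # assumed fixed number of authorities


def authority_for(entity_id, num_authorities=NUM_AUTHORITIES):
    """Stage one: map an entity identifier to an authority identifier.

    A cryptographic digest is used so that the result is repeatable
    across processes, unlike Python's built-in hash() for strings.
    """
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_authorities


def locate(entity_id, authority_to_blade, num_authorities=NUM_AUTHORITIES):
    """Stage two: map the authority identifier through an explicit table."""
    authority_id = authority_for(entity_id, num_authorities)
    return authority_id, authority_to_blade[authority_id]


def target_assignment(reachable_blades, num_authorities=NUM_AUTHORITIES):
    """Compute the assignment the cluster would reconfigure towards.

    Round-robin over the sorted reachable blades is an assumed policy.
    """
    blades = sorted(reachable_blades)
    return {a: blades[a % len(blades)] for a in range(num_authorities)}


# The persisted value is the current assignment; the calculated value
# changes when the set of reachable blades changes.
persisted = target_assignment(["blade-202", "blade-204"])
calculated = target_assignment(["blade-202", "blade-204", "blade-206"])
```

In such a sketch, only the second-stage table changes when a blade is added, so an entity identifier continues to map to the same authority identifier while the authority itself may be relocated.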

The compute resources (210, 212, 214) within the blades (202, 204, 206) may be tasked with breaking up data to be written to storage resources in the storage system. When data is to be written to a storage resource, the authority for that data is located as described above. When the segment ID for data is already determined, the request to write the data is forwarded to the blade that is hosting the authority, as determined using the segment ID. The computing resources on such a blade may be utilized to break up the data and transmit the data for writing to a storage resource, at which point the transmitted data may be written as a data stripe in accordance with an erasure coding scheme. In some embodiments, data is requested to be pulled and in other embodiments data is pushed. When compute resources (210, 212, 214) within the blades (202, 204, 206) are tasked with reassembling data read from storage resources in the storage system, the authority for the segment ID containing the data is located as described above.

The compute resources (210, 212, 214) within the blades (202, 204, 206) may also be tasked with reassembling data read from storage resources in the storage system. The compute resources (210, 212, 214) that support the authority that owns the data may request the data from the appropriate storage resource. In some embodiments, the data may be read from flash storage as a data stripe. The compute resources (210, 212, 214) that support the authority that owns the data may be utilized to reassemble the read data, including correcting any errors according to the appropriate erasure coding scheme, and forward the reassembled data to the network. In other embodiments, breaking up and reassembling data, or some portion thereof, may be performed by the storage resources themselves.

The preceding paragraphs discuss the concept of a segment. A segment may represent a logical container of data in accordance with some embodiments. A segment may be embodied, for example, as an address space between medium address space and physical flash locations. Segments may also contain metadata that enables data redundancy to be restored (rewritten to different flash locations or devices) without the involvement of higher level software. In some embodiments, an internal format of a segment contains client data and medium mappings to determine the position of that data. Each data segment may be protected from memory and other failures, for example, by breaking the segment into a number of data and parity shards. The data and parity shards may be distributed by striping the shards across storage resources in accordance with an erasure coding scheme.
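For further explanation, breaking a segment into data shards and a parity shard may be sketched as follows. The sketch uses a single XOR parity shard, which is a deliberately simple stand-in chosen for illustration; the description does not specify which erasure coding scheme is applied, and practical schemes may tolerate more than one lost shard. The function names and the zero-byte padding are assumptions.

```python
def split_into_shards(data, k):
    """Split data into k equal-length shards, zero-padding the tail."""
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")
    return [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]


def xor_bytes(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            out[i] ^= value
    return bytes(out)


def encode(data, k):
    """Return k data shards followed by one parity shard."""
    shards = split_into_shards(data, k)
    return shards + [xor_bytes(shards)]


def reconstruct(stripe, missing_index):
    """Rebuild the shard at missing_index from the surviving shards."""
    survivors = [s for i, s in enumerate(stripe) if i != missing_index]
    return xor_bytes(survivors)
```

In such a sketch, each element of the returned stripe would be written to a different storage resource, so that the loss of any one storage resource leaves enough shards to rebuild the missing one.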

For further explanation, FIG. 3 sets forth a diagram of a blade (302) useful in dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure. As described above, the storage system may include storage blades, compute blades, hybrid blades, or any combination thereof. The example depicted in FIG. 3 represents an embodiment of a hybrid blade as the blade (302) includes both compute resources and storage resources.

The compute resources in the blade (302) depicted in FIG. 3 include a host server (304) that includes a computer processor (306) coupled to memory (310) via a memory bus (308). The computer processor (306) depicted in FIG. 3 may be embodied, for example, as a central processing unit (‘CPU’) or other form of electronic circuitry configured to execute computer program instructions. The computer processor (306) may utilize the memory (310) to store data or other information useful during the execution of computer program instructions by the computer processor (306). Such memory (310) may be embodied, for example, as DRAM that is utilized by the computer processor (306) to store information when the computer processor (306) is performing computational tasks such as creating and sending I/O operations to one of the storage units (312, 314), breaking up data, reassembling data, and other tasks.

In the example depicted in FIG. 3, the computer processor (306) is coupled to two data communication links (332, 334). Such data communications links (332, 334) may be embodied, for example, as Ethernet links that are coupled to a data communication network via a network adapter. The computer processor (306) may receive input/output operations that are directed to the attached storage units (312, 314), such as requests to read data from the attached storage units (312, 314) or requests to write data to the attached storage units (312, 314).

The blade (302) depicted in FIG. 3 also includes storage resources in the form of one or more storage units (312, 314). Each storage unit (312, 314) may include flash (328, 330) memory as well as other forms of memory (324, 326), such as the NVRAM discussed above. In the example depicted in FIG. 3, the storage units (312, 314) may include integrated circuits such as a field-programmable gate array (‘FPGA’) (320, 322), microprocessors such as an Advanced RISC Machine (‘ARM’) microprocessor that are utilized to write data to and read data from the flash (328, 330) memory as well as the other forms of memory (324, 326) in the storage unit (312, 314), or any other form of computer processor. The FPGAs (320, 322) and the ARM (316, 318) microprocessors may, in some embodiments, perform operations other than strict memory accesses. For example, in some embodiments the FPGAs (320, 322) and the ARM (316, 318) microprocessors may break up data, reassemble data, and so on. In the example depicted in FIG. 3, the computer processor (306) may access the storage units (312, 314) via a data communication bus (336) such as a PCIe bus.

Readers will appreciate that a compute blade may be similar to the blade (302) depicted in FIG. 3 as the compute blade may include one or more host servers that are similar to the host server (304) depicted in FIG. 3. Such a compute blade may be different than the blade (302) depicted in FIG. 3, however, as the compute blade may lack the storage units (312, 314) depicted in FIG. 3. Readers will further appreciate that a storage blade may be similar to the blade (302) depicted in FIG. 3 as the storage blade may include one or more storage units that are similar to the storage units (312, 314) depicted in FIG. 3. Such a storage blade may be different than the blade (302) depicted in FIG. 3, however, as the storage blade may lack the host server (304) depicted in FIG. 3. The example blade (302) depicted in FIG. 3 is included only for explanatory purposes. In other embodiments, the blades may include additional processors, additional storage units, compute resources that are packaged in a different manner, storage resources that are packaged in a different manner, and so on.

For further explanation, FIG. 4 sets forth a flowchart illustrating an example method of dynamically configuring a storage system (400) to facilitate independent scaling of resources according to embodiments of the present disclosure. Resources in a storage system may be independently scaled, for example, by altering the amount of storage resources available to an entity in the storage system without altering the amount of processing resources available to the entity, by altering the amount of processing resources available to an entity in the storage system without altering the amount of storage resources available to the entity, and so on. Examples of resources that may be independently scaled can include processing resources, storage resources, networking resources, and others.

The example method depicted in FIG. 4 includes detecting (402) a change to a topology of the storage system (400). The topology of the storage system (400) can be characterized by various aspects of the physical configuration of the storage system (400) such as, for example, the number of chassis in the storage system (400), the number of blades in each chassis, the storage capacity of one or more blades, the processing capacity of one or more blades, and so on. Detecting (402) a change to the topology of the storage system (400) may be carried out, for example, by detecting that a new chassis has been added to the storage system (400), by detecting that a new blade has been added to the storage system (400), by detecting that a blade has failed or otherwise been removed from the storage system (400), by detecting that a blade has been moved from a first chassis to a second chassis, and so on. In such an example, detecting that a component has been added to the storage system (400) may be accomplished through the use of sensors that detect the insertion of a component, through the use of a device registration process that is carried out when a new component is inserted into the storage system (400), and in other ways. Detecting that a component has been removed from the storage system (400) may be accomplished through the use of sensors that detect the removal of a component, through the use of a communication process determining that a component is unreachable, and in other ways.

The topology of the storage system (400) may also be characterized by various aspects of the logical configuration of the storage system (400) such as, for example, a configuration setting that defines a RAID level that is utilized for striping data across storage devices in the storage system, a configuration setting that defines which redundancy policy data contained within a particular write group should adhere to, a configuration setting that identifies the number of snapshots to be retained in the system, or any other configuration that impacts how the storage system (400) will operate. Detecting (402) a change to the topology of the storage system (400) may therefore be carried out, for example, by detecting that a particular configuration setting has changed.
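For further explanation, detecting (402) a change to the topology may be sketched as a comparison of two snapshots of the topology. The sketch is illustrative only: representing a snapshot as a dictionary with "blades" and "settings" entries, the event names, and the function name diff_topology are assumptions introduced here, and the description above contemplates other detection mechanisms such as sensors and registration processes.

```python
def diff_topology(old, new):
    """Return a list of (event, subject, detail) tuples.

    Each snapshot is assumed to have the shape
    {"blades": {blade_id: chassis_id}, "settings": {name: value}}.
    """
    changes = []

    # Physical configuration: blades added, removed, or moved.
    old_blades, new_blades = old["blades"], new["blades"]
    for blade in sorted(new_blades.keys() - old_blades.keys()):
        changes.append(("blade_added", blade, new_blades[blade]))
    for blade in sorted(old_blades.keys() - new_blades.keys()):
        changes.append(("blade_removed", blade, old_blades[blade]))
    for blade in sorted(old_blades.keys() & new_blades.keys()):
        if old_blades[blade] != new_blades[blade]:
            changes.append(("blade_moved", blade, new_blades[blade]))

    # Logical configuration: any setting whose value differs.
    old_settings, new_settings = old["settings"], new["settings"]
    for name in sorted(old_settings.keys() | new_settings.keys()):
        if old_settings.get(name) != new_settings.get(name):
            changes.append(("setting_changed", name, new_settings.get(name)))

    return changes
```

In such a sketch, an empty list indicates that no change to the topology has been detected, and a non-empty list may be used to trigger reconfiguration of the storage system.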

In the example method depicted in FIG. 4, the storage system (400) consists of different sets of blades (408, 410) configured within one of a plurality of chassis (406, 412). The sets of blades (408, 410) may be different as the sets may include a different number of blades, blades of differing types, blades with non-uniform storage capacities, blades with non-uniform processing capacities, and so on. In addition to the sets of blades (408, 410) being different, two blades within the same set may also be different as the two blades may have non-uniform amounts and types of storage resources within each blade, the two blades may have non-uniform amounts and types of processing resources within each blade, and so on.

The example method depicted in FIG. 4 includes reconfiguring (404) the storage system (400) to change an allocation of resources to one or more authorities responsive to detecting the change to the topology of the storage system (400). Reconfiguring (404) the storage system (400) to change an allocation of resources to one or more authorities may be carried out, for example, by changing the amount of processing resources that are dedicated to executing a particular authority, by changing the amount of storage resources that are available for use by a particular authority, and so on.

For further explanation, FIG. 5 sets forth a flowchart illustrating an additional example method of dynamically configuring a storage system (400) to facilitate independent scaling of resources according to embodiments of the present disclosure. The example method depicted in FIG. 5 is similar to the example method depicted in FIG. 4, as the example method depicted in FIG. 5 also includes detecting (402) a change to a topology of the storage system (400) and reconfiguring (404) the storage system (400) to change an allocation of resources to one or more authorities responsive to detecting the change to the topology of the storage system (400).

The example method depicted in FIG. 5 also includes executing (502) an authority on a first set of blades (408). An authority may initially be executed (502) on computing resources within a single blade or initially executed (502) on computing resources within multiple blades. As such, the first set of blades (408) that the authority is executed (502) on may be composed of one or more blades. In the embodiment depicted in FIG. 4, the first set of blades (408) is depicted as residing within a single chassis (406). Readers will appreciate, however, that in other embodiments the authority may be executed (502) on a set of blades that are distributed across multiple chassis (406, 412). In embodiments where the authority is executed (502) on computing resources within multiple blades, the blades may be configured to exchange information between each instance of the authority such that each instance of the authority is aware of actions taken by other instances of the authority. Such information may be exchanged over data communication busses that are internal to a particular chassis (406) when the blades reside within the same chassis (406), and such information may be exchanged over inter-chassis data communication busses when the blades reside within different chassis (406, 412).

In the example method depicted in FIG. 5, reconfiguring (404) the storage system (400) can include executing (504) the authority on a second set of blades (410). Although the second set of blades (410) depicted in FIG. 5 are depicted as including none of the blades that were in the first set of blades (408), readers will appreciate that in other embodiments each set may include one or more overlapping blades. For example, the second set of blades (410) may include all of the blades in the first set of blades (408) as well as one or more additional blades, the second set of blades (410) may include a subset of the blades in the first set of blades (408), the second set of blades (410) may include a subset of the blades in the first set of blades (408) as well as one or more additional blades, and so on. Reconfiguring (404) the storage system (400) to change an allocation of resources to one or more authorities may be carried out, for example, by ceasing execution of an authority on a particular blade, by beginning execution of the authority on a particular blade, and so on.

For further explanation, FIG. 6 sets forth a flowchart illustrating an additional example method of dynamically configuring a storage system (400) to facilitate independent scaling of resources according to embodiments of the present disclosure. The example method depicted in FIG. 6 is similar to the example method depicted in FIG. 4, as the example method depicted in FIG. 6 also includes detecting (402) a change to a topology of the storage system (400) and reconfiguring (404) the storage system (400) to change an allocation of resources to one or more authorities responsive to detecting the change to the topology of the storage system (400).

The example method depicted in FIG. 6 also includes associating (602) storage on a first set of blades (408) with a write group. A write group represents a logical grouping of data that is stored in the storage system (400). The logical grouping may represent, for example, data associated with a particular user, data associated with a particular application, and so on. The data that is included in a particular write group may be striped across many blades and many storage resources contained therein. The data that is included in a particular write group may also be subject to a redundancy policy. Such a redundancy policy may be used to determine the amount of redundancy data that must be stored within the storage system (400), such that a predetermined number of blades that contain some portion of the data that is included in a particular write group may fail without resulting in the loss of any of the data that is included in the particular write group. The predetermined number of blades that contain some portion of the data that is included in a particular write group may fail without resulting in the loss of any of the data that is included in the particular write group because the data contained on the failed blade may be rebuilt using redundancy data, data stored on blades that have not failed, or some combination thereof. Consider, as a very simple example of the inclusion of redundancy data, an example in which user data was striped across two blades and redundancy data was stored on a third blade, where the redundancy data was generated by performing an XOR using the user data as input. In such an example, if one of the blades that contained user data failed, the data stored on the failed blade may be reconstructed using the user data stored on the functioning blade and the redundancy data.
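The XOR example above can be illustrated with a short Python sketch. The byte strings and variable names are hypothetical, and equal-length stripes are assumed for simplicity.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("stripes must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# User data striped across two blades (hypothetical contents).
blade_one = b"user"
blade_two = b"data"

# Redundancy data stored on a third blade.
redundancy = xor_bytes(blade_one, blade_two)

# If the first blade fails, its contents are rebuilt from the
# functioning blade and the redundancy data.
rebuilt_blade_one = xor_bytes(blade_two, redundancy)
```

The reconstruction works because XOR is its own inverse: applying the surviving stripe to the redundancy data cancels that stripe and leaves the missing one.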

In the example method depicted in FIG. 6, reconfiguring (404) the storage system (400) can include associating (604), in dependence upon a write group formation policy, storage on a second set of blades (410) with the write group. The second set of blades (410) may include all blades within the first set of blades (408) as well as additional blades, a subset of the blades within the first set of blades (408), a subset of the blades within the first set of blades (408) as well as additional blades, no blades that were included within the first set of blades (408), and so on.

In the example method depicted in FIG. 6, storage on a second set of blades (410) is associated (604) with the write group in dependence upon a write group formation policy. The write group formation policy may be embodied as one or more rules that define desired attributes of the write group. The write group formation policy may specify, for example, that only write groups that adhere to a certain redundancy policy should be formed, that only write groups whose RAID overhead is equal to or below a certain threshold should be formed, that only write groups expected to generate an amount of network traffic that is equal to or below a certain threshold should be formed, and so on. In the example method depicted in FIG. 6, many candidate blade sets may be identified and each candidate blade set may be evaluated to determine whether the candidate blade set is in compliance with the write group formation policy. In such an example, the write group may only be associated (604) with a second set of blades (410) that is in compliance with the write group formation policy. If no set of blades is fully compliant with the write group formation policy, the write group may only be associated (604) with a second set of blades (410) that is most compliant with the write group formation policy.
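One way to sketch the evaluation of candidate blade sets, assuming the write group formation policy is expressed as a list of rule functions and that "most compliant" means "satisfies the greatest number of rules", is shown below in Python. The two rules, their thresholds, and all names are hypothetical illustrations rather than the disclosed policy.

```python
def wide_enough(blade_set):
    """Hypothetical redundancy rule: the set contains at least four blades."""
    return len(blade_set) >= 4


def low_overhead(blade_set, parity_blades=2, max_overhead=0.25):
    """Hypothetical RAID-overhead rule: parity share at or below a threshold."""
    return parity_blades / len(blade_set) <= max_overhead


def compliance(blade_set, rules):
    """Number of policy rules that the candidate blade set satisfies."""
    return sum(1 for rule in rules if rule(blade_set))


def choose_blade_set(candidates, rules):
    """Pick the candidate satisfying the most rules.

    A fully compliant candidate satisfies every rule and therefore wins
    whenever one exists; otherwise the most compliant candidate is chosen.
    Assumes a non-empty list of non-empty candidates.
    """
    return max(candidates, key=lambda blade_set: compliance(blade_set, rules))


policy = [wide_enough, low_overhead]
small = ("b1", "b2", "b3")
medium = ("b1", "b2", "b3", "b4", "b5")
large = tuple(f"b{i}" for i in range(1, 9))

fully_compliant_choice = choose_blade_set([small, medium, large], policy)
most_compliant_choice = choose_blade_set([small, medium], policy)
```

Counting satisfied rules is only one possible measure of compliance; an implementation could equally weight rules or treat some rules as mandatory.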

Consider an example in which a storage system (400) included ten chassis that each included fifteen blades. In such an example, assume that the write group formation policy specifies that write groups should be formed such that: 1) the failure of an entire chassis will not cause data loss, and 2) the failure of any two blades within a single chassis will not cause data loss. In such an example, any write group whose data would be stored on three or more blades within a single chassis would not adhere to the write group formation policy as the failure of a single chassis (which should not result in data loss) could result in the failure of three or more blades (which can result in data loss). As such, the write group may only be associated (604) with a second set of blades (410) that include no more than two blades in a single chassis in order for the write group to conform to the write group formation policy.
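The constraint in this example can be checked mechanically. The following Python sketch, in which blades are identified by hypothetical (chassis, slot) pairs, tests whether a candidate blade set places no more than two blades in any single chassis.

```python
from collections import Counter


def conforms(blade_set, blade_to_chassis, max_per_chassis=2):
    """True if no chassis holds more than max_per_chassis blades of the set."""
    per_chassis = Counter(blade_to_chassis[blade] for blade in blade_set)
    return all(count <= max_per_chassis for count in per_chassis.values())


# Ten chassis with fifteen blades each; a blade id is (chassis, slot).
blade_to_chassis = {(c, s): c for c in range(10) for s in range(15)}

# Two blades from every chassis: conforms to the policy.
good_set = [(c, s) for c in range(10) for s in range(2)]

# Adding a third blade from chassis 0 violates the policy.
bad_set = good_set + [(0, 2)]

good_ok = conforms(good_set, blade_to_chassis)
bad_ok = conforms(bad_set, blade_to_chassis)
```

Under these assumptions the widest conforming write group spans twenty blades, two in each of the ten chassis.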

For further explanation, FIG. 7 sets forth a flowchart illustrating an additional example method of dynamically configuring a storage system (400) to facilitate independent scaling of resources according to embodiments of the present disclosure. The example method depicted in FIG. 7 is similar to the example method depicted in FIG. 4, as the example method depicted in FIG. 7 also includes detecting (402) a change to a topology of the storage system (400) and reconfiguring (404) the storage system (400) to change an allocation of resources to one or more authorities responsive to detecting the change to the topology of the storage system (400).

In the example method depicted in FIG. 7, detecting (402) a change to a topology of the storage system (400) can include detecting (702) that an amount of processing resources within the storage system (400) has changed. Detecting (702) that an amount of processing resources within the storage system (400) has changed may be carried out, for example, by detecting that a blade that contains processing resources has been inserted into one of the chassis (406, 412) within the storage system (400), by detecting that a blade that contains processing resources has been removed from one of the chassis (406, 412) within the storage system (400), by detecting that a blade that contains processing resources has failed, by detecting that a blade that contains processing resources has recovered from a failure, and so on. Such events may be detected, for example, through the use of sensors, device discovery processes, device registration processes, and many other mechanisms. When an event such as a blade insertion occurs, an inventory tracking process may be configured to request information from the inserted blade that identifies the resources contained in the blade in order to determine whether the inserted blade includes processing resources. Such information may be maintained and later utilized if the blade is removed from the storage system, the blade fails, or the blade recovers from failure to determine whether the removed, failed, or recovered blade includes processing resources.

In the example method depicted in FIG. 7, detecting (402) a change to a topology of the storage system (400) can alternatively include detecting (704) that an amount of storage resources within the storage system (400) has changed. Detecting (704) that an amount of storage resources within the storage system (400) has changed may be carried out, for example, by detecting that a blade that contains storage resources has been inserted into one of the chassis (406, 412), by detecting that a blade that contains storage resources has been removed from one of the chassis (406, 412), by detecting that a blade that contains storage resources has failed, by detecting that a blade that contains storage resources has recovered from a failure, and so on. Such events may be detected, for example, through the use of sensors, device discovery processes, device registration processes, and many other mechanisms. When an event such as a blade insertion occurs, an inventory tracking process may be configured to request information from the inserted blade that identifies the resources contained in the blade in order to determine whether the inserted blade includes storage resources. Such information may be maintained and later utilized if the blade is removed from the storage system, the blade fails, or the blade recovers from failure to determine whether the removed, failed, or recovered blade includes storage resources.
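A minimal sketch of such an inventory tracking process, assuming each inserted blade reports its resources as a set of labels, might look as follows in Python. The class name, method names, and the labels "processing" and "storage" are hypothetical.

```python
class InventoryTracker:
    """Remembers which resources each blade reported when it was inserted."""

    def __init__(self):
        self._records = {}

    def on_insert(self, blade_id, reported_resources):
        """Record the resources reported by a newly inserted blade."""
        self._records[blade_id] = frozenset(reported_resources)
        return self._records[blade_id]

    def affected_resources(self, blade_id):
        """Resources affected when a known blade is removed, fails, or recovers.

        The record is retained so that later events for the same blade can
        be classified without querying the (possibly unreachable) blade.
        """
        return self._records.get(blade_id, frozenset())


tracker = InventoryTracker()
tracker.on_insert("blade-7", {"processing", "storage"})
tracker.on_insert("blade-8", {"storage"})

# A later failure of blade-8 changes only the amount of storage resources.
failed_blade_resources = tracker.affected_resources("blade-8")
```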

In the example method depicted in FIG. 7, reconfiguring (404) the storage system (400) can include increasing (706) or decreasing an amount of processing resources allocated to one or more authorities. The amount of processing resources allocated to one or more authorities may be increased (706) or decreased, for example, by altering the set of blades that are utilized to execute the authority. For example, the authority may be executed on compute resources contained within an additional blade that was not being used to execute the authority prior to reconfiguring (404) the storage system (400), the authority may cease to be executed on compute resources contained within a blade that was being used to execute the authority prior to reconfiguring (404) the storage system (400), and so on. The amount of processing resources allocated to one or more authorities may also be increased (706) or decreased by adjusting the portion of the compute resources within a particular blade that are allocated for use in executing the authority. For example, if an authority was already executing on the first blade prior to reconfiguring (404) the storage system (400), the portion of the compute resources within the first blade that are allocated for use in executing the authority may be adjusted. Such an adjustment may be made expressly by changing a system setting. Such an adjustment may alternatively be made by assigning additional authorities to the compute resources within the first blade, or migrating other authorities away from the compute resources within the first blade, such that the authority must share the compute resources with fewer or additional consumers of the compute resources.
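The effect of sharing can be illustrated with a short Python sketch. It assumes, purely for illustration, that the compute resources of each blade are divided equally among the authorities executing on that blade; the function and blade names are hypothetical.

```python
def compute_allocation(assignments, authority):
    """Total compute available to an authority, in units of whole blades.

    assignments maps a blade id to the list of authorities executing on it.
    Equal sharing among the authorities on a blade is an assumption of
    this sketch, not a requirement of the embodiments.
    """
    total = 0.0
    for authorities in assignments.values():
        if authority in authorities:
            total += 1.0 / len(authorities)
    return total


# Authority A shares blade-1 with authority B and has blade-2 to itself.
before = {"blade-1": ["A", "B"], "blade-2": ["A"]}
allocation_before = compute_allocation(before, "A")

# Migrating B away from blade-1 increases A's allocation without
# changing the set of blades on which A executes.
after = {"blade-1": ["A"], "blade-2": ["A"], "blade-3": ["B"]}
allocation_after = compute_allocation(after, "A")
```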

In the example method depicted in FIG. 7, reconfiguring (404) the storage system (400) can alternatively include increasing (708) or decreasing an amount of storage associated with one or more write groups. The amount of storage associated with one or more write groups may be increased (708) or decreased, for example, by altering the set of blades that are utilized to provide storage for use by the one or more write groups. For example, storage resources on an additional blade that were not being used by the write group prior to reconfiguring (404) the storage system (400) may be allocated for use by the write group, storage resources on a blade that were being used by the write group prior to reconfiguring (404) the storage system (400) may be no longer allocated for use by the write group, and so on. The amount of storage associated with one or more write groups may also be increased (708) or decreased by adjusting the amount of storage resources within a particular blade that are associated with one or more write groups. For example, if storage resources on a first blade were associated with the one or more write groups prior to reconfiguring (404) the storage system (400), the amount of storage resources on the first blade that is associated with the one or more write groups may be changed.

For further explanation, FIG. 8 sets forth a flowchart illustrating an additional example method of dynamically configuring a storage system (400) to facilitate independent scaling of resources according to embodiments of the present disclosure. The example method depicted in FIG. 8 is similar to the example method depicted in FIG. 4, as the example method depicted in FIG. 8 also includes detecting (402) a change to a topology of the storage system (400) and reconfiguring (404) the storage system (400) to change an allocation of resources to one or more authorities responsive to detecting the change to the topology of the storage system (400).

In the example method depicted in FIG. 8, detecting (402) a change to a topology of the storage system (400) can include detecting (802) that a utilization of a particular resource has reached a utilization threshold. In the example method depicted in FIG. 8, the utilization threshold may be embodied as a value used to determine whether a particular resource is being underutilized. For example, a utilization threshold may be set such that when a computer processor within the storage system (400) is utilizing only 25% or less of all available processor cycles to execute instructions, the computer processor is deemed to be underutilized. In such an example, the storage system (400) may be reconfigured (404) to change an allocation of resources to one or more authorities by making the underutilized computer processor available for use by an additional authority, thereby enabling another authority to utilize the underutilized resource. Readers will appreciate that a similar situation may play out when the amount of storage allocated for use by a particular write group is underutilized and that such underutilized storage may be reallocated as part of a reconfiguration.

In the example method depicted in FIG. 8, the utilization threshold may also be embodied as a value used to determine whether a particular resource is being overutilized. For example, a utilization threshold may be set such that when a computer processor within the storage system (400) is utilizing 85% or more of all available processor cycles to execute instructions, the computer processor is deemed to be overutilized. In such an example, the storage system (400) may be reconfigured (404) to change an allocation of resources to one or more authorities by moving authorities that are executing on the overutilized computer processor to another computer processor, by making other computer processors available for use by an authority that is executing on the overutilized computer processor, and so on. Readers will appreciate that a similar situation may play out when the amount of storage allocated for use by a particular write group is overutilized and that additional storage within the storage system may be allocated for use by the write group as part of a reconfiguration.
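The two thresholds from the examples above (25% and 85%) can be combined in a brief Python sketch that classifies processors and pairs each heavily loaded processor with a lightly loaded one as a candidate destination for authorities. The pairing strategy and all names are hypothetical.

```python
UNDER_THRESHOLD = 0.25  # at or below: deemed underutilized
OVER_THRESHOLD = 0.85   # at or above: deemed overutilized


def classify(utilization):
    """Classify a utilization expressed as a fraction between 0.0 and 1.0."""
    if utilization <= UNDER_THRESHOLD:
        return "underutilized"
    if utilization >= OVER_THRESHOLD:
        return "overutilized"
    return "normal"


def plan_moves(utilization_by_processor):
    """Pair each overutilized processor with an underutilized one.

    Returns (source, destination) pairs; processors without a partner
    are left unchanged in this simple sketch.
    """
    over = sorted(p for p, u in utilization_by_processor.items()
                  if classify(u) == "overutilized")
    under = sorted(p for p, u in utilization_by_processor.items()
                   if classify(u) == "underutilized")
    return list(zip(over, under))


moves = plan_moves({"cpu0": 0.90, "cpu1": 0.20, "cpu2": 0.50})
```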

In the examples described above, an authority executing on processing resources within a first blade may cause data to be stored on storage resources within a second blade. In such an example, the authority executing on a first blade may cause data to be stored on a second blade by sending a request to store data on the second blade to processing resources on the second blade. Alternatively, and particularly when there are no available processing resources on the second blade (e.g., the second blade is a storage blade), the processing resources on the first blade may have direct access to the storage resources on the second blade via a data communication link, via remote DMA hardware, or in other ways.
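The two paths described above can be sketched in Python as follows. The Blade class, its attributes, and the function name are hypothetical, and an in-memory dictionary stands in for both the request handling on the second blade and the direct access mechanism (for example, remote DMA hardware).

```python
class Blade:
    """Hypothetical blade with storage and, optionally, processing resources."""

    def __init__(self, name, has_processing):
        self.name = name
        self.has_processing = has_processing
        self.storage = {}
        self.handled_requests = 0

    def handle_store_request(self, key, data):
        """Executed by this blade's own processing resources."""
        self.handled_requests += 1
        self.storage[key] = data


def store_from_authority(target, key, data):
    """Store data on the target blade on behalf of a remote authority."""
    if target.has_processing:
        # Send a request to the processing resources on the target blade.
        target.handle_store_request(key, data)
        return "request"
    # No processing resources on the target: write directly, standing in
    # for direct access over a data communication link or remote DMA.
    target.storage[key] = data
    return "direct"


hybrid_blade = Blade("blade-2", has_processing=True)
storage_blade = Blade("blade-3", has_processing=False)
path_to_hybrid = store_from_authority(hybrid_blade, "segment-1", b"payload")
path_to_storage = store_from_authority(storage_blade, "segment-1", b"payload")
```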

For further explanation, FIG. 9 sets forth a block diagram of automated computing machinery comprising an example computer (952) useful in dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure. The computer (952) of FIG. 9 includes at least one computer processor (956) or “CPU” as well as random access memory (“RAM”) (968) which is connected through a high speed memory bus (966) and bus adapter (958) to processor (956) and to other components of the computer (952). Stored in RAM (968) is a dynamic configuration module (926), a module of computer program instructions for dynamically configuring a storage system to facilitate independent scaling of resources according to embodiments of the present disclosure. The dynamic configuration module (926) may be configured for dynamically configuring the storage system to facilitate independent scaling of resources by: detecting a change to a topology of the storage system; reconfiguring the storage system to change an allocation of resources to one or more authorities responsive to detecting the change to the topology of the storage system; executing an authority on a first set of blades and reconfiguring the storage system by executing the authority on a second set of blades; associating storage on a first set of blades with a write group and reconfiguring the storage system by associating, in dependence upon a write group formation policy, storage on a second set of blades with the write group; detecting that an amount of processing resources within the storage system has changed and reconfiguring the storage system by increasing or decreasing an amount of processing resources allocated to one or more authorities; detecting that an amount of storage resources within the storage system has changed and reconfiguring the storage system by increasing or decreasing an amount of storage associated with one or more write groups; and detecting that a utilization of a particular resource has reached a utilization threshold, as was described in greater detail above.

Also stored in RAM (968) is an operating system (954). Operating systems useful in computers configured for dynamically configuring the storage system to facilitate independent scaling of resources according to embodiments described herein include UNIX™, Linux™, Microsoft XP™, AIX™, IBM's i5/OS™, and others as will occur to those of skill in the art. The operating system (954) and dynamic configuration module (926) in the example of FIG. 9 are shown in RAM (968), but many components of such software typically are stored in non-volatile memory also, such as, for example, on a disk drive (970).

The example computer (952) of FIG. 9 also includes disk drive adapter (972) coupled through expansion bus (960) and bus adapter (958) to processor (956) and other components of the computer (952). Disk drive adapter (972) connects non-volatile data storage to the computer (952) in the form of disk drive (970). Disk drive adapters useful in computers configured for dynamically configuring the storage system to facilitate independent scaling of resources according to embodiments described herein include Integrated Drive Electronics (“IDE”) adapters, Small Computer System Interface (“SCSI”) adapters, and others as will occur to those of skill in the art. Non-volatile computer memory also may be implemented as an optical disk drive, electrically erasable programmable read-only memory (so-called “EEPROM” or “Flash” memory), RAM drives, and so on, as will occur to those of skill in the art.

The example computer (952) of FIG. 9 includes one or more input/output (“I/O”) adapters (978). I/O adapters implement user-oriented input/output through, for example, software drivers and computer hardware for controlling output to display devices such as computer display screens, as well as user input from user input devices (982) such as keyboards and mice. The example computer (952) of FIG. 9 includes a video adapter (909), which is an example of an I/O adapter specially designed for graphic output to a display device (980) such as a display screen or computer monitor. Video adapter (909) is connected to processor (956) through a high speed video bus (964), bus adapter (958), and the front side bus (962), which is also a high speed bus.

The example computer (952) of FIG. 9 includes a communications adapter (967) for data communications with a storage system (984) as described above and for data communications with a data communications network (900). Such data communications may be carried out serially through RS-232 connections, through external buses such as a Universal Serial Bus (‘USB’), a Fibre Channel data communications link, an Infiniband data communications link, through data communications networks such as IP data communications networks, and in other ways as will occur to those of skill in the art. Communications adapters implement the hardware level of data communications through which one computer sends data communications to another computer, directly or through a data communications network. Examples of communications adapters useful in computers configured for dynamically configuring the storage system to facilitate independent scaling of resources according to embodiments described herein include Ethernet (IEEE 802.3) adapters for wired data communications, Fibre Channel adapters, Infiniband adapters, and so on.

The computer (952) may implement certain instructions stored on RAM (968) for execution by processor (956) for dynamically configuring the storage system to facilitate independent scaling of resources. In some embodiments, dynamically configuring the storage system to facilitate independent scaling of resources may be implemented as part of a larger set of executable instructions. For example, the dynamic configuration module (926) may be part of an overall system management process.

Readers will appreciate that although the flow charts depicted in FIGS. 4-8 are illustrated as occurring in a particular order, such an ordering is only required when expressly recited in the claims that are included below. The steps depicted in the Figures may occur in different orders, the steps may occur iteratively, or any combination thereof. Readers will further appreciate that the diagrams depicted in FIGS. 1-3 and FIG. 9 are only included for illustrative purposes, and that systems and apparatuses may be embodied utilizing additional hardware and software components.

Example embodiments of the present disclosure are described largely in the context of methods and apparatus that are useful in dynamically configuring a storage system (400) to facilitate independent scaling of resources. Readers of skill in the art will recognize, however, that the present disclosure also may be embodied in a computer program product disposed upon computer readable storage media for use with any suitable data processing system. Such computer readable storage media may be any storage medium for machine-readable information, including magnetic media, optical media, or other suitable media. Examples of such media include magnetic disks in hard drives or diskettes, compact disks for optical drives, magnetic tape, and others as will occur to those of skill in the art. Persons skilled in the art will immediately recognize that any computer system having suitable processing means will be capable of executing the steps of the method of the invention as embodied in a computer program product. Persons skilled in the art will recognize also that, although some of the example embodiments described in this specification are oriented to software installed and executing on computer hardware, nevertheless, alternative embodiments implemented as firmware or as hardware are well within the scope of the present disclosure.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, RAM, a read-only memory (‘ROM’), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (‘SRAM’), a portable compact disc read-only memory (‘CD-ROM’), a digital versatile disk (‘DVD’), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction-set-architecture (‘ISA’) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.

Aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

It will be understood from the foregoing description that modifications and changes may be made in various embodiments of the present disclosure without departing from its true spirit. The descriptions in this specification are for purposes of illustration only and are not to be construed in a limiting sense. The scope of the present disclosure is limited only by the language of the following claims.

Claims

1. A method of dynamically configuring a storage system to facilitate independent scaling of resources, the method comprising: detecting a change to a topology of the storage system consisting of different sets of blades configured within one of a plurality of chassis, at least one blade of the different sets of blades having differing storage capacities, the change to the topology including a change in the number of blades in the storage system; and responsive to detecting the change to the topology of the storage system, reconfiguring the storage system to change an allocation of resources to one or more authorities without replicating any of the one or more authorities, each of the one or more authorities owning a segment of data, wherein the one or more authorities are configured to be identified through a hashing operation performed on the segment of data.
2. The method of claim 1, further comprising: executing an authority on a first set of blades, the authority having control for a range of inode numbers; and wherein reconfiguring the storage system further comprises executing the authority on a second set of blades.
3. The method of claim 1, further comprising: associating storage on a first set of blades with a write group; and wherein reconfiguring the storage system further comprises associating, in dependence upon a write group formation policy, storage on a second set of blades with the write group.
4. The method of claim 1 wherein: detecting the change to the topology of the storage system further comprises detecting that an amount of processing resources within the storage system has changed; and reconfiguring the storage system further comprises increasing or decreasing an amount of processing resources allocated to one or more authorities.
5. The method of claim 1 wherein: detecting the change to the topology of the storage system further comprises detecting that an amount of storage resources within the storage system has changed; and reconfiguring the storage system further comprises increasing or decreasing an amount of storage associated with one or more write groups.
6. The method of claim 1 wherein detecting the change to the topology of the storage system further comprises detecting that a utilization of a particular resource has reached a utilization threshold.
7. The method of claim 1 wherein an authority executing on compute resources within a first blade causes data to be stored on compute resources within a second blade.
8. An apparatus for dynamically configuring a storage system to facilitate independent scaling of resources, the apparatus including a computer processor and a computer memory, the computer memory including computer program instructions that, when executed by the computer processor, cause the apparatus to carry out the steps of: detecting a change to a topology of the storage system consisting of different sets of blades configured within one of a plurality of chassis, at least one blade of the different sets of blades having differing storage capacities, the change to the topology including a change in the number of blades in the storage system; and responsive to detecting the change to the topology of the storage system, reconfiguring the storage system to change an allocation of resources to one or more authorities without replicating any of the one or more authorities, each of the one or more authorities owning a segment of data, wherein the one or more authorities are configured to be identified through a hashing operation performed on the segment of data.
9. The apparatus of claim 8 further comprising computer program instructions that, when executed by the computer processor, cause the apparatus to carry out the step of: executing an authority on a first set of blades, the authority having control for a range of inode numbers; and wherein reconfiguring the storage system further comprises executing the authority on a second set of blades.
10. The apparatus of claim 8 further comprising computer program instructions that, when executed by the computer processor, cause the apparatus to carry out the step of: associating storage on a first set of blades with a write group; and wherein reconfiguring the storage system further comprises associating, in dependence upon a write group formation policy, storage on a second set of blades with the write group.
11. The apparatus of claim 8 wherein: detecting the change to the topology of the storage system further comprises detecting that an amount of processing resources within the storage system has changed; and reconfiguring the storage system further comprises increasing or decreasing an amount of processing resources allocated to one or more authorities.
12. The apparatus of claim 8 wherein: detecting the change to the topology of the storage system further comprises detecting that an amount of storage resources within the storage system has changed; and reconfiguring the storage system further comprises increasing or decreasing an amount of storage associated with one or more write groups.
13. The apparatus of claim 8 wherein detecting the change to the topology of the storage system further comprises detecting that a utilization of a particular resource has reached a utilization threshold.
14. The apparatus of claim 8 wherein an authority executing on compute resources within a first blade causes data to be stored on compute resources within a second blade.
15. A computer program product for dynamically configuring a storage system to facilitate independent scaling of resources, the computer program product disposed on a non-transitory storage medium, the computer program product including computer program instructions that, when executed by a computer, cause the computer to carry out the steps of: detecting a change to a topology of the storage system consisting of different sets of blades configured within one of a plurality of chassis, at least one blade of the different sets of blades having differing storage capacities, the change to the topology including a change in the number of blades in the storage system; and responsive to detecting the change to the topology of the storage system, reconfiguring the storage system to change an allocation of resources to one or more authorities without replicating any of the one or more authorities, each of the one or more authorities owning a segment of data, wherein the one or more authorities are configured to be identified through a hashing operation performed on the segment of data.
16. The computer program product of claim 15 further comprising computer program instructions that, when executed by the computer, cause the computer to carry out the step of: executing an authority on a first set of blades, the authority having control for a range of inode numbers; and wherein reconfiguring the storage system further comprises executing the authority on a second set of blades.
17. The computer program product of claim 15 further comprising computer program instructions that, when executed by the computer, cause the computer to carry out the step of: associating storage on a first set of blades with a write group; and wherein reconfiguring the storage system further comprises associating, in dependence upon a write group formation policy, storage on a second set of blades with the write group.
18. The computer program product of claim 15 wherein: detecting the change to the topology of the storage system further comprises detecting that an amount of processing resources within the storage system has changed; and reconfiguring the storage system further comprises increasing or decreasing an amount of processing resources allocated to one or more authorities.
19. The computer program product of claim 15 wherein: detecting the change to the topology of the storage system further comprises detecting that an amount of storage resources within the storage system has changed; and reconfiguring the storage system further comprises increasing or decreasing an amount of storage associated with one or more write groups.
20. The computer program product of claim 15 wherein detecting the change to the topology of the storage system further comprises detecting that a utilization of a particular resource has reached a utilization threshold.
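
For illustration only, the steps recited in claim 1 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `Blade` record, the fixed `NUM_AUTHORITIES` count, the choice of SHA-256, and the round-robin placement in `place_authorities` are all hypothetical and do not appear in the specification. The sketch shows an authority being identified by hashing a segment of data, a change in the number of blades being detected, and authorities being re-placed on the available compute resources without any authority being replicated.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass

NUM_AUTHORITIES = 8  # assumed fixed count; authorities are moved, never copied


@dataclass(frozen=True)
class Blade:
    """Hypothetical blade record; capacities may differ from blade to blade."""
    name: str
    compute_units: int  # illustrative measure of processing resources
    storage_tb: int     # illustrative measure of storage capacity


def authority_for(segment: bytes, num_authorities: int = NUM_AUTHORITIES) -> int:
    """Identify the owning authority by hashing the segment of data."""
    digest = hashlib.sha256(segment).digest()
    return int.from_bytes(digest[:8], "big") % num_authorities


def topology_changed(previous: list[Blade], current: list[Blade]) -> bool:
    """Detect one kind of topology change: a change in the number of blades."""
    return len(previous) != len(current)


def place_authorities(
    blades: list[Blade], num_authorities: int = NUM_AUTHORITIES
) -> dict[int, str]:
    """Map each authority to exactly one blade that has compute resources.

    Round-robin placement is an assumption made for this sketch.
    """
    compute_blades = [b for b in blades if b.compute_units > 0]
    if not compute_blades:
        raise ValueError("no blade with compute resources is available")
    return {
        authority: compute_blades[authority % len(compute_blades)].name
        for authority in range(num_authorities)
    }


if __name__ == "__main__":
    before = [Blade("blade-1", 4, 8), Blade("blade-2", 4, 52)]
    after = before + [Blade("blade-3", 8, 0)]  # compute-only blade added

    placement = place_authorities(before)
    if topology_changed(before, after):
        # Reconfigure: same authorities, new allocation of compute resources.
        placement = place_authorities(after)

    segment = b"example segment"
    owner = authority_for(segment)
    print(f"authority {owner} runs on {placement[owner]}")
```

Because the hash depends only on the segment data and the assumed fixed number of authorities, adding or removing a blade changes where an authority executes but not which authority owns a given segment.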

Related Publications (1)

	Number	Date	Country
	20170337002 A1	Nov 2017	US
