The subject matter of this disclosure is generally related to data storage systems that utilize multi-host Peripheral Component Interconnect Express (PCI-E) devices.
Network-attached storage (NAS), Storage Area Networks (SANs), and other types of organizational storage systems can be configured to maintain logical storage objects containing data used by instances of host applications running on host servers. Examples of host applications may include, but are not limited to, multi-user software for email, accounting, inventory control, manufacturing, engineering, and a wide variety of other organizational functions. The storage objects are abstractions of space on physical storage drives. Each storage object includes a contiguous range of logical block addresses (LBAs) at which blocks of host application data can be stored and accessed using input-output commands (IOs). A single storage array can simultaneously support IOs to storage objects from multiple instances of multiple host applications. PCI-E is an interface standard for connecting high-speed components such as redundant array of independent drives (RAID) cards and solid-state drives (SSDs) with central processing units (CPUs), which are referred to as “hosts” in the PCI-E standard. Multi-host PCI-E devices communicate with multiple hosts using multiple PCI-E links.
The presently disclosed concepts are predicated in part on recognition that multi-host support can create potential data consistency problems in a storage system because shared memory addresses can be accessed via different PCI-E links.
A method in accordance with some implementations comprises: mapping individual addresses of a shared memory that is accessible by one or more central processing unit (CPU) hosts to one of a plurality of Peripheral Component Interconnect Express (PCI-E) links via which the shared memory is accessible; and responding to a process call by identifying the PCI-E link to which a specified address in the shared memory is mapped.
An apparatus in accordance with some implementations comprises: a plurality of non-volatile drives; and a plurality of compute nodes configured to manage access to the drives, each of the compute nodes comprising local memory that is shared with other ones of the compute nodes and a link selection controller configured to map individual addresses of the shared memory to one of a plurality of Peripheral Component Interconnect Express (PCI-E) links via which the shared memory is accessible and identify the PCI-E link to which a specified address in the shared memory is mapped in response to a process call.
A non-transitory computer-readable storage medium in accordance with some implementations stores instructions that are executed by a storage system to perform a method comprising: mapping individual addresses of a shared memory that is accessible by multiple central processing unit (CPU) hosts to one of a plurality of Peripheral Component Interconnect Express (PCI-E) links via which the shared memory is accessible; and responding to a process call by identifying the PCI-E link to which a specified address in the shared memory is mapped.
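By way of illustration only, the mapping and lookup recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the fixed segment size, the round-robin assignment policy, and the names `SEGMENT_SIZE`, `LinkMap`, and `link_for` are all hypothetical and do not come from the disclosure.

```python
SEGMENT_SIZE = 4096  # hypothetical segment granularity, in bytes


class LinkMap:
    """Maps each shared-memory address to exactly one PCI-E link (illustrative)."""

    def __init__(self, links, memory_size):
        if not links:
            raise ValueError("at least one PCI-E link is required")
        self.links = list(links)
        self.memory_size = memory_size
        # Ceiling division: number of segments needed to cover the shared memory.
        num_segments = -(-memory_size // SEGMENT_SIZE)
        # Assumed policy: assign segments to links round-robin.
        self.segment_to_link = [
            self.links[i % len(self.links)] for i in range(num_segments)
        ]

    def link_for(self, address):
        """Answer a process call: identify the link to which an address is mapped."""
        if not 0 <= address < self.memory_size:
            raise ValueError("address is outside the shared memory")
        return self.segment_to_link[address // SEGMENT_SIZE]


link_map = LinkMap(["link0", "link1"], memory_size=4 * SEGMENT_SIZE)
print(link_map.link_for(0), link_map.link_for(SEGMENT_SIZE))
```

Because every address resolves to a single link in this sketch, all accesses to a given address travel over the same PCI-E link, which is the property that addresses the data consistency concern noted above.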
This summary is not intended to limit the scope of the claims or the disclosure. Other aspects, features, and implementations will become apparent in view of the detailed description and figures. Moreover, all the examples, aspects, implementations, and features can be combined in any technically possible way.
The terminology used in this disclosure is intended to be interpreted broadly within the limits of subject matter eligibility. The terms “disk,” “drive,” and “disk drive” are used interchangeably to refer to non-volatile storage media and are not intended to refer to any specific type of non-volatile storage media. The terms “logical” and “virtual” are used to refer to features that are abstractions of other features, for example, and without limitation, abstractions of tangible features. The term “physical” is used to refer to tangible features that possibly include, but are not limited to, electronic hardware. For example, multiple virtual computers could operate simultaneously on one physical computer. The term “logic” is used to refer to special purpose physical circuit elements, firmware, software, computer instructions that are stored on a non-transitory computer-readable medium and implemented by multi-purpose tangible processors, and any combinations thereof. Aspects of the inventive concepts are described as being implemented in a data storage system that includes host servers and a storage array. Such implementations should not be viewed as limiting. Those of ordinary skill in the art will recognize that there are a wide variety of implementations of the inventive concepts in view of the teachings of the present disclosure.
Some aspects, features, and implementations described herein may include machines such as computers, electronic components, optical components, and processes such as computer-implemented procedures and steps. It will be apparent to those of ordinary skill in the art that the computer-implemented procedures and steps may be stored as computer-executable instructions on a non-transitory computer-readable medium. Furthermore, it will be understood by those of ordinary skill in the art that the computer-executable instructions may be executed on a variety of tangible processor devices, i.e., physical hardware. For practical reasons, not every step, device, and component that may be part of a computer or data storage system is described herein. Those of ordinary skill in the art will recognize such steps, devices, and components in view of the teachings of the present disclosure and the knowledge generally available to those of ordinary skill in the art. The corresponding machines and processes are therefore enabled and within the scope of the disclosure.
A state change detection component 300 monitors for hardware failures and analyzes each detected failure to determine whether one of the PCI-E links has been impacted. If the availability of a PCI-E link has changed, then the state change detection component 300 initiates a failover process. A message handler component 302 responds to the detected loss of PCI-E link availability by sending a high-priority message over redundant networks to the link selection controllers of all CPUs in the storage system, e.g., link selection controllers 192, 193, 194. Once the high-priority message from the message handler component 302 has been received, the message handler component signals the mapping component 304 to update the affected address segments so that they no longer map to the failed PCI-E link. The other link selection controllers 192, 193, 194 similarly cause their mapping components to update the affected address segments so that they no longer map to the failed PCI-E link.
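The failover sequence described above can be sketched as follows. This is an illustrative sketch only: the class and method names, the in-process method call standing in for the high-priority message, and the round-robin redistribution of affected segments are assumptions and are not taken from the disclosure.

```python
class LinkSelectionController:
    """Illustrative stand-in for one link selection controller."""

    def __init__(self, name, links, num_segments):
        self.name = name
        self.available = list(links)
        # Assumed initial policy: segments assigned to links round-robin.
        self.segment_to_link = [links[i % len(links)] for i in range(num_segments)]
        self.peers = []  # the other controllers in the storage system

    def on_link_failure_detected(self, failed_link):
        """State change detection found a failed link: notify every controller."""
        # A direct call stands in for the high-priority message that the
        # message handler sends over redundant networks.
        for controller in [self] + self.peers:
            controller.handle_failure_message(failed_link)

    def handle_failure_message(self, failed_link):
        """Mapping update: move affected segments off the failed link."""
        if failed_link not in self.available:
            return  # duplicate message, e.g. second copy from a redundant network
        self.available.remove(failed_link)
        if not self.available:
            raise RuntimeError("no surviving PCI-E link")
        moved = 0
        for segment, link in enumerate(self.segment_to_link):
            if link == failed_link:
                # Assumed policy: spread affected segments over surviving links.
                self.segment_to_link[segment] = self.available[moved % len(self.available)]
                moved += 1


links = ["link0", "link1", "link2"]
a = LinkSelectionController("a", links, num_segments=6)
b = LinkSelectionController("b", links, num_segments=6)
a.peers, b.peers = [b], [a]
a.on_link_failure_detected("link1")
print(a.segment_to_link)
```

In this sketch every controller applies the same deterministic update, so all controllers end up with identical mappings after the failover, and a message delivered twice over the redundant networks has no further effect.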
Specific examples have been presented to provide context and convey inventive concepts. The specific examples are not to be considered as limiting. A wide variety of modifications may be made without departing from the scope of the inventive concepts described herein. Moreover, the features, aspects, and implementations described herein may be combined in any technically possible way. Accordingly, modifications and combinations are within the scope of the following claims.
Number | Date | Country |
---|---|---|
20240184726 A1 | Jun 2024 | US |