Some computing systems use memory systems comprising a plurality of interconnected memory components. The memory components may be distributed to different locations, with some memory components being located close to the computing systems and some other memory components being located at remote locations, or co-located in various numbers, as desired.
The following detailed description references the drawings, wherein:
The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the following description to refer to the same or similar parts. While several examples are described in this document, modifications, adaptations, and other implementations are possible. Accordingly, the following detailed description does not limit the disclosed examples. Instead, the proper scope of the disclosed examples may be defined by the appended claims.
As mentioned above, some computing systems use memory systems comprising a plurality of interconnected memory components. The memory components may be distributed to different locations, with some memory components being located close to the computing systems and some other memory components being located at remote locations, or co-located in various numbers, as desired.
Memory systems are being developed which comprise a plurality of interconnected memory components whose individual memory address spaces are aggregated and exposed (e.g., to processors/computing modules and/or other memory components), through entry points acting similarly to gateways in computer networks, as if the whole network of memory components were a single memory component having a uniform memory space. As used herein, a memory fabric may comprise such a network of memory components.
In such memory fabrics, the use of optical interconnects to connect memory components to one another increases the speed of signal transmission between components and makes it feasible to manage a group of memory components as a single memory resource even in cases where the group comprises a high number of memory components distributed over a large physical space. Thus, for example, the memory fabric could extend over plural racks in a data center, over plural data centers, etc.
A memory fabric may treat memory as if it were a routable resource (treating memory addresses somewhat in the way that IP networks handle IP addresses). The memory fabric handles memory traffic (e.g., items routed over the memory fabric). Memory traffic may comprise, for example: memory access requests and other relevant messages/information to facilitate access, allocation, configuration and the like of the memory fabric, as well as data being read from/written to the memory components of the memory fabric.
A memory component may receive requests for read or write access, may route a request to other memory components, may access memory addresses in response to a read or write request, and/or may otherwise facilitate management, communication, or storage of data. For example, responsive to a memory request (e.g., a read or write access request) being directed to a memory address in a memory fabric, a memory component may transmit a request to make a memory access along a path through one or more other memory components to a target memory component, and the target memory component (responsible for the memory address targeted in the request) may access the correct memory address.
The memory components of the memory fabric may implement a routing protocol to determine the physical links that are used to route a memory request over the memory fabric to the target memory component. The memory fabric may perform steps of the fabric routing protocol to establish and use routing tables that specify which route to use to transmit a request from the memory component towards a particular destination point in the fabric. The route may, for example, be specified in terms of an output port which the memory component should use to forward the memory request towards the target memory component.
Physical link failures may occur when a memory component of the memory fabric fails or a physical link connecting two memory components itself fails because of a software error, a hardware problem, or a link disconnection. Failures may occur for a variety of reasons, including bursts of traffic that cause a high degree of loss of memory-addressing requests or high, variable latencies. Software applications that access a memory fabric may perceive failures as either outages or performance failures.
Certain memory fabrics may implement fabric routing protocols that are based on the assumption that there is only a single route to transmit a request from one particular memory component to another memory component in the fabric. This may be a valid assumption in the case of a small, static and/or carefully designed fabric. In such a case, the fabric routing protocol may cause the memory components to hold details of only one route to each potential destination. However, if a problem (outage, performance failure) arises somewhere along the single route designated in the routing table, then it may become impossible to transmit a memory-addressing request to its intended destination. Memory fabrics of this type are not resilient in the face of outages and performance failures.
Furthermore, certain memory fabrics may be large (i.e. they may involve a large number of memory components) and/or they may have a topology that does not result from conscious design (for example because memory components can join/leave the fabric in an ad hoc manner). As a result there may be a plurality of routes available for transmission of a request from one point to another, in particular as the size of the memory fabric increases. However, bandwidth issues and latency issues may occur responsive to a memory component (or the memory fabric itself) trying to determine alternative paths after a failure occurs. Further, some memory fabric routing protocols may include mechanisms which inhibit search or adoption of alternative paths after a failure has been discovered until the failure persists for a predetermined time period, in order to enhance the stability of routing within a large memory fabric.
To address the technical challenges of maintaining a resilient memory fabric in situations of link or memory component failure, each memory component in the memory fabric may comprise a local non-transitory machine readable storage medium that stores a set of labeled routes for the other memory components in the memory fabric. The set of labeled routes may comprise multiple routes to a particular memory component, where a first labeled route may be indicated as a primary route and the other labeled routes may be indicated as alternative routes.
Each memory component may pre-determine a set of routes to other memory components in the memory fabric. For example, a memory component may determine a set of routes to the other memory components and may label the determined set of routes. A route may comprise, for example, an ordered series of memory components and corresponding output ports. The label may comprise, for example, an identification of a route, an identification of the destination memory component of the route, a cost metric of the route, a number of memory components in the route, and/or other information related to the route. The memory component may determine and label the set of routes by performing routing to each of the other memory components, determining a cost metric for each route, and assigning a label to the route with the lowest determined cost metric.
Responsive to determining and labeling the set of routes, the memory component may then send information about the labeled routes to its neighbor memory components, and may receive information about labeled routes from each of its neighbors. The memory component may revise its labeled routes based on the received information by determining if a labeled route to a destination memory component received from a neighbor has a lower cost metric than its own labeled route to that destination memory component. If so, the memory component may adopt the received route as its labeled route to that destination memory component, and the previously labeled route may be maintained in storage as an alternative route to that destination memory component.
Referring now to the drawings,
A memory component (e.g., first memory component 100) may comprise a non-transitory machine-readable storage medium 120, a processor 110, and a first address space 140 of memory. Each memory component 100, 101, 102, . . . , 10n may comprise similar or the same hardware and perform the same functionality as described below in conjunction with memory component 100.
Processor 110 may be one or more central processing units (CPUs), graphic processing units (GPUs), digital signal processors (DSPs), microprocessors, and/or other hardware devices suitable for retrieval and execution of instructions stored in machine-readable storage medium 120. Processor 110 may fetch, decode, and execute program instructions 121, and/or other instructions to enable management of a resilient memory fabric, as described below. As an alternative or in addition to retrieving and executing instructions, processor 110 may include one or more electronic circuits comprising a number of electronic components for performing the functionality of one or more of instructions 121, and/or other instructions.
In one example, the program instructions 121, and/or other instructions can be part of an installation package that can be executed by processor 110 to implement the functionality described herein. In this case, memory 120 may be a portable medium such as a CD, DVD, or flash drive or a memory maintained by a computing device from which the installation package can be downloaded and installed. In another example, the program instructions may be part of an application or applications already installed on memory fabric 10.
Non-transitory machine-readable storage medium 120 may be any hardware storage device for maintaining data accessible to memory fabric 10. For example, machine-readable storage medium 120 may include one or more hard disk drives, solid state drives, tape drives, memory fabrics, and/or any other storage devices. The storage devices may be located in memory fabric 10, may be located across disparate, geographically distributed devices, and/or in another device in communication with memory fabric 10. For example, machine-readable storage medium 120 may be any electronic, magnetic, optical, or other physical storage device that stores executable instructions. Thus, machine-readable storage medium 120 may be, for example, Random Access Memory (RAM), an Electrically-Erasable Programmable Read-Only Memory (EEPROM), a storage drive, an optical disc, universal memory, and the like. As described in detail below, machine-readable storage medium 120 may be encoded with executable instructions for management of a resilient memory fabric. As detailed below, storage medium 120 may maintain and/or store the data and information described herein.
For example, storage medium 120 may maintain and/or store data and information related to management of a resilient memory fabric. Storage medium 120 may store, for example, information about the memory components of the memory fabric 10. In some examples, the storage medium 120 may store information about each memory component of the memory fabric 10. For an individual memory component, the storage medium 120 may store, for example, an identification of the memory component, an indication of whether the memory component is a neighbour of (e.g., directly connected to) the memory component 100, and/or other information related to the memory component.
Storage medium 120 may also store, for example, a set of labelled routes from the memory component 100 to other memory components (e.g., memory components 101, 102, . . . , 10n) in the memory fabric 10. Information about a route may comprise, for example, an ordered series of memory components, information about links between the memory components, corresponding output ports, a cost metric associated with the route, and/or other information related to the route. The label may comprise, for example, an identification of a route, an identification of the destination memory component of the route, a cost metric of the route, a number of memory components in the route, and/or other information related to the route. In some examples, the label may also comprise an indication that a route is an alternative route (e.g., not a route to be selected for use by the memory component).
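By way of a non-limiting illustration, a labelled route of the kind described above could be represented as a small record. The class name and field names below (`LabelledRoute`, `route_id`, `hops`, `is_alternative`) are hypothetical and are introduced only for this sketch; they are not taken from the description.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabelledRoute:
    """Hypothetical record for one labelled route held by a memory component."""
    route_id: str                 # identification of the route
    destination: int              # identification of the destination memory component
    hops: List[Tuple[int, int]]   # ordered series of (memory component, output port)
    cost: float                   # cost metric of the route
    is_alternative: bool = False  # True if the route is not the one selected for use

    @property
    def component_count(self) -> int:
        # number of memory components in the route
        return len(self.hops)


# Example: a route from component 100 to component 102 via component 101.
route = LabelledRoute("r1", destination=102,
                      hops=[(100, 1), (101, 2), (102, 0)], cost=3.0)
```

Keeping the alternative-route indication inside the label, as in this sketch, would allow primary and alternative routes to share a single stored set.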
Data routing instructions 121, when executed by processor 110, may route data from the memory component 100 to a destination memory component. For example, the data routing instructions 121, when executed by processor 110, may route data along a selected labelled route. The data routing instructions 121, when executed by processor 110, may select the labelled route based on the destination memory component. For example, the data routing instructions 121, when executed by processor 110, may determine which label of the set of labelled routes stored in the non-transitory machine readable storage medium comprises an identification of the destination memory component. In some examples, the data routing instructions 121, when executed by processor 110, may determine that a plurality of labels comprise the identification of the destination memory component. In these examples, the data routing instructions 121, when executed by processor 110, may determine the labelled route via which to route data based on whether each of the plurality of labels also comprises an indication that the route is an alternative route. The data routing instructions 121, when executed by processor 110, may select the labelled route that does not include such an indication.
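The selection logic described above may be sketched as follows. This is an illustrative example only: routes are modelled as plain dictionaries, and the function name `select_route` and the keys `destination`, `output_port`, and `alternative` are assumptions made for the sketch.

```python
from typing import Dict, List, Optional


def select_route(labelled_routes: List[Dict], destination: int) -> Optional[Dict]:
    """Pick the labelled route to `destination` that is not marked alternative."""
    # Keep only labels that identify the requested destination.
    matches = [r for r in labelled_routes if r["destination"] == destination]
    # Prefer the route whose label carries no alternative-route indication.
    for route in matches:
        if not route.get("alternative", False):
            return route
    return None  # no usable labelled route is stored for this destination


stored_routes = [
    {"label": "A", "destination": 102, "output_port": 1, "alternative": False},
    {"label": "B", "destination": 102, "output_port": 3, "alternative": True},
    {"label": "C", "destination": 103, "output_port": 2, "alternative": False},
]
chosen = select_route(stored_routes, 102)
```

In this example two labels identify destination 102, and the route labelled "A" is chosen because its label carries no alternative-route indication.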
Like the memory components of
As with processor 110 of
Data routing instructions 221, when executed by processor 210, may route data from the memory component 200 to a destination memory component. In some examples, data routing instructions 221, when executed by processor 210, may perform functionality the same as or similar to that of data routing instructions 121, when executed by processor 110.
Route labelling instructions 222, when executed by processor 210, may determine a set of routes to other memory components in the memory fabric 10. For example, the route labelling instructions 222, when executed by processor 210, may perform routing to each of the other memory components and may determine, based on the performed routing, a cost metric for each route. The route labelling instructions 222, when executed by processor 210, may determine a cost metric for a route based on, for example, a latency of the route, a number of hops for the route, bandwidth for the route (and/or for memory components in the route), reliability of transmission across the route, any combination thereof, and/or other objective measures of successful transmission of data along the route.
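One possible way to combine such measures into a single cost metric is a weighted sum, sketched below. The weights, the units, and the function name `cost_metric` are purely hypothetical; the description does not prescribe any particular formula, and a lower value is assumed here to indicate a better route.

```python
def cost_metric(latency_ns: float, hops: int, bandwidth_gbps: float,
                reliability: float,
                weights=(1.0, 10.0, 100.0, 50.0)) -> float:
    """Hypothetical weighted cost: lower is better."""
    w_latency, w_hops, w_bandwidth, w_reliability = weights
    return (w_latency * latency_ns              # slower routes cost more
            + w_hops * hops                     # longer routes cost more
            + w_bandwidth / bandwidth_gbps      # narrower routes cost more
            + w_reliability * (1.0 - reliability))  # lossy routes cost more


# Two candidate routes to the same destination.
fast_route_cost = cost_metric(latency_ns=100, hops=2, bandwidth_gbps=10, reliability=0.99)
slow_route_cost = cost_metric(latency_ns=250, hops=4, bandwidth_gbps=10, reliability=0.99)
```

Different weight tuples could stand in for the different sets of objective measures that may be used for different types of memory components.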
The route labelling instructions 222, when executed by processor 210, may access the objective measures to be used to determine the cost metric from the non-transitory machine readable storage medium 220, from an administrator of the memory fabric 20, and/or from another source. In some examples, the objective measures may vary based on characteristics of the memory components or the memory fabric. In some examples, different sets of objective measures may be used for different types of memory components in a memory fabric 20.
Responsive to determining the routes and the respective cost metrics for each route, the route labelling instructions 222, when executed by processor 210, may store information about the routes in the non-transitory machine readable storage medium 220. Information about a route may comprise, for example, an ordered series of memory components, information about links between the memory components, corresponding output ports, a cost metric associated with the route, and/or other information related to the route.
The route labelling instructions 222, when executed by processor 210, may label the determined routes. For example, route labelling instructions 222, when executed by processor 210, may label the determined routes based on the determined cost metric for each route. In some examples, the route labelling instructions, when executed by processor 210, may determine which route to a destination memory component has the best cost metric (e.g., lowest or highest cost metric based on the characteristics of the cost metric) and may label only that route. A label for each route may comprise, for example, an identification of a route, an identification of the destination memory component of the route, a cost metric of the route, a number of memory components in the route, and/or other information related to the route. In some examples, route labelling instructions 222, when executed by processor 210, may label each route stored in the non-transitory machine-readable storage medium. In these examples, the label may also comprise an indication that a route is an alternative route (e.g., not a route to be selected for use by the memory component). The route labelling instructions 222, when executed by processor 210, may store the set of labelled routes in the non-transitory machine readable storage medium 220.
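The variant in which every stored route is labelled, with non-selected routes marked as alternatives, may be sketched as follows. The sketch assumes that a lower cost metric is better; the function name `label_routes` and the dictionary keys are hypothetical.

```python
from typing import Dict, List


def label_routes(routes: List[Dict]) -> List[Dict]:
    """Label every route; mark all but the cheapest per destination as alternative."""
    # Find the lowest-cost route for each destination (assumes lower is better).
    best: Dict[int, Dict] = {}
    for route in routes:
        dest = route["destination"]
        if dest not in best or route["cost"] < best[dest]["cost"]:
            best[dest] = route
    # Attach a label to each route, flagging non-best routes as alternatives.
    labelled = []
    for index, route in enumerate(routes):
        labelled.append({
            **route,
            "label": f"route-{index}",
            "alternative": route is not best[route["destination"]],
        })
    return labelled


determined = [
    {"destination": 102, "cost": 4.0},
    {"destination": 102, "cost": 2.0},
    {"destination": 103, "cost": 1.0},
]
labelled_set = label_routes(determined)
```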
Responsive to storing the set of labelled routes in the storage medium 220, the route labelling instructions 222, when executed by processor 210, may forward, to each neighbour component of the first memory component 200, information about the stored set of labelled routes. The forwarded information may comprise, for each labelled route, the label, the cost metric, the identification of the memory component, and/or other information related to the route. In some examples in which all routes are labelled and some routes have labels with an indication that they are alternative routes, the route labelling instructions 222, when executed by processor 210, may only forward those labelled routes that are not alternative routes.
The route labelling instructions 222, when executed by processor 210, may similarly receive, from each neighbour memory component, information about the respective neighbor's stored set of labelled routes. The route labelling instructions 222, when executed by processor 210, may revise its stored set of labelled routes based on the received information. For example, for each labelled route of a first neighbor's set of labelled routes, the route labelling instructions 222, when executed by processor 210, may determine, from the received information, whether a neighbor identification of a destination memory component of the labelled route matches a local identification of a destination memory component in a labelled route for that destination memory component stored in the storage medium 220.
Responsive to the neighbor identification matching the local identification, the route labelling instructions 222, when executed by processor 210, may determine whether a neighbor cost metric associated with the received labelled route is better (e.g., lower) than the cost metric associated with the stored labelled route. Responsive to the neighbor cost metric being better than the associated cost metric, the route labelling instructions 222, when executed by processor 210, may store the received labelled route from the neighbour memory component in the storage medium 220. The route labelling instructions 222, when executed by processor 210, may also remove the label of the previously stored route and label the newly stored neighbor route associated with the neighbor identification of the destination memory component. In some examples, instead of removing the label of the previously stored route, the route labelling instructions 222, when executed by processor 210, may revise the label of that route with an indication that the route is an alternative route.
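The revision step may be illustrated with the following sketch, which implements the variant in which the previously stored route is kept and marked as an alternative. Several points are assumptions made for the example: a lower cost metric is treated as better, the received cost is taken to be already expressed from the local memory component's point of view, and a received route to a destination with no stored primary route is simply adopted (a case the description does not address). The function name `revise_routes` is hypothetical.

```python
from typing import Dict, List


def revise_routes(stored: List[Dict], received: Dict) -> bool:
    """Apply one received labelled route; return True if `stored` changed."""
    for route in stored:
        same_destination = route["destination"] == received["destination"]
        if same_destination and not route["alternative"]:
            if received["cost"] < route["cost"]:
                # Demote the old primary and adopt the neighbour's route.
                route["alternative"] = True
                stored.append({**received, "alternative": False})
                return True
            return False  # the stored primary route is already at least as good
    # Assumption: no primary route stored for this destination, so adopt it.
    stored.append({**received, "alternative": False})
    return True


local_routes = [{"destination": 102, "cost": 5.0, "alternative": False}]
first_change = revise_routes(local_routes, {"destination": 102, "cost": 3.0})
second_change = revise_routes(local_routes, {"destination": 102, "cost": 4.0})
```

After the first call the cost-5.0 route remains stored as an alternative; the second call leaves the set unchanged because the received cost of 4.0 is not better than the new primary cost of 3.0.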
The route labelling instructions 222, when executed by processor 210, may continue to forward and receive stored sets of labelled routes and revise the stored labelled routes of the memory components until no changes are made to the stored labelled routes responsive to receiving a stored set of labelled routes from other memory components. The route labelling instructions 222, when executed by processor 210, may forward and receive stored sets of labelled routes responsive to a cost metric for a stored route changing, responsive to receiving information from the memory fabric 20 (or a memory component thereof) to forward and receive stored sets of labelled routes, and/or in other situations where the labelled routes should be updated.
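The repeated exchange until no further changes occur resembles a distance-vector style convergence loop. The sketch below simulates it for a whole fabric in one process and tracks only the best cost per destination. It assumes additive, positive, symmetric link costs, which the description does not require; the function name `converge` and the example topology are hypothetical.

```python
from typing import Dict, Tuple


def converge(links: Dict[Tuple[int, int], float]) -> Dict[int, Dict[int, float]]:
    """Exchange best-route costs between neighbours until no table changes."""
    nodes = sorted({n for link in links for n in link})
    # Symmetric neighbour map: node -> {neighbour: link cost}.
    neighbours: Dict[int, Dict[int, float]] = {n: {} for n in nodes}
    for (a, b), cost in links.items():
        neighbours[a][b] = cost
        neighbours[b][a] = cost
    # Each table starts with the direct routes only.
    tables = {n: dict(neighbours[n]) for n in nodes}
    changed = True
    while changed:  # stop once a full round produces no revision
        changed = False
        for node in nodes:
            for neighbour, link_cost in neighbours[node].items():
                for dest, cost in list(tables[neighbour].items()):
                    if dest == node:
                        continue  # ignore routes back to ourselves
                    candidate = link_cost + cost
                    if candidate < tables[node].get(dest, float("inf")):
                        tables[node][dest] = candidate
                        changed = True
    return tables


# Components 100, 101, 102: the direct 100-102 link is expensive.
fabric_tables = converge({(100, 101): 1.0, (101, 102): 1.0, (100, 102): 5.0})
```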
In some examples, each time the route labelling instructions 222, when executed by processor 210, store and/or revise a labelled route, the route labelling instructions 222, when executed by processor 210, may send the stored and/or revised labelled route (or the entire set of labelled routes) to a central non-transitory storage medium of the memory fabric 20. In some examples, a single memory component or other hardware component of the memory fabric may comprise the central storage medium. In some examples, the central storage medium may comprise the labelled routes from each of the memory components of the memory fabric 20. In some examples, information about each route available via the memory components of the memory fabric 20 may be stored at the central storage medium, with labels for the routes including indications as to whether the route is an alternative route.
Failure recovery instructions 223, when executed by processor 210, may facilitate recovery responsive to a failure in the memory fabric 20. For example, the failure recovery instructions 223, when executed by processor 210, may receive information about a memory component, link, and/or other component of the memory fabric 20 failing.
Responsive to determining that the failure involves a particular memory component, the failure recovery instructions 223, when executed by processor 210, may determine that a labeled route to a destination memory component in the set of labeled routes stored in the local non-transitory machine readable storage medium comprises the failed memory component. For example, the failure recovery instructions 223, when executed by processor 210, may determine whether each labelled route stored in the storage medium 220 (that does not have an indication of alternative route in the label) comprises the failed memory component. Responsive to determining that a labelled route comprises the failed memory component, the failure recovery instructions 223, when executed by processor 210, may remove the label associated with the labelled route and/or may revise the label to indicate that the route comprises the failed memory component.
Responsive to determining that the labelled route comprises the failed memory component, the failure recovery instructions 223, when executed by processor 210, may also determine whether other routes are stored to the destination memory component of the labelled route that comprises the failed memory component. The failure recovery instructions 223, when executed by processor 210, may determine which one of the other routes stored in the storage medium 220 has the best cost metric and may label that route and store the labelled route in the storage medium 220. In some examples, the failure recovery instructions 223, when executed by processor 210, may determine the other routes from labelled routes that comprise information indicating that the labelled route is an alternative route. In some examples, the failure recovery instructions 223, when executed by processor 210, may obtain information about the other routes from the central storage medium of the memory fabric 20.
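The recovery from a failed memory component may be sketched as follows. The sketch assumes that a lower cost metric is better and, as an additional assumption not stated in the description, skips alternative routes that themselves pass through the failed component. The function name `recover_from_failure` and the keys `path`, `failed`, and `alternative` are hypothetical.

```python
from typing import Dict, List


def recover_from_failure(stored: List[Dict], failed_component: int) -> List[int]:
    """Promote the best alternative for each primary route hit by the failure."""
    rerouted = []
    for route in stored:
        if route["alternative"] or failed_component not in route["path"]:
            continue  # only primary routes through the failed component matter
        route["failed"] = True  # revise the label to record the failure
        candidates = [r for r in stored
                      if r["destination"] == route["destination"]
                      and r["alternative"]
                      and failed_component not in r["path"]]
        if candidates:
            best = min(candidates, key=lambda r: r["cost"])
            best["alternative"] = False   # the alternative becomes the route in use
            route["alternative"] = True   # the failed route is no longer selected
            rerouted.append(route["destination"])
    return rerouted


routes_at_100 = [
    {"destination": 102, "path": [100, 101, 102], "cost": 2.0, "alternative": False},
    {"destination": 102, "path": [100, 103, 102], "cost": 4.0, "alternative": True},
    {"destination": 102, "path": [100, 101, 104, 102], "cost": 3.0, "alternative": True},
]
recovered = recover_from_failure(routes_at_100, failed_component=101)
```

Because the alternatives are already stored locally, the switch in this sketch requires no route discovery at the time of the failure.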
Responsive to determining that the failure involves a particular link between two memory components, the route labelling instructions 222, when executed by processor 210, may determine which routes comprise the failed link and may select an alternative route in a manner similar to that described above for a failed memory component.
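Determining whether a route comprises a failed link may be as simple as checking each pair of consecutive memory components along the route, as in the sketch below. Links are assumed to be bidirectional, and the function name `route_uses_link` is hypothetical.

```python
from typing import List, Tuple


def route_uses_link(path: List[int], link: Tuple[int, int]) -> bool:
    """Return True if two consecutive components on `path` form `link`."""
    endpoints = set(link)
    # Compare every adjacent pair on the route, ignoring direction.
    return any({a, b} == endpoints for a, b in zip(path, path[1:]))


uses_failed = route_uses_link([100, 101, 102], (102, 101))
avoids_failed = route_uses_link([100, 103, 102], (102, 101))
```

Routes for which the check returns True could then be handled in the same way as routes that comprise a failed memory component.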
Like the memory components of
As with processor 110 of
As detailed below, system 300 may include a series of engines 320, 330, 340 for managing a resilient memory fabric. Each of the engines may generally represent any combination of hardware and programming. For example, the programming for the engines may be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the engines may include at least one processor of the system 300 to execute those instructions. In addition or as an alternative, each engine may include one or more hardware devices including electronic circuitry for implementing the functionality described below.
Data routing engine 320 may route data from the memory component 300 to a destination memory component by selecting a labelled route via which to route the data. In some examples, the data routing engine 320 may facilitate routing data in a manner the same as or similar to that of the data routing instructions 121 of memory fabric 10, data routing instructions 221 of memory fabric 20, and/or other instructions. Further details regarding an example implementation of data routing engine 320 are provided above in connection with data routing instructions 121 of
Route labelling engine 330 may determine and label routes to destination memory components. In some examples, the route labelling engine 330 may determine and label routes to destination memory components in a manner the same as or similar to that of the route labelling instructions 222 of memory fabric 20, and/or other instructions. Further details regarding an example implementation of route labelling engine 330 are provided above in connection with route labelling instructions 222 of
Failure recovery engine 340 may receive information about a failure in the network of memory components of the memory fabric and may mitigate the failure by selecting an alternative labelled route to a destination memory component. In some examples, the failure recovery engine 340 may mitigate failure in the memory fabric in a manner the same as or similar to that of the failure recovery instructions 223 of memory fabric 20, and/or other instructions. Further details regarding an example implementation of failure recovery engine 340 are provided above in connection with failure recovery instructions 223 of
Although execution of the methods described below is with reference to memory fabric 10 of
In an operation 400, information related to a memory fabric may be stored at a central non-transitory machine readable storage medium of a memory fabric, where the memory fabric may comprise a network of memory components and each memory component may comprise a respective address space, such that the memory fabric comprises the aggregated respective memory as a single addressable memory space. For example, the memory fabric 10 (and/or the memory fabric 20, memory fabric 30, or other resource of the memory fabric) may store the information related to the aggregated respective memory. The memory fabric 10 may store the information in a manner similar or the same as that described above in relation to the execution of the memory fabric 10, the memory fabric 20, the memory fabric 30, and/or other resource of the memory fabric.
In an operation 410, information related to a set of labelled routes to other memory components in the memory fabric may be stored at a local non-transitory machine readable storage medium of a memory component of the memory fabric. For example, the memory fabric 10 (and/or the memory fabric 20, memory fabric 30, or other resource of the memory fabric) may store the set of labelled routes. The memory fabric 10 may store the set of labelled routes in a manner similar or the same as that described above in relation to the execution of the memory fabric 10, the memory fabric 20, the memory fabric 30, and/or other resource of the memory fabric.
In an operation 420, data may be routed by the first memory component along a selected labelled route. For example, the memory fabric 10 (and/or the data routing instructions 121, the data routing instructions 221, the data routing engine 320, or other resource of the memory fabric 10) may route the data along the selected labelled route. The memory fabric 10 may route the data along the selected labelled route in a manner similar or the same as that described above in relation to the execution of the data routing instructions 121, the data routing instructions 221, the data routing engine 320, or other resource of the memory fabric 10.
In an operation 500, a memory component may determine a set of routes to other memory components in the memory fabric. For example, the memory fabric 20 (and/or the route labelling instructions 222, the route labelling engine 330, or other resource of the memory fabric 20) may determine the set of routes. The memory fabric 20 may determine the set of routes in a manner similar or the same as that described above in relation to the execution of the route labelling instructions 222, the route labelling engine 330, and/or other resource of the memory fabric 20.
In an operation 510, the memory component may label the determined set of routes. For example, the memory fabric 20 (and/or route labelling instructions 222, the route labelling engine 330, or other resource of the system 300) may label the determined set of routes. The memory fabric 20 may label the determined set of routes in a manner similar or the same as that described above in relation to the execution of the route labelling instructions 222, the route labelling engine 330, or other resource of the memory fabric 20.
In an operation 520, the memory component may store the labelled set of routes in a local non-transitory machine readable storage medium. For example, the memory fabric 20 (and/or the route labelling instructions 222, the route labelling engine 330, or other resource of the memory fabric 20) may store the set of labelled routes. The memory fabric 20 may store the set of labelled routes in a manner similar or the same as that described above in relation to the execution of the route labelling instructions 222, the route labelling engine 330, and/or other resource of the memory fabric 20.
In an operation 530, the memory component may forward information about the stored set of labelled routes to each neighbour memory component. For example, the memory fabric 20 (and/or the route labelling instructions 222, the route labelling engine 330, or other resource of the memory fabric 20) may forward the set of labelled routes. The memory fabric 20 may forward the set of labelled routes in a manner similar or the same as that described above in relation to the execution of the route labelling instructions 222, the route labelling engine 330, and/or other resource of the memory fabric 20.
In an operation 540, the memory component may receive information about its neighbors' stored sets of labelled routes. For example, the memory fabric 20 (and/or the route labelling instructions 222, the route labelling engine 330, or other resource of the memory fabric 20) may receive information about its neighbors' stored sets of labelled routes. The memory fabric 20 may receive information about its neighbors' stored sets of labelled routes in a manner similar or the same as that described above in relation to the execution of the route labelling instructions 222, the route labelling engine 330, and/or other resource of the memory fabric 20.
In an operation 550, the memory component may revise its stored set of labelled routes based on the received information. For example, the memory fabric 20 (and/or the route labelling instructions 222, the route labelling engine 330, or other resource of the memory fabric 20) may revise the set of labelled routes. The memory fabric 20 may revise the set of labelled routes in a manner similar or the same as that described above in relation to the execution of the route labelling instructions 222, the route labelling engine 330, and/or other resource of the memory fabric 20.
In an operation 600, the memory component may receive information about a failure in the memory fabric. For example, the memory fabric 20 (and/or the failure recovery instructions 223, the failure recovery engine 340, or other resource of the memory fabric 20) may receive information about a failure in the memory fabric. The memory fabric 20 may receive information about a failure in the memory fabric in a manner similar or the same as that described above in relation to the execution of the failure recovery instructions 223, the failure recovery engine 340, and/or other resource of the memory fabric 20.
In an operation 610, the memory component may determine that the labelled route to a destination memory component comprises a memory component involved in the failure. For example, the memory fabric 20 (and/or the failure recovery instructions 223, the failure recovery engine 340, or other resource of the memory fabric 20) may determine that the labelled route to a destination memory component comprises a memory component involved in the failure. The memory fabric 20 may determine that the labelled route to a destination memory component comprises a memory component involved in the failure in a manner similar or the same as that described above in relation to the execution of the failure recovery instructions 223, the failure recovery engine 340, and/or other resource of the memory fabric 20.
In an operation 620, the memory component may select an alternative labelled route to the destination memory component. For example, the memory fabric 20 (and/or the failure recovery instructions 223, the failure recovery engine 340, or other resource of the memory fabric 20) may select an alternative labelled route to the destination memory component. The memory fabric 20 may select an alternative labelled route to the destination memory component in a manner similar or the same as that described above in relation to the execution of the failure recovery instructions 223, the failure recovery engine 340, and/or other resource of the memory fabric 20.
The foregoing disclosure describes a number of example embodiments for a resilient memory fabric. The disclosed examples may include systems, devices, computer-readable storage media, and methods for management of a resilient memory fabric. For purposes of explanation, certain examples are described with reference to the components illustrated in
Further, the sequence of operations described in connection with
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2015/072043 | 9/24/2015 | WO | 00 |