This application is related to U.S. application Ser. No. 16/548,116 titled “DISTRIBUTED CACHE WITH IN-NETWORK PREFETCH”, filed on Aug. 22, 2019, which is hereby incorporated by reference in its entirety. This application is also related to U.S. application Ser. No. 16/697,019 titled “FAULT TOLERANT DATA COHERENCE IN LARGE-SCALE DISTRIBUTED CACHE SYSTEMS”, filed on Nov. 26, 2019, which is hereby incorporated by reference in its entirety. This application is also related to U.S. application Ser. No. 16/914,206 titled “DEVICES AND METHODS FOR MANAGING NETWORK TRAFFIC FOR A DISTRIBUTED CACHE”, filed on Jun. 26, 2020, which is hereby incorporated by reference in its entirety.
Although more recent, high-performance networking may enable distributed caching systems in data centers, challenges remain to provide a fault-tolerant and coherent system for large-scale distributed caches. Replication is often used in distributed systems to provide fault tolerance for hardware failures. However, when using cache directory replicas for fault tolerance, synchronizing replicas can prove very difficult, especially for the growing size of today's data centers. In addition, the complexity of a coherency protocol can affect system performance. Since cache coherence operations typically depend on the cache directory, enabling a relatively fast consensus between cache directory replicas is important for such fault tolerance techniques in distributed caches to provide for quick recovery.
Accordingly, there is a need for a fault-tolerant and coherent system for large-scale distributed caches. In this regard, there is a need for systems that support fault-tolerant and consistent directory-based cache coherence over fabrics, such as Ethernet, for distributed caches, without significantly compromising system performance.
The features and advantages of the embodiments of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings. The drawings and the associated descriptions are provided to illustrate embodiments of the disclosure and not to limit the scope of what is claimed.
In the following detailed description, numerous specific details are set forth to provide a full understanding of the present disclosure. It will be apparent, however, to one of ordinary skill in the art that the various embodiments disclosed may be practiced without some of these specific details. In other instances, well-known structures and techniques have not been shown in detail to avoid unnecessarily obscuring the various embodiments.
Network 112 can include, for example, a Storage Area Network (SAN), a Local Area Network (LAN), and/or a Wide Area Network (WAN), such as the Internet. In this regard, one or more of client devices 114, controller 102, and/or one or more of server racks 101 may not be physically co-located. Server racks 101, controller 102, and client devices 114 may communicate using one or more standards such as, for example, Ethernet, Fibre Channel, and/or InfiniBand.
As shown in the example of
Controller 102 communicates with each of the programmable switches 104 in system 100. In some implementations, controller 102 can include a Software Defined Networking (SDN) controller. As discussed in more detail below, controller 102 maintains global cache directory 20 for coherence in the permissions and states of cache lines stored in the distributed cache based on directory updates received from programmable switches 104. In addition, and as discussed in more detail in related application Ser. No. 16/914,206 incorporated by reference above, controller 102 in some implementations can manage network traffic for system 100 with the use of programmable switches 104 based on information provided to controller 102 from programmable switches 104.
Those of ordinary skill in the art will appreciate with reference to the present disclosure that other implementations may include a different number or arrangement of memory devices 110, programmable switches 104, or server racks 101 than shown in the example of
Programmable switches 104 are configured to route cache messages, such as cache line requests, and other communications between client devices 114 and memory devices 110. For example, such cache messages may include a get request or a put request for one or more cache lines, or a permission level request for a client device 114 to modify a cache line requested from a memory device 110. As discussed in more detail below with reference to
In some implementations, programmable switches 104 can include, for example, a switch that can be programmed to handle different custom protocols. As discussed in more detail below with reference to
Data planes 106 of programmable switches 104 in the example of
Data planes 106 of programmable switches 104 are programmable and separate from higher-level control planes 108 that determine end-to-end routes for packets between devices in system 100. In this regard, control planes 108 may be used for handling different processes, such as the processes in
In one example, programmable switches 104 can be 64 port ToR P4 programmable switches, such as a Barefoot Networks Tofino Application Specific Integrated Circuit (ASIC) with ports configured to provide 40 Gigabit Ethernet (GE) frame rates. Other types of programmable switches that can be used as a programmable switch 104 can include, for example, a Cavium Xpliant programmable switch or a Broadcom Trident 3 programmable switch.
The use of a programmable switch allows for the configuration of high-performance and scalable memory centric architectures by defining customized packet formats and processing behavior, such as those discussed below with reference to
Controller 102 using global cache directory 20 can provide coherency among the cache directories 12 stored at the programmable switches 104 in system 100. In some implementations, controller 102 can send indications of updates to backup programmable switches 104 to update replica or backup directories based on an indication of an update to a primary directory of a primary programmable switch 104 that is received by controller 102. In the example of
As discussed in more detail below with reference to the sequence diagram of
In addition, controller 102 may also proactively detect the failure or unavailability of primary programmable switches 1041 (e.g., primary programmable switches 1041A, 1041B, 1041C) and the associated unavailability of their respective primary cache directories 121 (e.g., primary cache directories 121A, 121B, 121C) by sending heartbeat packets to the primary programmable switches 1041. Controller 102 can then set a backup programmable switch to become a new primary programmable switch to provide for a quicker recovery, as compared to conventional distributed caches. For example, if a response to a heartbeat packet is not received from primary programmable switch 1041A, controller 102 may set backup programmable switch 1042A as the new primary programmable switch for rack 101A.
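As a minimal sketch of the heartbeat-based failover described above, the following Python fragment promotes a backup switch when a heartbeat response is missing. The names `Rack` and `handle_heartbeat_result` are hypothetical, and promoting the first listed backup is an assumption made only for illustration; the disclosure does not prescribe how a backup is chosen.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rack:
    """Hypothetical controller-side record of the switches serving one rack."""
    primary: str
    backups: List[str] = field(default_factory=list)


def handle_heartbeat_result(rack: Rack, responded: bool) -> str:
    """Return the primary switch for the rack after one heartbeat round.

    If the primary answered, or no backup is left, nothing changes.
    Otherwise the first listed backup becomes the new primary (assumption).
    """
    if responded or not rack.backups:
        return rack.primary
    rack.primary = rack.backups.pop(0)
    return rack.primary
```

For example, a rack with primary "1041A" and backups "1042A" and "1043A" would move to "1042A" after one missed heartbeat response, leaving "1043A" as the remaining backup.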
Programmable switches 104 can use timeout values when sending indications of cache directory updates to controller 102. If an acknowledgment of the cache directory update is not received by the programmable switch 104 within the timeout value, the programmable switch 104 resends the indication of the cache directory update to controller 102 to ensure that global cache directory 20 is updated. In some implementations, the primary programmable switches 1041 can use mirroring of their cache directory updates to controller 102 in the background to reduce software overhead that may otherwise be needed in updating a global cache directory. In addition, the processing resources of programmable switches 104, such as the use of Content Addressable Memory (CAM) or Ternary CAM (TCAM) tables, or other types of match-action tables, can ordinarily provide faster processing of such cache directory updates than can occur at the end points of a client device 114 or a memory device 110.
In this regard, each programmable switch 104 can provide centralized data coherency management for the data stored in the memory devices 110 of its respective server rack 101. As discussed in more detail below, each programmable switch 104 can efficiently update a local cache directory 12 for memory devices 110 that it communicates with as cache line requests are received by the programmable switch 104. The limitation of cache directory 12 to the memory devices 110 that communicate with the programmable switch 104 can also improve the scalability of the distributed cache or the ability to expand the size of the distributed cache to new memory devices, such as by adding a new server rack with its own programmable switches and memory devices.
In some implementations, programmable switches 104 may further improve scalability by temporarily assigning logical identifiers to respective active client devices 114 that have requested cache lines, and removing the logical identifiers after the client devices become inactive. By only keeping track of the states or permission levels of active client devices 114 (e.g., the client devices 114 that retain a permission level for one or more cache lines), it is ordinarily possible to reduce the amount of memory needed at programmable switches 104 to store cache directories 12.
In the example of
As will be appreciated by those of ordinary skill in the art with reference to the present disclosure, system 100 may include additional devices or a different number of devices than shown in the example of
Processor 1161 can execute instructions, such as instructions from distributed cache module 161, and application(s) 181, which may include an Operating System (OS) and/or other applications used by client device 1141. Processor 1161 can include circuitry such as a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a microcontroller, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), hard-wired logic, analog circuitry and/or a combination thereof. In some implementations, processor 1161 can include a System on a Chip (SoC), which may be combined with one or both of memory 1181 and interface 1221. Processor 1161 can include one or more cache levels (e.g., L1, L2, and/or L3 caches) where data is loaded from or flushed into memory 1181, or loaded from or flushed into memory devices 110, such as memory device 1101 in
Memory 1181 can include, for example, a volatile RAM such as SRAM, DRAM, a non-volatile RAM, or other solid-state memory that is used by processor 1161 as an internal main memory to store data. Data stored in memory 1181 can include data read from storage device 1201, data to be stored in storage device 1201, instructions loaded from distributed cache module 161 or application(s) 181 for execution by processor 1161, and/or data used in executing such applications. In addition to loading data from internal main memory 1181, processor 1161 also loads data from memory devices 110 as an external main memory or distributed cache. Such data may also be flushed after modification by processor 1161 or evicted without modification back into internal main memory 1181 or an external main memory device 110 via programmable switch 1041A or programmable switch 1042A.
As shown in
Storage device 1201 serves as secondary storage that can include, for example, one or more rotating magnetic disks or non-volatile solid-state memory, such as flash memory. While the description herein refers to solid-state memory generally, it is understood that solid-state memory may comprise one or more of various types of memory devices such as flash integrated circuits, NAND memory (e.g., single-level cell (SLC) memory, multi-level cell (MLC) memory (i.e., two or more levels), or any combination thereof), NOR memory, EEPROM, other discrete Non-Volatile Memory (NVM) chips, or any combination thereof. As noted above, internal main memory 1181 and external memory devices 110 typically provide faster data access and can provide more granular data access (e.g., cache line size or byte-addressable) than storage device 1201.
Interface 1221 is configured to interface client device 1141 with devices in system 100, such as programmable switches 104A and 104B. Interface 1221 may communicate using a standard such as, for example, Ethernet, Fibre Channel, or InfiniBand. In this regard, client device 1141, programmable switches 104A and 104B, controller 102, and memory device 1101 may not be physically co-located and may communicate over a network such as a LAN or a WAN. As will be appreciated by those of ordinary skill in the art, interface 1221 can be included as part of processor 1161.
Programmable switches 1041A and 1042A in some implementations can be ToR switches for server rack 101A including memory device 1101. In the example of
Memory 134 of a programmable switch 104 can include, for example, a volatile RAM such as DRAM, or a non-volatile RAM or other solid-state memory such as register arrays that are used by circuitry 132 to execute instructions loaded from switch cache module 26 or firmware of the programmable switch 104, and/or data used in executing such instructions, such as primary cache directory 121A of programmable switch 1041A or backup cache directory 122A of programmable switch 1042A. As discussed in more detail below, switch cache module 26 can include instructions for implementing processes such as those discussed with reference to
In the example of
The other programmable switch or switches 104 for the rack 101 (e.g., programmable switches 1042A and 1043A in
As discussed in more detail below, controller 102 can ensure the ongoing consistency or coherence of the different cache directories 12 of the programmable switches 104 for the rack 101 so that the replacement of a primary programmable switch with a backup programmable switch is seamless without having to update the cache directory of the backup programmable switch before making the transition to a new primary programmable switch. By ensuring the ongoing consistency or coherency of the cache directories 12 for the programmable switches 104, it is ordinarily possible to provide for a quicker recovery after the failure or unavailability of the primary programmable switch since the backup directory 12 is already up to date.
In the example of
Controller 102 in the example of
Such a process can help controller 102 to verify that a backup programmable switch 104 is using a most recent version of a cache directory 12 as part of selecting a new primary programmable switch 104 and/or for ensuring cache directory coherence or consistency among the programmable switches 104 for a server rack 101. In addition, controller 102 can provide additional fault tolerance or redundancy with global cache directory 20 for cases where backup programmable switches 104 for a server rack 101 may not be available due to an error or loss of power, or may have differing information for their cache directories 12.
Processor 124 of controller 102 executes cache controller module 22 to maintain global cache directory 20 and update local cache directories 12 at programmable switches 104, as needed. In addition, processor 124 may also execute cache controller module 22 to send heartbeat packets to primary programmable switches 1041, and to set a backup programmable switch 1042 or 1043 to become a new primary programmable switch in response to a timeout value expiring for a response from a primary programmable switch 1041.
Processor 124 can include circuitry such as a CPU, a GPU, a microcontroller, a DSP, an ASIC, an FPGA, hard-wired logic, analog circuitry and/or a combination thereof. In some implementations, processor 124 can include an SoC, which may be combined with one or both of memory 126 and interface 128. Memory 126 can include, for example, a volatile RAM such as DRAM, a non-volatile RAM, or other solid-state memory that is used by processor 124 to store data. Controller 102 communicates with programmable switches 104 via interface 128, which is configured to interface with ports of programmable switches 104, and may interface according to a standard, such as Ethernet, Fibre Channel, or InfiniBand.
As will be appreciated by those of ordinary skill in the art with reference to the present disclosure, other implementations may include a different arrangement or number of components, or modules than shown in the example of
In the example of
As noted above, cache messages can have a custom packet format so that programmable switch 1041A can distinguish cache messages, such as messages for cache line addressed data, from other network traffic, such as messages for page addressed data. The indication of a cache message, such as a cache line request to put or get cache line data, causes circuitry 1321A of programmable switch 1041A to handle the packet differently from other packets that are not indicated as being a cache message. In some implementations, the custom packet format fits into a standard 802.3 Layer 1 frame format, which can allow the packets to operate with existing and forthcoming programmable switches, such as a Barefoot Tofino ASIC switch, for example. In such an implementation, the preamble, start frame delimiter, and interpacket gap may follow the standard 802.3 Layer 1 frame format, but portions in Layer 2 are replaced with custom header fields that can be parsed by programmable switch 1041A. A payload of a packet for a cache message can include one or more memory addresses for one or more cache lines being requested by a client device or being returned to a client device, and may include data for the cache line or lines.
Stages 362 and 363 can include, for example, programmable Arithmetic Logic Units (ALUs) and one or more memories that store match-action tables for matching extracted values from the headers and performing different corresponding actions based on the values, such as performing particular updates to cache directory 121A stored in memory 1341A of programmable switch 1041A. In some implementations, stages 362 and 363 may use CAM or TCAM to quickly identify ports 1301A associated with a destination address extracted from the packet by parser 361. In some implementations, the stages of the ingress pipeline and the egress pipeline may share a single memory, such as memory 1341A in
Traffic manager 38 routes the cache message (e.g., a cache line request) to an appropriate port of programmable switch 1041A. As discussed in more detail in co-pending application Ser. No. 16/548,116 incorporated by reference above, the ingress pipeline in some implementations may calculate offsets for additional cache lines to be prefetched based on the parsed header fields, and then generate corresponding additional cache line request packets using a packet generation engine of programmable switch 1041A.
In the example of
As will be appreciated by those of ordinary skill in the art with reference to the present disclosure, other implementations may include a different arrangement of modules for a programmable switch. For example, other implementations may include more or fewer stages as part of the ingress or egress pipeline.
If the incoming message is a cache message, such as a get or a put cache line request to retrieve or store a cache line, respectively, ingress pipeline 36 can determine whether the cache message is a read request, a write request, or other type of cache message, such as a cache coherency message. As discussed in the example header format of
If the incoming message is not a cache message, such as a read or write command in units greater than a cache line size (e.g., in a page or block size), the message or portions of the message, such as a header and a payload, are passed to traffic manager 38, which can determine a port 130 for sending the message. In some implementations, a destination address in the header can indicate a port 130 to send the message via egress pipeline 40, which may reassemble the message before sending the message to another device in system 100.
In the case where the incoming message is a cache line request, match-action tables of one or more of stages 362 and 363 may be used to determine a memory device 110 storing the requested cache line or cache lines. In this regard, and as discussed in more detail in co-pending application Ser. No. 16/697,019 incorporated by reference above, the memory device 110 may serve as a home node or serialization point for the cache lines it stores by allowing access and granting permission levels for modification of the cache lines to other nodes or devices in system 100. Traffic manager 38 can determine a port 130 for sending the cache line request to the identified memory device 110 storing the requested cache line.
In the cases of a read miss or a write miss, egress pipeline 40 including deparser 403 reassembles or builds one or more packets for the cache line request and sends them to the identified memory device 110. Ingress pipeline 36 may determine that a requested cache line or a cache line to be written is not currently represented in the cache directory 12 stored at programmable switch 104. In such cases, circuitry 132 of programmable switch 104 can update its cache directory 12 after receiving the requested cache line from a memory device 110 or after receiving a confirmation from a memory device 110 that the cache line has been written. Programmable switch 104 then sends an indication of the cache directory update to controller 102 to update global cache directory 20 to account for the addition of the new cache line in cache directory 12.
In the cases of a read hit or a write hit, one or more of egress stages 401 and 402 may be used to update cache directory 12. In some examples, a status or permission level, and/or a version number may be changed in cache directory 12 for an entry corresponding to the requested cache line. The read request may be reassembled or built by deparser 403, and sent to the identified memory device 110 storing the requested data.
As discussed in more detail below with reference to the sequence diagram of
In the case of a write request, egress pipeline 40 may use one or more of egress stages 401 and 402 to identify other nodes or devices in system 100 storing a copy of the requested cache line or lines and a status or permission level for the requested data. In such examples, egress pipeline 40 may also send cache line requests to the other nodes or devices to change a status or permission level of such other nodes. For example, a request to modify a cache line that is being shared by multiple nodes in addition to the memory device 110 storing the cache line can result in egress pipeline 40 sending cache line requests to the other nodes to change their permission level from shared to invalid for the cache line requested from memory device 110.
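The directory lookup for a write request can be sketched as follows. This is only an illustration under stated assumptions: the dictionary layout, the field names (`sharers`, `status`, `version`), the status letter "E", and the function name `handle_write_request` are hypothetical, and incrementing the version on each granted modification is an assumed convention rather than a requirement of the disclosure.

```python
def handle_write_request(directory, address, requester):
    """Return the sharers that need a shared-to-invalid request.

    directory maps a cache line address to a hypothetical entry with
    the keys "sharers" (set of node identifiers), "status", and "version".
    """
    entry = directory[address]
    # Every node holding a copy, other than the requester, must be invalidated.
    to_invalidate = sorted(entry["sharers"] - {requester})
    # Record the requester as the only holder with permission to modify.
    entry["sharers"] = {requester}
    entry["status"] = "E"
    entry["version"] += 1  # assumed: one version step per granted modification
    return to_invalidate
```

With an entry for address C shared by logical identifiers q and w, a write request from q would yield a single invalidation target, w.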
As will be appreciated by those of ordinary skill in the art with reference to the present disclosure, other arrangements of operations performed by programmable switch 104 are possible than those shown in the example of
As shown in
With reference to the example of
Each programmable switch 104 may keep a mapping table, such as a TCAM table, to translate the physical client device identifier to a logical identifier and vice-versa. This table can be reconfigured and updated during runtime to add new shares or to remove information about inactive client devices 114 according to the actively shared cache lines. This framework leverages the programmability of the switch pipelines to serve the cache coherence requests based on the coherence states.
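A software analogue of such a mapping table is sketched below, assuming that a logical identifier is assigned when a client device first retains a permission level and is freed when its last permission level is released. The class name `ClientIdTable`, its methods, and the free-list allocation policy are hypothetical; an actual switch would hold the mapping in TCAM or another match-action table rather than in Python dictionaries.

```python
class ClientIdTable:
    """Hypothetical physical-to-logical identifier table for active clients."""

    def __init__(self, capacity):
        self._free = list(range(capacity))   # unused logical identifiers
        self._to_logical = {}                # physical id -> logical id
        self._to_physical = {}               # logical id -> physical id
        self._lines = {}                     # logical id -> held cache line addresses

    def acquire(self, physical_id, address):
        """Record that a client holds a permission level for a cache line."""
        if physical_id not in self._to_logical:
            if not self._free:
                raise RuntimeError("no logical identifiers left")
            logical_id = self._free.pop(0)
            self._to_logical[physical_id] = logical_id
            self._to_physical[logical_id] = physical_id
            self._lines[logical_id] = set()
        logical_id = self._to_logical[physical_id]
        self._lines[logical_id].add(address)
        return logical_id

    def release(self, physical_id, address):
        """Drop one cache line; free the identifier once the client is inactive."""
        logical_id = self._to_logical[physical_id]
        self._lines[logical_id].discard(address)
        if not self._lines[logical_id]:
            del self._lines[logical_id]
            del self._to_physical[logical_id]
            del self._to_logical[physical_id]
            self._free.append(logical_id)

    def lookup(self, physical_id):
        """Return the logical identifier, or None for an inactive client."""
        return self._to_logical.get(physical_id)
```

Because identifiers return to the free list when a client becomes inactive, the table size in this sketch is bounded by the number of concurrently active clients rather than by the total number of client devices.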
The logical identifiers may be calculated or determined in various ways. In one example implementation, programmable switch 104 can perform a hashing on a number of identifiers for the client device 114, such as its MAC address, IP address, and port number, and then determine a logical identifier with a lower number of bits based on the hashing.
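The hashing approach can be illustrated with the short sketch below. The choice of SHA-256, the field separator, the 8-bit default width, and the function name `logical_id` are all assumptions made for illustration; the disclosure does not name a hash function, and a hardware switch would more likely use a built-in hash unit. Reducing to fewer bits means two clients can collide, which this sketch does not attempt to resolve.

```python
import hashlib


def logical_id(mac, ip, port, bits=8):
    """Hash several client identifiers down to a shorter logical identifier."""
    key = f"{mac}|{ip}|{port}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Keep only the requested number of low-order bits of the hash.
    return int.from_bytes(digest[:4], "big") % (1 << bits)
```

The same inputs always produce the same identifier, so the mapping can be recomputed rather than stored when that is convenient.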
In another example implementation, programmable switches 104 and client devices 114 may be assigned location-based identifiers during a topology discovery process by exchanging, for example, Link Layer Discovery Protocol (LLDP) messages. The logical identifiers may then be formatted as an encoded location with respect to a hierarchical level of the programmable switch 104 or client device 114 in a multi-rooted tree topology of the data center's network. For example, individual bytes in a logical identifier from left to right could indicate a core switch identity, a domain identifier (e.g., a port number of the core switch through which a programmable switch 104 is connected), a host identifier (e.g., a port number of the programmable switch port through which the client device 114 is connected), and a local client device identifier. A logical identifier may be represented as, for example, 2.3.8.11. As the LLDP messages move from core switches towards the client device 114 during the topology discovery process, they will carry the information regarding the traveled nodes in order to enable nodes to set up their flow entries in TCAM tables, for example.
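The byte-per-level layout in this example can be sketched as follows, assuming exactly four one-byte fields in the order given above. The function names `encode_location` and `format_location` are hypothetical.

```python
def encode_location(core, domain, host, client):
    """Pack four hierarchy levels into a 4-byte logical identifier."""
    fields = (core, domain, host, client)
    if any(not 0 <= value <= 255 for value in fields):
        raise ValueError("each level must fit in one byte")
    return bytes(fields)


def format_location(identifier):
    """Render the identifier in dotted form, one number per byte."""
    return ".".join(str(value) for value in identifier)
```

Encoding core switch 2, domain 3, host 8, and local client 11 and then formatting the result gives the dotted form 2.3.8.11 used in the example above.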
In the example of
In addition, programmable switch 1041A may limit the coherence domain or limit the tracking of permission levels or states to only the active client devices 114 that may be located in server rack 101A. In such implementations, programmable switch 1041A may determine and assign logical identifiers for active client devices that have received cache lines and retain a permission level with respect to the cache lines.
When the active client devices 114 have released or relinquished all of their permission levels for cache lines, as discussed in more detail below with the examples of
In some cases, an address or other indicator of the memory device 110 storing the cache line may be included as part of the address for the cache line. As shown in the example of
In this regard, different devices in a system implementing a distributed cache may not be exactly synchronized with each other. In some implementations, this challenge is overcome by using the time provided by the home memory device 110 that stores the requested data. Programmable switch 1041A may receive this time in a cache message from memory device 110 with the requested data. The use of the home memory device 110 that stores the requested data as the serialization point or timekeeper for the requested data can provide a consistent timestamp for the requested data and allow for scalability of the distributed cache without having to synchronize timekeeping among an increasing number of devices at a central location. In other implementations, the timestamp may instead be determined by programmable switch 1041A.
The latest timestamp of a cache directory 12 may be used as a timestamp representing the current version of the cache directory 12. In other implementations, a separate field may be used in cache directory 12 for a version number or timestamp representing the state of the cache directory as a whole. As discussed in more detail below with reference to the sequence diagram of
In the example of cache directory 121A in
The cache line indicated by address C in cache directory 121A is stored in memory device 1102A, and has shared read-only copies of the cache line stored at the client devices 114 assigned logical identifiers q and w. The cache line has been modified twice since it was originally stored in memory device 1102A, and was last modified or authorized to be modified by its home memory device 1102A at the time indicated by the corresponding timestamp in cache directory 121A.
As shown in
As will be appreciated by those of ordinary skill in the art in light of the present disclosure, cache directory 121A may include different information than shown in
In
For its part, memory device 1101A receives the cache line request from client device 1141 and either maintains a shared permission level (i.e., S in memory device 1101) with respect to the requested data or changes its permission level with respect to the requested data from exclusive to shared (i.e., E to S in
In the bottom half of
As noted above, the present disclosure uses programmable switch 104 to maintain the cache directory 12 for its respective memory devices 110. This ordinarily provides an efficient way to maintain cache directories 12 for the distributed cache, since programmable switch 104 serves as an intermediary or centralized location for communication between client devices 114 and its memory devices 110. Programmable switch 104 can update its cache directory 12 based on the cache line requests it receives for memory devices 110 without having to coordinate among a larger number of caches located at a greater number of client devices 114 or memory devices 110. Using programmable switch 104 to update a local cache directory also improves scalability of the distributed cache, since, in certain implementations, each programmable switch is responsible for only the cache lines stored in its associated set of memory devices 110.
In addition, controller 102 serves as a centralized location for initiating the update of backup cache directories 12 stored at backup programmable switches for racks 101. This ordinarily improves consistency among global cache directory 20 and the backup cache directories 12 in case a primary programmable switch 1041 fails or otherwise becomes unavailable.
The top right example state diagram of
The bottom example state diagram in
Memory device 1101A then sends the requested data to client device 1141 and grants permission to client device 1141 to modify the data. The status of memory device 1101A with respect to the requested data changes from shared to invalid, while the status of client device 1141 with respect to the requested data changes from either invalid to exclusive or shared to exclusive, depending on whether client device 1141 was previously sharing the data with client devices 1142 and 1143. In cases where client device 1141 already was sharing the requested data, memory device 1101A may only send an indication that the permission level of client device 1141 can be changed from shared to exclusive, since client device 1141 already has a copy of the requested data.
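The state changes described here can be sketched as a small function, assuming the single-letter states S (shared), E (exclusive), and I (invalid). The function name `grant_exclusive` and the return strings are hypothetical, and setting every other node to invalid reflects the shared-to-invalid requests discussed for write requests.

```python
def grant_exclusive(states, requester):
    """Grant modify permission to one node and invalidate all other copies.

    states maps a node name to "S", "E", or "I" for one cache line.
    Returns "permission_only" if the requester already held a shared copy,
    otherwise "data" to indicate the cache line itself must be sent.
    """
    already_has_copy = states.get(requester, "I") == "S"
    for node in states:
        states[node] = "I"
    states[requester] = "E"
    return "permission_only" if already_has_copy else "data"
```

A requester that was previously invalid receives the data along with the permission, while a requester that was already sharing the cache line only needs the permission change.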
In the example state diagram on the right side of
As discussed above, memory device 110 in the foregoing examples serves as a serialization point for the modification of the data it stores. In other words, the order of performing requests for the same data is typically in the order that memory device 110 receives requests for the data. In addition, memory device 110 uses a non-blocking approach where cache line requests are granted in the order in which they are received. In some implementations, programmable switch 104 may delay additional requests received for data that is in progress of being modified and/or may send a request for a modified copy of the cache line to the client device 114 that has modified the data without having to wait for a request from memory device 110 to retrieve the modified data from the client device 114.
The payload of the example frame shown in
In some implementations, cache request information 64 may not be present in every cache message packet received by programmable switch 104. For example, client devices 114 and/or memory devices 110 may only send cache request information 64 at a particular interval, or when a particular condition is reached, such as when a queue of the client device 114 or memory device 110 reaches a threshold.
The Ethernet packet format in the example of
In addition to the PCP field, the example of
As discussed above, the priority indicator can be used by programmable switch 104 to determine a queue for the cache message among a plurality of queues for transmission via a particular port of programmable switch 104. In the example of
Additionally or alternatively, priority indicator 60 can be used to indicate different types of client devices 114. For example, different types of client devices 114 such as FPGAs, CPUs, GPUs, cores, or ASICs may each be assigned a single value for all of their priority indicators 60, or a range of values depending on the types of applications executed by the client device 114. The use of priority indicators across system 100 for the distributed cache can ordinarily allow for a more diverse or heterogeneous use of different client devices 114, and a wider variety of applications that may have different demands on the distributed cache in terms of reliability, the rate of cache messages, and the size of message flows.
The OpCode field can indicate an operation type for an intended operation to be performed using a requested cache line or cache lines, such as an acquire to read or an acquire to read and write. In other cases, the OpCode field can indicate whether the packet is a probe to change the permission level of a client device 114 with respect to a cache line, or a probe acknowledgment to indicate that a permission level has been changed. In this regard, the parameter field of custom header 62 can indicate a current or requested permission level from the device sending the packet.
The size field of header 62 can indicate the size of the data requested (e.g., a number of cache lines or a size in bytes) or the size of the data provided in payload 32. The domain field in
As will be appreciated by those of ordinary skill in the art in light of the present disclosure, other message or packet formats can be used with programmable switches 104 for cache messages. For example, other implementations may include the priority indicator in the payload, as opposed to a separate 802.1Q tag, or may not include a priority indicator at all. Similarly, other implementations may not include cache request information 64.
Programmable switch 1041 then sends the cache line request to memory device 110, which stores the cache line. Programmable switch 1041 also sets a timeout value for resending the cache line request. If the timeout value expires before receiving the requested cache line from memory device 110, programmable switch 1041 resends the cache line request to memory device 110. This can provide quick error detection and recovery to handle packet losses due to link failures and provide for more reliability. In some implementations, programmable switch 1041 may use a timeout register for receiving the requested cache line or an acknowledgment of the cache line request by memory device 110. The timeout value can, for example, be based on a typical roundtrip packet duration between programmable switch 1041 and memory device 110, and an expected processing time. The resending of a cache line request or other type of cache line message may be repeated in some implementations until an acknowledgment or the requested cache line is received, or for a predetermined number of attempts.
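The timeout-and-resend behavior described above can be sketched as follows. This is a minimal host-side illustration rather than switch data-plane code. The helper names `timeout_us` and `send_with_retry` and the callback signatures are hypothetical.

```python
def timeout_us(round_trip_us, processing_us):
    """Timeout based on a typical roundtrip duration plus expected processing time."""
    return round_trip_us + processing_us


def send_with_retry(send, wait_for_reply, timeout, max_attempts):
    """Send a message and resend it each time the timeout expires.

    `send()` transmits the message. `wait_for_reply(timeout)` returns the
    reply (the requested cache line or an acknowledgment), or None if the
    timeout expired first. Returns (reply, attempts_made); reply is None
    if every attempt timed out.
    """
    for attempt in range(1, max_attempts + 1):
        send()
        reply = wait_for_reply(timeout)
        if reply is not None:
            return reply, attempt
    return None, max_attempts
```

Bounding the loop with `max_attempts` corresponds to the variant above in which resending is repeated only for a predetermined number of attempts.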
In the example of
In some implementations, programmable switch 1041 may use a timeout register for receiving the acknowledgment of the cache directory update from controller 102. The timeout value can, for example, be based on a typical or expected roundtrip packet duration between programmable switch 1041 and controller 102, and an expected processing time. The resending of the indication of the cache directory update may be repeated in some implementations until an acknowledgment is received or for a predetermined number of attempts. In some implementations, programmable switch 1041 may send the indication of the cache directory update to one or more backup programmable switches 104 in response to not receiving an acknowledgment from controller 102. In other implementations, programmable switch 104 may send the indication of the cache directory update to the one or more backup programmable switches 104 when sending the indication of the cache directory update to the controller.
Controller 102 updates global cache directory 20 based on the received indication of the cache directory update from programmable switch 104. As discussed in more detail below with reference to the sequence diagram of
In
In block 802, programmable switch 104 receives a cache line request to obtain one or more cache lines stored in the distributed cache. The cache line request may come from a client device 114 to perform an operation using the one or more requested cache lines, such as a read operation or a write operation.
In block 804, programmable switch 104 updates cache directory 12 based on the received cache line request. As discussed above with reference to the example of
In block 806, programmable switch 104 sends the cache line request to a memory device 110 corresponding to an address indicated by the cache line request. As discussed in more detail in co-pending application Ser. No. 16/914,206 incorporated by reference above, traffic manager 38 of programmable switch 104 may identify a particular queue for queuing the cache line request based on a size of a message flow including the cache line request and/or based on a priority indicator for the cache line request. In addition, programmable switch 104 may also send additional cache line requests to prefetch additional cache lines predicted to be needed based on the received cache line request and additional prefetch information, as discussed in co-pending application Ser. No. 16/548,116.
In block 808, programmable switch 104 sends an indication of the cache directory update made in block 804 to controller 102 to update global cache directory 20. In some implementations, one or more egress stages of the egress pipeline of programmable switch 104 may mirror the cache directory update made to cache directory 12 as a cache message that is sent to controller 102. The indication of the update can include a timestamp associated with the update made to cache directory 12.
Those of ordinary skill in the art will appreciate with reference to the present disclosure that the order of blocks for the cache directory update process of
In block 902, programmable switch 104 assigns logical identifiers to respective active client devices 114 that have requested one or more cache lines. As discussed above, the logical identifier may be determined by programmable switch 104 by, for example, performing a hashing on a number of identifiers for the client device 114, such as its MAC address, IP address, and/or port number. In other implementations, programmable switch 104 may determine the logical identifier for the client device 114 using an encoded location with respect to a hierarchical level of the client device 114 in a multi-rooted tree topology of the data center's network.
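One way to picture the hashing option is the short sketch below. It assumes a CRC-32 hash over the concatenated identifiers and a 16-bit logical identifier. Neither choice comes from the disclosure, and the function name `logical_id` is hypothetical.

```python
import zlib


def logical_id(mac, ip, port, bits=16):
    """Derive a compact logical identifier from a client's identifiers.

    The MAC address, IP address, and port number are joined into one key
    and hashed. The result is truncated to `bits` bits so it fits in a
    small cache directory field (the width is an assumption).
    """
    key = f"{mac}|{ip}|{port}".encode("utf-8")
    return zlib.crc32(key) & ((1 << bits) - 1)
```

Because the identifier is shorter than the inputs, distinct client devices could collide, so a practical design would need some way to detect or resolve collisions. That aspect is outside this sketch.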
As discussed above with reference to the example of cache directory 121A in
In block 904, programmable switch 104 uses the assigned logical identifier in cache directory 12 to indicate permission levels for active client devices with respect to the cache lines indicated in cache directory 12. When sending an indication of a cache line update to controller 102, as in the example of block 808 in the process of
Programmable switch 104 may remove a logical identifier from cache directory 12 when the corresponding client device 114 releases all of its permission levels for the cache line or lines it previously requested.
Controller 102 receives the indication of the cache directory update from the primary programmable switch 1041, and requests one or more timestamps from one or more respective backup programmable switches 104. The backup programmable switches 104 are backup switches that store backup cache directories 12 for the primary programmable switch 1041 that sent the indication of the cache directory update. Controller 102 may maintain a mapping or list of programmable switches for each rack 101 in system 100, and may use this information to send the timestamp requests to the backup programmable switches 104 for the particular rack served by the primary programmable switch 1041.
The backup programmable switches 104 may select a timestamp indicating a latest or most recent modification to their backup cache directories 12. In some cases, each cache line entry in the backup cache directory 12 may have its own timestamp, and the backup programmable switches 104 may use the most recent timestamp in responding to controller 102. In other cases, the entire backup cache directory 12 may have a timestamp or version number that is used in responding to controller 102.
After controller 102 has received the timestamps from backup programmable switches 104, controller 102 determines a new later timestamp. The later timestamp helps ensure consistency among backup programmable switches 104 such that an earlier timestamp is not used for updating the cache directories at backup programmable switches 104. In other implementations, controller 102 may determine the new timestamp after receiving the current timestamp from a majority of backup programmable switches 104. This can ordinarily improve the speed of the cache directory update process, such as when there are a larger number of backup programmable switches 104. In such implementations, controller 102 may use a greater offset from the latest timestamp as a protective measure against additional backup programmable switches 104 sending a later timestamp after determining the new timestamp.
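The timestamp determination can be sketched as follows, assuming integer timestamps or version numbers. The function name and the specific offset values are hypothetical and are chosen only to show that a larger offset is applied when fewer than all backups have responded.

```python
def new_timestamp(reported, num_backups, offset=1, majority_offset=10):
    """Pick a timestamp later than every timestamp reported by the backups.

    `reported` holds the timestamps received so far and `num_backups` is
    the number of backup switches queried. When only a majority has
    responded, a greater offset is used as a protective margin against a
    late responder holding a later timestamp.
    """
    if len(reported) == num_backups and reported:
        return max(reported) + offset
    if len(reported) > num_backups // 2:
        return max(reported) + majority_offset
    raise ValueError("timestamps from a majority of backups are required")
```

For example, with three backups reporting 5, 7, and 6, the sketch returns 8; if only 5 and 7 have arrived, it returns 17.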
In the example of
Controller 102 also sends one or more additional indications of the cache directory update to backup programmable switches 104 with an indication of the new determined timestamp. When sending the indication of the cache directory update, controller 102 may also set a timeout value for receiving an acknowledgment back from each backup programmable switch 104. The timeout value may be set, for example, based on a roundtrip time to send messages to backup programmable switches 104 and processing time for backup programmable switches 104 to update their cache directories and send an acknowledgment. Controller 102 may resend the indication of the cache directory update to any backup programmable switches 104 that it does not receive an acknowledgment from before the expiration of the timeout value.
Backup programmable switches 104 update their respective backup cache directories 12 based on the received indication of the cache directory update from controller 102. The update can be made indicating the new timestamp determined by controller 102. As noted above, this new timestamp may be used for the particular cache line or cache lines for the update, or may be used for the backup cache directory 12 as a whole. Upon completing the update, each backup programmable switch 104 sends an acknowledgment of the cache directory update back to controller 102 to confirm that the cache directory update has been made.
Those of ordinary skill in the art will appreciate in light of the present disclosure that other implementations may differ from the example sequence shown in
In block 1102, controller 102 receives an indication of an update made to a cache directory 12 stored at a programmable switch 1041 that is acting as a primary programmable switch for a collection of nodes (e.g., memory devices 110 and/or client devices 114 in a server rack 101). The update can include, for example, an update to add a new cache line to the cache directory, change the permission level for the cache line, change a node (e.g., a client device 114) that has access to the cache line, indicate a modification to the cache line, or change the storage location (e.g., the home node or memory device 110) storing the cache line. In some cases, the update to the cache directory can include a consolidation of entries for multiple cache lines that may have contiguous addresses.
In block 1104, controller 102 updates global cache directory 20 based on the received indication of the update to the cache directory 12. In some implementations, the update to global cache directory 20 may include the timestamp determination discussed above for the sequence diagram of
In block 1106, controller 102 sends at least one additional indication of the update to at least one other programmable switch 104 (e.g., backup programmable switches 104) to update at least one backup cache directory 12. In sending the at least one additional indication of the cache directory update, controller 102 may use a mapping or other data structure associating the primary programmable switch 1041 with its backup programmable switch or switches 104. Controller 102 may also use timeout values for receiving acknowledgments from the additional one or more programmable switches 104 to ensure that the backup cache directories 12 are updated. As discussed above, using a centralized controller, such as controller 102, to maintain consistency among the backup cache directories 12 can ordinarily provide a centralized Paxos-style leader to maintain consensus among the cache directories 12 and global cache directory 20.
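Blocks 1102 to 1106 can be summarized with the following sketch of the controller's handling of one update. The class, its attribute names, and the boolean `send` callback (True when an acknowledgment arrives before the timeout) are assumptions for illustration only.

```python
class Controller:
    """Minimal model of a controller holding a global cache directory."""

    def __init__(self, backups_of):
        # Mapping from a primary switch to its backup switches.
        self.backups_of = backups_of
        self.global_directory = {}

    def handle_update(self, primary, address, entry, send):
        """Apply an update from `primary` and forward it to its backups.

        Returns the backups that did not acknowledge, so the caller can
        resend the indication to them.
        """
        # Block 1104: update the global cache directory.
        self.global_directory[address] = entry
        # Block 1106: send the additional indications to the backups.
        unacked = []
        for backup in self.backups_of.get(primary, []):
            if not send(backup, address, entry):
                unacked.append(backup)
        return unacked
```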
Those of ordinary skill in the art will appreciate with reference to the present disclosure that the order of blocks in
In block 1202, controller 102 sends a heartbeat packet to a primary programmable switch 1041, and sets a timeout value for receiving a response to the heartbeat packet. The heartbeat packet may be treated as a cache message or other type of message with an operation code or field in the packet indicating that a response is required from the primary programmable switch 1041.
In block 1204, controller 102 determines whether the timeout value expired before receiving the response from the primary programmable switch 1041. The timeout value may be based on a roundtrip time and processing that may be performed by the primary programmable switch 1041 in responding to the heartbeat packet.
If the timeout value does not expire in block 1204, controller 102 in block 1206 maintains the current primary programmable switch 1041 for the subset of programmable switches including the current primary programmable switch 1041 and its one or more backup programmable switches 104. In some implementations, the response to the heartbeat packet may only indicate that the heartbeat packet was received. In other implementations, the response to the heartbeat packet may include additional information from the primary programmable switch 1041, such as queue occupancy information or another indication of usage or traffic at the primary programmable switch 1041.
On the other hand, if controller 102 does not receive the response to the heartbeat packet from the primary programmable switch 1041 before expiration of the timeout value, controller 102 in block 1208 sets a backup programmable switch 104 in the subset of programmable switches 104 to become the new primary programmable switch 104. In setting the new primary programmable switch, controller 102 may send a specific code in a cache message or other type of message indicating that the backup programmable switch 104 is the new primary programmable switch. For its part, the new primary programmable switch may then send indications to each of the nodes in its rack 101, such as memory devices 110 and/or client devices 114, that messages are now routed through the new primary programmable switch 104. In addition, ports 130 of the new primary programmable switch 104 may be activated for communication on network 112.
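The decision made in blocks 1204 to 1208 can be sketched as below. The function name, the `send_heartbeat` callback, and the policy of promoting the first listed backup are assumptions; the description above does not specify how the new primary is chosen among several backups.

```python
def check_primary(primary, backups, send_heartbeat, timeout):
    """Keep the current primary if it answers a heartbeat, else fail over.

    `send_heartbeat(switch, timeout)` returns the response, or None if the
    timeout expired first. Returns (primary, backups) after the check.
    """
    response = send_heartbeat(primary, timeout)
    if response is not None:
        # Block 1206: maintain the current primary switch.
        return primary, list(backups)
    if not backups:
        raise RuntimeError("no backup switch available to promote")
    # Block 1208: promote a backup (the first one, as an assumption).
    return backups[0], list(backups[1:])
```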
The use of controller 102 to proactively identify possible failed or otherwise unavailable programmable switches 104 using heartbeat packets can ordinarily allow for a quick identification and replacement of such unavailable programmable switches. In addition, since the backup cache directories 12 of the backup programmable switches are updated in the background by controller 102, time does not need to be wasted updating the backup programmable switches 104. The backup cache directories 12 are therefore ready for use as soon as a new primary programmable switch is selected to replace a failed or otherwise unavailable programmable switch 104. Those of ordinary skill in the art will appreciate with reference to the present disclosure that variations of the process of
As discussed above, the foregoing use of a centralized controller to maintain a global cache directory and to update backup cache directories stored at backup programmable switches can improve fault tolerance and help maintain a coherent system for large-scale distributed caches. In addition, the use of timeout values for receiving acknowledgments on cache directory updates can help ensure that cache directory updates are made to the global cache directory, and also to the backup cache directories. The use of logical identifiers for active client devices can also facilitate the storage of cache directories locally at programmable switches that can provide a quick update to the cache directories due to in-line processing and programmable match-action tables.
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks, modules, and processes described in connection with the examples disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. Furthermore, the foregoing processes can be embodied on a computer readable medium which causes processor or controller circuitry to perform or execute certain functions.
To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, and modules have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Those of ordinary skill in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, units, modules, processor circuitry, and controller circuitry described in connection with the examples disclosed herein may be implemented or performed with a general purpose processor, a GPU, a DSP, an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. Processor or controller circuitry may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, an SoC, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The activities of a method or process described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executed by processor or controller circuitry, or in a combination of the two. The steps of the method or algorithm may also be performed in an alternate order from those provided in the examples. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable media, an optical media, or any other form of storage medium known in the art. An exemplary storage medium is coupled to processor or controller circuitry such that the processor or controller circuitry can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to processor or controller circuitry. The processor or controller circuitry and the storage medium may reside in an ASIC or an SoC.
The foregoing description of the disclosed example embodiments is provided to enable any person of ordinary skill in the art to make or use the embodiments in the present disclosure. Various modifications to these examples will be readily apparent to those of ordinary skill in the art, and the principles disclosed herein may be applied to other examples without departing from the spirit or scope of the present disclosure. The described embodiments are to be considered in all respects only as illustrative and not restrictive. In addition, the use of language in the form of “at least one of A and B” in the following claims should be understood to mean “only A, only B, or both A and B.”
Number | Name | Date | Kind |
---|---|---|---|
6044438 | Olnowich | Mar 2000 | A |
6078997 | Young et al. | Jun 2000 | A |
6108737 | Sharma et al. | Aug 2000 | A |
6209065 | Van Doren et al. | Mar 2001 | B1 |
6230243 | Elko | May 2001 | B1 |
6263404 | Borkenhagen | Jul 2001 | B1 |
6298418 | Fujiwara et al. | Oct 2001 | B1 |
6343346 | Olnowich | Jan 2002 | B1 |
6775804 | Dawson | Aug 2004 | B1 |
6829683 | Kuskin | Dec 2004 | B1 |
6868439 | Basu et al. | Mar 2005 | B2 |
6954844 | Lentz et al. | Oct 2005 | B2 |
6993630 | Williams et al. | Jan 2006 | B1 |
7032078 | Cypher et al. | Apr 2006 | B2 |
7376799 | Veazey et al. | May 2008 | B2 |
7673090 | Kaushik et al. | Mar 2010 | B2 |
7716425 | Uysal et al. | May 2010 | B1 |
7975025 | Szabo et al. | Jul 2011 | B1 |
8166251 | Luttrell | Apr 2012 | B2 |
8281075 | Arimilli et al. | Oct 2012 | B2 |
9088592 | Craft et al. | Jul 2015 | B1 |
9313604 | Holcombe | Apr 2016 | B1 |
9442850 | Rangarajan | Sep 2016 | B1 |
9467380 | Hong et al. | Oct 2016 | B2 |
9712381 | Emanuel et al. | Jul 2017 | B1 |
9819739 | Hussain et al. | Nov 2017 | B2 |
9825862 | Bosshart | Nov 2017 | B2 |
9826071 | Bosshart | Nov 2017 | B2 |
9880768 | Bosshart | Jan 2018 | B2 |
9910615 | Bosshart | Mar 2018 | B2 |
9912610 | Bosshart et al. | Mar 2018 | B2 |
9923816 | Kim et al. | Mar 2018 | B2 |
9936024 | Malwankar et al. | Apr 2018 | B2 |
9940056 | Bosshart | Apr 2018 | B2 |
10038624 | Cruz et al. | Jul 2018 | B1 |
10044583 | Kim et al. | Aug 2018 | B2 |
10050854 | Licking et al. | Aug 2018 | B1 |
10063407 | Kodeboyina et al. | Aug 2018 | B1 |
10063479 | Kim et al. | Aug 2018 | B2 |
10063638 | Huang | Aug 2018 | B2 |
10067967 | Bosshart | Sep 2018 | B1 |
10075567 | Licking et al. | Sep 2018 | B1 |
10078463 | Bosshart | Sep 2018 | B1 |
10084687 | Sharif et al. | Sep 2018 | B1 |
10110454 | Kim et al. | Oct 2018 | B2 |
10127983 | Peterson et al. | Nov 2018 | B1 |
10133499 | Bosshart | Nov 2018 | B2 |
10146527 | Olarig et al. | Dec 2018 | B2 |
10158573 | Lee et al. | Dec 2018 | B1 |
10164829 | Watson et al. | Dec 2018 | B1 |
10169108 | Gou et al. | Jan 2019 | B2 |
10225381 | Bosshart | Mar 2019 | B1 |
10230810 | Bhide et al. | Mar 2019 | B1 |
10237206 | Agrawal et al. | Mar 2019 | B1 |
10257122 | Li et al. | Apr 2019 | B1 |
10268634 | Bosshart et al. | Apr 2019 | B1 |
10298456 | Chang | May 2019 | B1 |
10496566 | Olarig et al. | Dec 2019 | B2 |
10628353 | Prabhakar et al. | Apr 2020 | B2 |
10635316 | Singh et al. | Apr 2020 | B2 |
10761995 | Blaner et al. | Sep 2020 | B2 |
10812388 | Thubert et al. | Oct 2020 | B2 |
10880204 | Shalev et al. | Dec 2020 | B1 |
20030009637 | Arimilli | Jan 2003 | A1 |
20030028819 | Chiu | Feb 2003 | A1 |
20030158999 | Hauck et al. | Aug 2003 | A1 |
20040044850 | George et al. | Mar 2004 | A1 |
20040073699 | Hong | Apr 2004 | A1 |
20040260883 | Wallin et al. | Dec 2004 | A1 |
20050058149 | Howe | Mar 2005 | A1 |
20060265568 | Burton | Nov 2006 | A1 |
20070067382 | Sun | Mar 2007 | A1 |
20080010409 | Rao et al. | Jan 2008 | A1 |
20090240664 | Dinker | Sep 2009 | A1 |
20090240869 | O'Krafka | Sep 2009 | A1 |
20090313503 | Atluri et al. | Dec 2009 | A1 |
20100008260 | Kim et al. | Jan 2010 | A1 |
20100223322 | Mott | Sep 2010 | A1 |
20110004729 | Akkawi et al. | Jan 2011 | A1 |
20110093925 | Krishnamoorthy et al. | Apr 2011 | A1 |
20110238923 | Hooker et al. | Sep 2011 | A1 |
20120110108 | Li | May 2012 | A1 |
20120155264 | Sharma et al. | Jun 2012 | A1 |
20130254325 | Song et al. | Sep 2013 | A1 |
20130263249 | Song et al. | Oct 2013 | A1 |
20140219284 | Chau et al. | Aug 2014 | A1 |
20140241361 | Bosshart et al. | Aug 2014 | A1 |
20140269413 | Hui et al. | Sep 2014 | A1 |
20140269716 | Pruss et al. | Sep 2014 | A1 |
20140278575 | Anton et al. | Sep 2014 | A1 |
20140331001 | Liu et al. | Nov 2014 | A1 |
20140362709 | Kashyap et al. | Dec 2014 | A1 |
20150195216 | Di Pietro et al. | Jul 2015 | A1 |
20150301949 | Koka | Oct 2015 | A1 |
20150319243 | Hussain et al. | Nov 2015 | A1 |
20150378919 | Anantaraman et al. | Dec 2015 | A1 |
20160050150 | Venkatesan et al. | Feb 2016 | A1 |
20160099872 | Kim et al. | Apr 2016 | A1 |
20160127492 | Malwankar et al. | May 2016 | A1 |
20160156558 | Hong et al. | Jun 2016 | A1 |
20160216913 | Bosshart | Jul 2016 | A1 |
20160246507 | Bosshart | Aug 2016 | A1 |
20160246535 | Bosshart | Aug 2016 | A1 |
20160294451 | Jung et al. | Oct 2016 | A1 |
20160315964 | Shetty et al. | Oct 2016 | A1 |
20160323189 | Ahn et al. | Nov 2016 | A1 |
20170026292 | Smith et al. | Jan 2017 | A1 |
20170054618 | Kim | Feb 2017 | A1 |
20170054619 | Kim | Feb 2017 | A1 |
20170063690 | Bosshart | Mar 2017 | A1 |
20170064047 | Bosshart | Mar 2017 | A1 |
20170093707 | Kim et al. | Mar 2017 | A1 |
20170093986 | Kim et al. | Mar 2017 | A1 |
20170093987 | Kaushalram et al. | Mar 2017 | A1 |
20170187846 | Shalev et al. | Jun 2017 | A1 |
20170214599 | Seo et al. | Jul 2017 | A1 |
20170286363 | Joshua et al. | Oct 2017 | A1 |
20170371790 | Dwiel et al. | Dec 2017 | A1 |
20180034740 | Beliveau et al. | Feb 2018 | A1 |
20180060136 | Herdrich | Mar 2018 | A1 |
20180173448 | Bosshart | Jun 2018 | A1 |
20180176324 | Kumar et al. | Jun 2018 | A1 |
20180234340 | Kim et al. | Aug 2018 | A1 |
20180234355 | Kim et al. | Aug 2018 | A1 |
20180239551 | Bosshart | Aug 2018 | A1 |
20180242191 | Lundqvist et al. | Aug 2018 | A1 |
20180260330 | Felter et al. | Sep 2018 | A1 |
20180262459 | Wang et al. | Sep 2018 | A1 |
20180285275 | Barczak et al. | Oct 2018 | A1 |
20180329818 | Cheng et al. | Nov 2018 | A1 |
20180335953 | Ramaswamy et al. | Nov 2018 | A1 |
20180337860 | Kim et al. | Nov 2018 | A1 |
20180349163 | Gao et al. | Dec 2018 | A1 |
20180349285 | Ish et al. | Dec 2018 | A1 |
20190012278 | Sindhu et al. | Jan 2019 | A1 |
20190044878 | Steffen et al. | Feb 2019 | A1 |
20190050333 | Chacon | Feb 2019 | A1 |
20190058646 | Kim et al. | Feb 2019 | A1 |
20190087341 | Pugsley et al. | Mar 2019 | A1 |
20190196987 | Shen et al. | Jun 2019 | A1 |
20190220429 | Ranjan et al. | Jul 2019 | A1 |
20190227921 | Frolikov | Jul 2019 | A1 |
20190342785 | Li et al. | Nov 2019 | A1 |
20190354402 | Bivens et al. | Nov 2019 | A1 |
20190370176 | Priyadarshi et al. | Dec 2019 | A1 |
20190391928 | Lin | Dec 2019 | A1 |
20190394261 | DeCusatis et al. | Dec 2019 | A1 |
20200007408 | Siddappa | Jan 2020 | A1 |
20200065269 | Balasubramani et al. | Feb 2020 | A1 |
20200068014 | Sarkar et al. | Feb 2020 | A1 |
20200089619 | Hsu et al. | Mar 2020 | A1 |
20200097212 | Lakshman et al. | Mar 2020 | A1 |
20200151104 | Yang | May 2020 | A1 |
20200213156 | Cheng et al. | Jul 2020 | A1 |
20200226068 | Gellerich | Jul 2020 | A1 |
20200250099 | Campbell | Aug 2020 | A1 |
20200293499 | Kohli | Sep 2020 | A1 |
20200313999 | Lee et al. | Oct 2020 | A1 |
20200349080 | Radi et al. | Nov 2020 | A1 |
20200379668 | Akaike et al. | Dec 2020 | A1 |
20210034250 | Mizuno et al. | Feb 2021 | A1 |
20210034270 | Gupta et al. | Feb 2021 | A1 |
20210049078 | Khan et al. | Feb 2021 | A1 |
20210051751 | Pawar | Feb 2021 | A1 |
20210073086 | Subraya et al. | Mar 2021 | A1 |
20210149807 | Gupta | May 2021 | A1 |
20210173589 | Benisty et al. | Jun 2021 | A1 |
20210194828 | He et al. | Jun 2021 | A1 |
20210218623 | Jain et al. | Jul 2021 | A1 |
20210247935 | Beygi et al. | Aug 2021 | A1 |
20210266219 | Kim et al. | Aug 2021 | A1 |
20210294506 | Tadokoro | Sep 2021 | A1 |
20210318828 | Valtonen | Oct 2021 | A1 |
Number | Date | Country |
---|---|---|
102163279 | Oct 2020 | KR |
Entry |
---|
Hashemi et al.; “Learning Memory Access Patterns”; 15 pages; Mar. 6, 2018; available at https://arxiv.org/pdf/1803.02329.pdf. |
Kim, et al.; “A Framework for Data Prefetching using Off-line Training of Markovian Predictors”; Sep. 18, 2002; 8 pages; available at https://www.comp.nus.edu.sg/~wongwf/papers/ICCD2002.pdf. |
Eisley et al.; “In-Network Cache Coherence”; 2006; pp. 321-332; Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture. |
Jin et al.; “NetCache: Balancing Key-Value Stores with Fast In-Network Caching”; Oct. 28, 2017; pp. 121-136; Proceedings of the 26th Symposium on Operating Systems Principles. |
Li et al.; “Pegasus: Load-Aware Selective Replication with an In-Network Coherence Directory”; Dec. 2018; 15 pages; Technical Report UW-CSE-18-12-01, University of Washington CSE, Seattle, WA. |
Liu et al.; “IncBricks: Toward In-Network Computation with an In-Network Cache”; Apr. 2017; pp. 795-809; ACM SIGOPS Operating Systems Review 51, Jul. 26, No. 2. |
Pending U.S. Appl. No. 16/697,019, filed Nov. 26, 2019, entitled “Fault Tolerant Data Coherence in Large-Scale Distributed Cache Systems”, Marjan Radi et al. |
Vestin et al.; “FastReact: In-Network Control and Caching for Industrial Control Networks using Programmable Data Planes”; Aug. 21, 2018; pp. 219-226; IEEE 23rd International Conference on Emerging Technologies and Factory Automation (ETFA). vol. 1. |
Pending U.S. Appl. No. 16/548,116, filed Aug. 22, 2019, entitled “Distributed Cache With In-Network Prefetch”, Marjan Radi et al. |
Written Opinion dated Feb. 20, 2020 from International Application No. PCT/US2019/068360, 4 pages. |
Botelho et al.; “On the Design of Practical Fault-Tolerant SDN Controllers”; Sep. 2014; 6 pages; available at: http://www.di.fc.ul.pt/~bessani/publications/ewsdn14-ftcontroller.pdf. |
Huynh Tu Dang; “Consensus Protocols Exploiting Network Programmability”; Mar. 2019; 154 pages; available at: https://doc.rero.ch/record/324312/files/2019INFO003.pdf. |
Jialin Li; “Co-Designing Distributed Systems with Programmable Network Hardware”; 2019; 205 pages; available at: https://digital.lib.washington.edu/researchworks/bitstream/handle/1773/44770/Li_washington_0250E_20677.pdf?sequence=1&isAllowed=y. |
Liu et al.; “Circuit Switching Under the Radar with REACToR”; Apr. 2-4, 2014; 16 pages; USENIX; available at: https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-liu_he.pdf. |
Ivan Pepelnjak; Introduction to 802.1Qbb (Priority-based Flow Control—PFC); accessed on Jun. 25, 2020; available at: https://gestaltit.com/syndicated/ivan/introduction-802-1qbb-priority-based-flow-control-pfc/. |
Juniper Networks Inc.; Configuring Priority-Based Flow Control for an EX Series Switch (CLI Procedure); Sep. 25, 2019; available at: https://www.juniper.net/documentation/en_US/junos/topics/task/configuration/cos-priority-flow-control-cli-ex-series.html. |
Pending U.S. Appl. No. 16/914,206, filed Jun. 26, 2020, entitled “Devices and Methods for Managing Network Traffic for a Distributed Cache”, Marjan Radi et al. |
Wikipedia; Paxos (computer science); accessed on Jun. 27, 2020; available at: https://en.wikipedia.org/wiki/Paxos_(computer_science). |
Paul Krzyzanowski; “Understanding Paxos”; PK.org; Distributed Systems; Nov. 1, 2018; available at: https://www.cs.rutgers.edu/~pxk/417/notes/paxos.html. |
Leslie Lamport; “Paxos Made Simple”; Nov. 1, 2001; available at: https://lamport.azurewebsites.net/pubs/paxos-simple.pdf. |
Written Opinion dated Apr. 27, 2020 from International Application No. PCT/US2019/068269, 3 pages. |
Cisco White Paper; “Intelligent Buffer Management on Cisco Nexus 9000 Series Switches”; Jun. 6, 2017; 22 pages; available at: https://www.cisco.com/c/en/us/products/collateral/switches/nexus-9000-series-switches/white-paper-c11-738488.html. |
Pending U.S. Appl. No. 17/174,681, filed Feb. 12, 2021, entitled “Devices and Methods for Network Message Sequencing”, Marjan Radi et al. |
Pending U.S. Appl. No. 17/175,449, filed Feb. 12, 2021, entitled “Management of Non-Volatile Memory Express Nodes”, Marjan Radi et al. |
Ibrar et al.; “PrePass-Flow: A Machine Learning based Technique to Minimize ACL Policy Violation Due to Links Failure in Hybrid SDN”; Nov. 20, 2020; Computer Networks; available at https://doi.org/10.1016/j.comnet.2020.107706. |
Saif et al.; “IOscope: A Flexible I/O Tracer for Workloads' I/O Pattern Characterization”; Jan. 25, 2019; International Conference on High Performance Computing; available at https://doi.org/10.1007/978-3-030-02465-9_7. |
Zhang et al.; “PreFix: Switch Failure Prediction in Datacenter Networks”; Mar. 2018; Proceedings of the ACM on the Measurement and Analysis of Computing Systems; available at: https://doi.org/10.1145/3179405. |
Pending U.S. Appl. No. 17/353,781, filed Jun. 21, 2021, entitled “In-Network Failure Indication and Recovery”, Marjan Radi et al. |
Pending U.S. Appl. No. 17/331,453, filed May 26, 2021, entitled “Distributed Cache Management”, Marjan Radi et al. |
Stefanovici et al.; “Software-Defined Caching: Managing Caches in Multi-Tenant Data Centers”; Aug. 2015; pp. 174-181; SoCC '15: Proceedings of the Sixth ACM Symposium on Cloud Computing; available at: http://dx.doi.org/10.1145/2806777.2806933. |
Mahmood et al.; “Efficient Caching through Stateful SDN in Named Data Networking”; Dec. 14, 2017; Transactions on Emerging Telecommunications Technologies; vol. 29, issue 1; available at: https://onlinelibrary.wiley.com/doi/abs/10.1002/ett.3271. |
Liu et al.; “DistCache: Provable Load Balancing for Large-Scale Storage Systems with Distributed Caching”; Feb. 2019; Proceedings of the 17th USENIX Conference on File and Storage Technologies; available at: https://www.usenix.org/conference/fast19/presentation/liu. |
International Search Report and Written Opinion dated Oct. 28, 2021 from International Application No. PCT/US2021/039070, 7 pages. |
Liu et al.; “DistCache: provable load balancing for large-scale storage systems with distributed caching”; FAST '19: Proceedings of the 17th USENIX Conference on File and Storage Technologies; Feb. 2019; pp. 143-157 (Year 2019). |
Radi et al.; “OmniXtend: direct to caches over commodity fabric”; 2019 IEEE Symposium on High-Performance Interconnects (HOTI); Santa Clara, CA; Aug. 2019; pp. 59-62 (Year 2019). |
Wang et al.; “Concordia: Distributed Shared Memory with In-Network Cache Coherence”; 19th USENIX Conference on File and Storage Technologies; pp. 277-292; Feb. 2021. |
International Search Report and Written Opinion dated Jun. 1, 2022 from International Application No. PCT/US2022/017608, 7 pages. |
Intel Corporation; “In-Band Network Telemetry Detects Network Performance Issues”; White Paper, Dec. 18, 2020; available at: https://builders.intel.com/docs/networkbuilders/in-band-network-telemetry-detects-network-performance-issues.pdf. |
International Search Report and Written Opinion dated Jul. 7, 2022 from International Application No. PCT/US2022/017633, 7 pages. |
Sabella et al.; “Using eBPF for network traffic analysis”; Year: 2018; available at: https://www.ntop.org/wp-content/uploads/2018/10/Sabella.pdf. |
Wikipedia; “Multistage interconnection networks”; accessed on Sep. 21, 2022; available at: https://en.wikipedia.org/wiki/Multistage_interconnection_networks. |
Lysne et al.; “Networks, Multistage”; Encyclopedia of Parallel Computing; pp. 1316-1321; Year 2011; available at: https://link.springer.com/referenceworkentry/10.1007/978-0-387-09766-4_317. |
Number | Date | Country | |
---|---|---|---|
20210406191 A1 | Dec 2021 | US |