The use of memory access prediction and data prefetch has proven effective in reducing the performance gap between typically faster Central Processing Unit (CPU) processing speeds and slower memory access speeds. Such a performance gap creates a bottleneck in system performance when retrieving cache lines to be loaded into a cache of the CPU (e.g., L1/L2/L3 caches) for processing. Memory access prediction and data prefetch can allow a CPU to predict future memory access needs based on a history of memory access patterns.
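The following is a minimal software sketch of one such history-based prediction scheme, assuming a simple stride predictor that watches the deltas between successive accesses; the class name, prefetch degree, and addresses are illustrative assumptions rather than details from the present disclosure.

```python
# Minimal sketch of history-based memory access prediction (illustrative
# only; hardware prefetchers implement this in logic, not software). A
# stride predictor tracks the delta between successive accesses and, once
# the delta repeats, predicts the next addresses.

class StridePredictor:
    def __init__(self, degree=2):
        self.last_addr = None
        self.last_stride = None
        self.degree = degree  # number of addresses to predict ahead

    def access(self, addr):
        """Record an access and return predicted future addresses."""
        predictions = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride != 0 and stride == self.last_stride:
                predictions = [addr + stride * i
                               for i in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_addr = addr
        return predictions

predictor = StridePredictor()
for a in [0x100, 0x140, 0x180, 0x1C0]:  # sequential 64-byte cache lines
    print(hex(a), [hex(p) for p in predictor.access(a)])
```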
Although techniques such as memory access prediction and data prefetch have been used in a single device with a CPU and main memory, such techniques have not been developed for distributed caches where cache lines would be accessed by different processing nodes from one or more memory nodes on a network. Conventional network latencies in transferring data between processing nodes and memory nodes have generally limited the use of such distributed caches.
However, the emergence of high-performance networking (e.g., 100 Gb/s per link and 6.4 Tb/s aggregate throughput) using Software Defined Networking (SDN) means that the network may no longer be the performance bottleneck in implementing a distributed cache on a network. In this regard, the data transfer latency of conventional fixed-function networking, as opposed to more recent SDN, can be three orders of magnitude greater than typical memory device data access latencies. For example, data transfer latencies with conventional fixed-function networking are typically on the order of hundreds of microseconds, as compared to data access latencies on the order of hundreds of nanoseconds for memory devices such as Dynamic Random Access Memory (DRAM) or Static Random Access Memory (SRAM).
For high-performance networks, the data access latency can become the greatest contributor to system performance latency. This is especially true where the memory device is a Storage Class Memory (SCM), such as a Magnetoresistive Random Access Memory (MRAM), a Phase Change Memory (PCM), or a Resistive RAM (RRAM). Recently developed SCMs can provide non-volatile storage of data with a high granularity of access (i.e., byte-addressable or cache line size) and a shorter data access latency, as compared to storage devices, such as a Solid-State Drive (SSD) using flash memory or a Hard Disk Drive (HDD) using a rotating magnetic disk. Although SCM generally consumes less power and costs less for a given storage capacity than DRAM or SRAM, SCM typically has a longer data access latency than DRAM or SRAM. For example, the data access latency of some recently developed SCMs is 4 to 100 times greater than DRAM. As a result, the use of SCM as a memory device for a distributed cache has been limited.
The features and advantages of the embodiments of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings. The drawings and the associated descriptions are provided to illustrate embodiments of the disclosure and not to limit the scope of what is claimed.
In the following detailed description, numerous specific details are set forth to provide a full understanding of the present disclosure. It will be apparent, however, to one of ordinary skill in the art that the various embodiments disclosed may be practiced without some of these specific details. In other instances, well-known structures and techniques have not been shown in detail to avoid unnecessarily obscuring the various embodiments.
Programmable switch 112 routes memory messages, such as write requests, read requests, and other communications between clients 102 and memory devices 128. For example, such memory messages may include a read request for a specific memory address or a permission level request for a client to modify a cache line requested from a memory device. Such permission levels can be used to maintain a coherency of data across devices in network 100.
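For illustration, such memory messages and permission levels might be represented as in the following sketch; the specific message types and permission names here are assumptions, as the disclosure does not prescribe a particular encoding.

```python
# Illustrative representation of memory messages exchanged through
# programmable switch 112. Names and values are assumptions for
# illustration only.

from dataclasses import dataclass
from enum import Enum

class MessageType(Enum):
    READ_REQUEST = 1        # request a cache line from a memory device
    WRITE_REQUEST = 2       # write a modified cache line back
    PERMISSION_REQUEST = 3  # ask to modify a cache line held elsewhere

class Permission(Enum):
    READ_ONLY = 1   # other clients may hold shared copies
    READ_WRITE = 2  # exclusive; the client may modify the cache line

@dataclass
class MemoryMessage:
    msg_type: MessageType
    address: int                 # address of the requested cache line
    permission: Permission = Permission.READ_ONLY

# A client asking for write permission on the cache line at 0x2000:
msg = MemoryMessage(MessageType.PERMISSION_REQUEST, 0x2000,
                    Permission.READ_WRITE)
```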
In some implementations, programmable switch 112 can include, for example, a switch that can be programmed to handle different custom protocols. In such implementations, a data plane that controls the point-to-point packet forwarding behavior of programmable switch 112 is programmable and separate from a higher-level control plane that determines end-to-end routes for packets between devices on network 100, as discussed in more detail below.
In one example, programmable switch 112 can be a 64 port Top of Rack (ToR) P4 programmable switch, such as a Barefoot Networks Tofino Application Specific Integrated Circuit (ASIC) with ports configured to provide 40 Gigabit Ethernet (GE) frame rates. Other types of programmable switches that can be used as programmable switch 112 can include, for example, a Cavium Xpliant programmable switch or a Broadcom Trident 3 programmable switch.
The use of a programmable switch allows for the configuration of high-performance and scalable memory-centric architectures by defining customized packet formats and processing behavior, such as those discussed below.
Host 120 serves as a data prefetch and memory access prediction host that updates memory prefetch prediction information based on cache miss data received from programmable switch 112.
Host 120 provides a centralized memory prefetch prediction for accessing data from memory devices 128 on network 100. This centralized arrangement of programmable switch 112 and host 120 can result in a more efficient memory prefetch prediction for prefetching cache lines based on the previous cache line requests from multiple clients (i.e., clients 102A, 102B, and 102C).
Memory devices 128 can include, for example, Storage Class Memories (SCMs) or other types of memory, such as Dynamic Random Access Memory (DRAM) or Static RAM (SRAM), that can store and retrieve data at a byte-addressable size or cache line size, as opposed to a page size as in storage devices such as Solid-State Drives (SSDs) or Hard Disk Drives (HDDs). SCMs can include, for example, Chalcogenide RAM (C-RAM), Phase Change Memory (PCM), Programmable Metallization Cell RAM (PMC-RAM or PMCm), Ovonic Unified Memory (OUM), Resistive RAM (RRAM), Ferroelectric Memory (FeRAM), Magnetoresistive RAM (MRAM), or 3D-XPoint memory. Recently developed SCMs can provide non-volatile storage with a fine granularity of access (i.e., byte-addressable or cache line level) and a shorter data access latency, as compared to storage devices, such as an SSD using flash memory or an HDD using a rotating magnetic disk.
SCM also generally consumes less power, can store more data in a given physical area, and costs less for a given storage capacity than DRAM or SRAM. However, SCM typically has a longer data access latency than DRAM and SRAM. For example, the data access latency of some recently developed SCMs is 4 to 100 times greater than DRAM. As discussed above, the shorter latencies of high-performance networks and processing devices (e.g., CPUs) have shifted the bottleneck in implementing a distributed cache to the memory devices connected on the network.
In one aspect, the programmable switches of the present disclosure can prefetch cache lines from a distributed cache based on memory prefetch prediction to decrease a network-wide data access latency over time to compensate for a greater data access latency of SCMs as compared to DRAM or SRAM. This can allow for less expensive SCM to be used in a distributed cache in place of more expensive DRAM or SRAM. In addition, power usage of the distributed cache is reduced, since such SCMs typically use less power than DRAM or SRAM. Although some or all of memory devices 128 can include an SCM, other implementations of network 100 can include other types of memory devices such as DRAM or SRAM, since the in-network memory access prediction and data prefetch discussed herein can also decrease the overall data transfer latencies of such memories over time.
As will be appreciated by those of ordinary skill in the art, network 100 may include additional devices or a different number of devices than shown in this example.
Processors 104 can include circuitry such as a CPU, a microcontroller, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), hard-wired logic, analog circuitry, and/or a combination thereof. In some implementations, processors 104 can include a System on a Chip (SoC), which may be combined with one or both of memory 106 and interface 110. Processors 104A and 104B can include one or more cache levels (e.g., L1/L2/L3 cache levels) where data is loaded from or flushed into memories 106A and 106B, respectively, or loaded from or flushed into memory devices 128 via programmable switch 112. Such data can include, for example, portions of code and related data being processed by a processor 104. The data accessed by processors 104A and 104B is referred to herein as cache lines that have a particular cache line size, such as 64 bytes, for example.
Memories 106A and 106B can include, for example, a volatile RAM such as DRAM, a non-volatile RAM, or other solid-state memory that are used by processors 104A and 104B, respectively, as an internal main memory to store data. Data stored in memories 106 can include data read from storage devices 108, data to be stored in storage devices 108, instructions loaded from distributed cache modules 12 or applications 14 for execution by processors 104, and/or data used in executing such applications. In addition to loading data from an internal main memory 106, processors 104 also load data from memory devices 128 as an external main memory or distributed cache. Such data may also be flushed after modification by the processor 104 or evicted without modification back into an internal main memory 106 or an external main memory device 128 via programmable switch 112.
Storage devices 108A and 108B serve as secondary storage that can include, for example, one or more rotating magnetic disks or non-volatile solid-state memory, such as flash memory. While the description herein refers to solid-state memory generally, it is understood that solid-state memory may comprise one or more of various types of memory devices such as flash integrated circuits, NAND memory (e.g., single-level cell (SLC) memory, multi-level cell (MLC) memory (i.e., two or more levels), or any combination thereof), NOR memory, EEPROM, other discrete Non-Volatile Memory (NVM) chips, or any combination thereof. As noted above, internal main memories 106 and external memory devices 128 typically provide faster data access and can provide more granular data access (e.g., cache line size or byte-addressable) than storage devices 108.
Interfaces 110A and 110B are configured to interface clients 102A and 102B, respectively, with devices on network 100, such as programmable switch 112. Interfaces 110 may communicate using a standard such as, for example, Ethernet, Fibre Channel, or InfiniBand. In this regard, clients 102, programmable switch 112, host 120, and memory devices 128 may not be physically co-located and may communicate over a network such as a Local Area Network (LAN) or a Wide Area Network (WAN). As will be appreciated by those of ordinary skill in the art, interface 110A or 110B can be included as part of processor 104A or processor 104B, respectively.
Programmable switch 112 in some implementations can be a ToR switch for a server rack including memory devices 128.
Memory 118 of programmable switch 112 can include, for example, a volatile RAM such as DRAM, or a non-volatile RAM or other solid-state memory such as register arrays that are used by circuitry 116 to execute instructions loaded from switching module 18 or firmware of programmable switch 112, and/or data used in executing such applications, such as prefetch information 16. In this regard, switching module 18 can include instructions for implementing processes such as those discussed in more detail below.
Processor 122 of host 120 executes prefetch prediction module 24 to generate or update prefetch information that is sent to programmable switch 112. As discussed above, prefetch information 16 can be used by programmable switch 112 to determine one or more additional cache lines to request in response to receiving a cache line request from a client 102.
As will be appreciated by those of ordinary skill in the art, other implementations may include a different arrangement or number of components or modules than shown in this example.
In addition to requesting cache lines from one or more memory devices 128, programmable switch 112 sends the cache line request received from client 102A to host 120. In some implementations, this may be accomplished by mirroring a packet for the cache line request to a port for host 120, such as port d.
In other implementations, the order of the actions taken in this example may differ.
As noted above, memory messages can have a custom packet format so that programmable switch 112 can distinguish memory messages, such as messages for cache line addressed data, from other network traffic, such as messages for page addressed data. The indication of a memory message, such as a cache line request, causes circuitry 116 of programmable switch 112 to handle the packet differently from other packets that are not indicated as being a memory message. In some implementations, the custom packet format fits into a standard 802.3 Layer 1 frame format, which can allow the packets to operate with existing and forthcoming programmable switches, such as a Barefoot Tofino ASIC switch, for example. In such an implementation, the preamble, start frame delimiter, and interpacket gap may follow the standard 802.3 Layer 1 frame format, but portions in Layer 2 are replaced with custom header fields that can be parsed by programmable switch 112. A payload of a packet for a memory message can include one or more memory addresses for one or more cache lines being requested by a client or being returned to a client.
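For illustration, the sketch below packs such a custom header into an Ethernet-style frame; the field widths and ordering are assumptions, since the disclosure specifies only that portions in Layer 2 are replaced with custom header fields that the switch can parse.

```python
# Sketch of a custom memory-message frame. The header layout (destination
# MAC, source MAC, one-byte message type, eight-byte cache line address)
# is an assumption for illustration.

import struct

HDR = struct.Struct("!6s6sBQ")  # network byte order

def build_cache_line_request(dst_mac: bytes, src_mac: bytes,
                             msg_type: int, address: int,
                             payload: bytes = b"") -> bytes:
    """Build a frame whose custom header a switch parser can match on."""
    return HDR.pack(dst_mac, src_mac, msg_type, address) + payload

frame = build_cache_line_request(b"\x02\x00\x00\x00\x00\x01",
                                 b"\x02\x00\x00\x00\x00\x02",
                                 1,  # e.g., a read request
                                 0x2000)
```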
Stages 361 and 362 can include, for example, programmable Arithmetic Logic Units (ALUs) and one or more memories that store match-action tables as part of prefetch information, such as prefetch information 16.
Traffic manager 38 routes the cache line request to an appropriate port of programmable switch 112. In addition, and as discussed above, traffic manager 38 may mirror the originally received packet for the cache line request to a port for host 120 to provide host 120 with cache miss data. In some implementations, the ingress pipeline calculates offsets for additional cache line prefetches based on the parsed header fields, and then generates corresponding additional packets using a packet generation engine of programmable switch 112.
As will be appreciated by those of ordinary skill in the art, other implementations may include a different arrangement of modules for a programmable switch. For example, other implementations may include more or fewer stages as part of the ingress or egress pipeline.
The receipt of program instructions or programming in block 502 may occur during a configuration process of programmable switch 112 when programmable switch 112 is offline or not connected to network 100. In other cases, the programming or program instructions may be received while programmable switch 112 is connected to network 100 and may come from a host or other device on network 100, such as from host 120, for example. The dashed line between blocks 502 and 504 indicates that some amount of time may pass between the programming of programmable switch 112 and the receipt of a cache line request.
In block 504, a cache line request is received by programmable switch 112 from a client 102 of a plurality of clients to obtain a cache line. As discussed above, the cache line is a unit of data sized for use by a processor of the requesting client that would otherwise be accessed from a local main memory of the client in a conventional system.
In block 506, programmable switch 112 identifies one or more additional cache lines to obtain based on the received cache line request and prefetch information, in accordance with the program instructions received in block 502. In some implementations, a programmable ingress pipeline including one or more stages can perform match-action operations to associate an address for the requested data with one or more additional addresses for the one or more additional cache lines using match-action tables that form prefetch information 16. In more detail, memory addresses for the one or more additional cache lines can be calculated using offsets included in prefetch information 16, as discussed below with reference to the prefetch information update process.
In block 508, programmable switch 112 requests the cache line for the received request and the identified one or more additional cache lines. Packets for the cache line requests can be formed by an ingress pipeline and packet generation engine of the programmable switch. A traffic manager of the programmable switch can route the packets to one or more different ports corresponding to one or more memory devices 128 storing the multiple cache lines. The ports can be identified by the traffic manager and control plane based on the memory addresses for the cache lines. In addition, the traffic manager of programmable switch 112 may also mirror the original cache line request to a port for host 120 to provide host 120 with cache miss data.
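The following sketch illustrates this routing and mirroring step in software; the address-range-to-port mapping and the port names are assumptions for illustration, as the disclosure specifies only that ports are identified based on memory addresses and that the original request may be mirrored to a port for host 120.

```python
# Sketch of routing prefetch requests to memory device ports and mirroring
# the original request to host 120. Ranges and port names are assumptions.

MEMORY_PORTS = [(0x0000, 0x7FFF, "port a"),   # one memory device 128
                (0x8000, 0xFFFF, "port b")]   # another memory device 128
HOST_PORT = "port d"                          # host 120

def route_requests(addresses, original_request):
    """Map each cache line address to a port; mirror the original request."""
    routed = []
    for addr in addresses:
        for lo, hi, port in MEMORY_PORTS:
            if lo <= addr <= hi:
                routed.append((port, addr))
                break
    mirrored = (HOST_PORT, original_request)  # cache miss data for host 120
    return routed, mirrored
```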
In block 510, the requested cache line and one or more additional cache lines are received from the one or more memory devices 128. The received cache lines can proceed through the ingress pipeline to determine that the cache lines should be routed back to the client that sent the original cache line request. In block 512, programmable switch 112 sends the requested cache line and the one or more additional cache lines to the requesting client.
In block 602, programmable switch 112 compares an address for a cache line requested by a cache line request to addresses stored in one or more match-action tables to identify a matching address. The identification can be performed as part of an ingress or egress pipeline of programmable switch 112, where headers or frames of packets are processed in parallel following a parser. The parsed address in the cache line request is compared to one or more match-action tables that include addresses and a corresponding instruction or action to be taken upon finding a match.
In block 604, a stage of programmable switch 112 (e.g., stage 361, 362, 401, or 402) calculates one or more addresses for the one or more additional cache lines using one or more offsets associated with the matching address.
Other implementations may identify addresses for additional cache lines in a different way. For example, some implementations may include tables that associate the additional addresses themselves with the matching address so that it is not necessary for programmable switch 112 to calculate offset addresses.
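For illustration, the match-action lookup of blocks 602 and 604 might operate as in the following sketch, where the table contents, the offsets, and the 64-byte cache line size are assumptions; in programmable switch 112 this logic would run in hardware pipeline stages rather than software.

```python
# Sketch of blocks 602 and 604: match the requested address against a
# match-action table, then compute additional addresses from the stored
# offsets (expressed here in units of cache lines).

CACHE_LINE = 64

prefetch_table = {
    0x2000: [1, 2],   # sequential pattern: prefetch the next two lines
    0x8000: [-1],     # reverse pattern: prefetch the preceding line
}

def identify_additional_lines(addr: int) -> list[int]:
    """Return addresses of additional cache lines to prefetch, if any."""
    offsets = prefetch_table.get(addr, [])  # block 602: find a match
    return [addr + off * CACHE_LINE for off in offsets]  # block 604

assert identify_additional_lines(0x2000) == [0x2040, 0x2080]
assert identify_additional_lines(0x3000) == []  # no match: nothing extra
```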
As discussed above, prefetch information 16 can be provided by a separate host 120 that allows for updates to be performed to prefetch information 16 based on cache miss data (e.g., received cache line requests) without interfering with the operation of programmable switch 112.
In block 704, host 120 updates prefetch information for programmable switch 112 or creates new prefetch information based on the cache miss data received in block 702.
In block 706, host 120 sends updated prefetch information to programmable switch 112. In some implementations, the updated prefetch information may only include information for addresses that have changed since a previous version of the prefetch information. In other implementations, host 120 may send updated prefetch information for all of the addresses represented by the cache miss data. The updated prefetch information can include, for example, new match-action tables or portions thereof. As discussed above, the match-action tables in some implementations can include offsets representing addresses to be calculated by programmable switch 112 to identify one or more additional cache lines to be obtained for a matching address.
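As one illustrative sketch, host 120 might derive such offsets by counting the deltas between consecutive missed addresses and keeping the most common deltas per address; this frequency-based scheme is an assumption for illustration, as the disclosure does not fix a particular prediction algorithm for prefetch prediction module 24.

```python
# Sketch of updating prefetch information from cache miss data: count the
# deltas (in cache lines) between consecutive missed addresses and keep
# the most common ones as per-address offsets for the match-action tables.

from collections import Counter, defaultdict

CACHE_LINE = 64

def update_prefetch_info(miss_addresses, max_offsets=2):
    """Build {address: [offsets]} match-action entries from a miss trace."""
    deltas = defaultdict(Counter)
    for prev, curr in zip(miss_addresses, miss_addresses[1:]):
        deltas[prev][(curr - prev) // CACHE_LINE] += 1
    return {addr: [off for off, _ in counts.most_common(max_offsets)]
            for addr, counts in deltas.items()}

trace = [0x2000, 0x2040, 0x2080, 0x2000, 0x2040, 0x2100]
updated = update_prefetch_info(trace)
# offsets per address, e.g. {0x2000: [1], 0x2040: [1, 3], 0x2080: [-2]}
```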
As discussed above, the foregoing use of a centralized programmable switch to speculatively prefetch cache lines can ordinarily improve the performance of a distributed cache on a network in terms of an average number of operations that can be performed in a given timeframe. Such prefetching can allow for the use of less expensive, physically denser, and lower power SCMs in a distributed cache, as compared to DRAM or SRAM, and can compensate for latencies due to maintaining data coherency in the distributed cache.
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks, modules, and processes described in connection with the examples disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. Furthermore, the foregoing processes can be embodied on a computer readable medium which causes a processor or computer to perform or execute certain functions.
To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, and modules have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Those of ordinary skill in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The various illustrative logical blocks, units, modules, and controllers described in connection with the examples disclosed herein may be implemented or performed with a general purpose processor, a CPU, a DSP, an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, an SoC, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The activities of a method or process described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The steps of the method or algorithm may also be performed in an alternate order from those provided in the examples. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable media, an optical media, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC or an SoC.
The foregoing description of the disclosed example embodiments is provided to enable any person of ordinary skill in the art to make or use the embodiments in the present disclosure. Various modifications to these examples will be readily apparent to those of ordinary skill in the art, and the principles disclosed herein may be applied to other examples without departing from the spirit or scope of the present disclosure. The described embodiments are to be considered in all respects only as illustrative and not restrictive.
This application claims the benefit of U.S. Provisional Application No. 62/842,959 entitled “DISTRIBUTED BRANCH PREDICTION WITH IN-NETWORK PREFETCH”, filed on May 3, 2019, which is hereby incorporated by reference in its entirety.