Embodiments of the present invention are illustrated by way of example, and not by way of limitation, in the drawings and in which like reference numerals refer to similar elements.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrase “in one embodiment” or “an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in one or more embodiments.
The evolving requirements of network interface devices may call for programmability of the network interface device or for replacing the device with another device that meets the requirements. New capabilities can be implemented in software, but in certain situations, such as with legacy drivers or virtualization, it may be preferable to minimize changes to the device driver. Current competitive pressures include adding protocol-specific optimizations, such as Transmission Control Protocol (TCP) header/payload splitting and TCP segmentation offload, to high-speed network interfaces. These optimizations typically require the network interface to understand packet header format and size. Knowledge of the most common protocol headers is typically hardwired in the network interface, and only a limited number of protocols can be supported on any one product. It may be desirable for network interfaces to be flexible enough to be modified to support evolving protocols while minimizing changes to the device driver.
Host system 102 may include chipset 105, processors 110-0 to 110-N, host memory 112, and storage 114. Chipset 105 may provide intercommunication among processors 110-0 to 110-N, host memory 112, storage 114, bus 116, as well as a graphics adapter that can be used for transmission of graphics and information for display on a display device (both not depicted). For example, chipset 105 may include a storage adapter (not depicted) capable of providing intercommunication with storage 114. For example, the storage adapter may be capable of communicating with storage 114 in conformance at least with any of the following protocols: Small Computer Systems Interface (SCSI), Fibre Channel (FC), and/or Serial Advanced Technology Attachment (S-ATA).
In some embodiments, chipset 105 may include data mover logic (not depicted) capable of performing transfers of information within host system 102 or between host system 102 and network component 118. As used herein, a “data mover” refers to a module for moving data from a source to a destination without using the core processing module of a host processor, such as any of processors 110-0 to 110-N, or that otherwise does not use cycles of a processor to perform data copy or move operations. By using the data mover for transfer of data, the processor may be freed from the overhead of performing data movements, overhead that may otherwise result in the host processor running at much slower speeds. A data mover may include, for example, a direct memory access (DMA) engine. In some embodiments, the data mover may be implemented as part of any of processors 110-0 to 110-N, although other components of computer system 100 may include the data mover. In some embodiments, the data mover may be implemented as part of chipset 105.
Any of processors 110-0 to 110-N may be implemented as Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) processors, a hardware thread, or any other microprocessor or central processing unit. Host memory 112 may be implemented as a volatile memory device such as but not limited to a Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), or Static RAM (SRAM). Storage 114 may be implemented as a non-volatile storage device such as but not limited to a magnetic disk drive, optical disk drive, tape drive, an internal storage device, an attached storage device, flash memory, battery backed-up synchronous DRAM (SDRAM), and/or a network accessible storage device.
Bus 116 may provide intercommunication among at least host system 102 and network component 118 as well as other peripheral devices (not depicted). Bus 116 may support serial or parallel communications. Bus 116 may support node-to-node or node-to-multi-node communications. Bus 116 may at least be compatible with Peripheral Component Interconnect (PCI) described for example at Peripheral Component Interconnect (PCI) Local Bus Specification, Revision 3.0, Feb. 2, 2004 available from the PCI Special Interest Group, Portland, Oreg., U.S.A. (as well as revisions thereof); PCI Express described in The PCI Express Base Specification of the PCI Special Interest Group, Revision 1.0a (as well as revisions thereof); PCI-X described in the PCI-X Specification Rev. 1.1, Mar. 28, 2005, available from the aforesaid PCI Special Interest Group, Portland, Oreg., U.S.A. (as well as revisions thereof); and/or Universal Serial Bus (USB) (and related standards) as well as other interconnection standards.
Network component 118 may be capable of providing intercommunication between host system 102 and network 120 in compliance at least with any applicable protocols. Network component 118 may intercommunicate with host system 102 using bus 116. In one embodiment, network component 118 may be integrated into chipset 105. “Network component” may include any combination of digital and/or analog hardware and/or software on an I/O (input/output) subsystem that may process one or more packets to be transmitted and/or received over a network. In one embodiment, the I/O subsystem may include, for example, a network interface card (NIC), and network component may include, for example, a MAC (media access control) layer of the Data Link Layer as defined in the Open System Interconnection (OSI) model for networking protocols. The OSI model is defined by the International Organization for Standardization (ISO) located at 1 rue de Varembé, Case postale 56, CH-1211 Geneva 20, Switzerland.
Network 120 may be any network such as the Internet, an intranet, a local area network (LAN), storage area network (SAN), a wide area network (WAN), or wireless network. Network 120 may exchange network protocol units with network component 118 using the Ethernet standard (described in IEEE 802.3 and related standards) or any communications standard. As used herein, a “network protocol unit” may include any packet or frame or other format of information with a header and payload portions formed in accordance with any protocol specification.
Some embodiments provide techniques to implement a network interface using a general purpose core or hardware thread that is communicatively coupled with a network interface. The combination of the network interface and general purpose core or hardware thread can appear to other cores or hardware threads as a single network interface. The general purpose core or hardware thread associated with the network interface may issue inter-processor interrupts (IPIs) to one or more other target cores or target hardware threads. The target core or target hardware thread may process the IPI as it would a device interrupt.
In some embodiments, although not a necessary feature of any embodiment, using a general-purpose core or hardware thread may extend the functionality of the network interface to form a new logical device. In some embodiments, although not a necessary feature of any embodiment, the target cores may consider this logical device as hardware because the target cores may not discern between IPIs and device interrupts.
In some embodiments, central core 204 may be communicatively coupled with network interface 206 using a PCI, PCI-X, or PCI Express compliant bus, although other techniques may be used. Network interface 206 may communicate with central core 204 at least using interrupts, message signaled interrupts, or polling.
In some embodiments, central core 204 may perform tasks such as but not limited to: execute an interrupt service routine in response to receipt of an interrupt from the network interface 206; read descriptors from the primary descriptor ring; execute any user-provided code which may modify or classify incoming network protocol units; perform any user-specified network-related operation; assign a target core and its secondary descriptor ring based on a user-specified classification; copy a descriptor from the primary descriptor ring to the appropriate secondary descriptor ring; and/or remove the descriptor from the primary descriptor ring. Primary and secondary descriptor rings can be used to manage processing of received network protocol units by one or more target cores.
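By way of illustration only, the descriptor handling attributed to central core 204 may be sketched as follows. This is a minimal sketch and not a required implementation: the `Descriptor` fields, the `classify` policy, and the use of in-memory queues to stand in for descriptor rings are all assumptions introduced for the example.

```python
from collections import deque, namedtuple

# Hypothetical descriptor layout; the field names are assumptions for illustration.
Descriptor = namedtuple("Descriptor", ["buffer_addr", "length", "flow_id"])


def classify(desc, num_targets):
    """Stand-in for a user-specified classification: map a flow to a target core index."""
    return desc.flow_id % num_targets


def dispatch(primary_ring, secondary_rings):
    """Drain the primary ring into the secondary rings.

    For each descriptor: classify it, copy it to the secondary ring of the
    chosen target core, then remove it from the primary ring. Returns the set
    of target core indices that received at least one descriptor.
    """
    notified = set()
    while primary_ring:
        desc = primary_ring[0]                 # read descriptor from primary ring
        target = classify(desc, len(secondary_rings))
        secondary_rings[target].append(desc)   # copy to the secondary ring
        primary_ring.popleft()                 # remove from the primary ring
        notified.add(target)
    return notified
```

In such a sketch, the returned set would identify the target cores to which an inter-processor interrupt could subsequently be issued.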
In some embodiments, network interface 206 may perform tasks such as but not limited to: receive network protocol units from a physical link; copy portion(s) of received network protocol units into host memory via a transfer by a data mover; and/or raise an interrupt to central core 204.
In response to network interface 206 receiving a network protocol unit, network interface 206 may provide an interrupt to central core 204. However, interrupts from network interface 206 to central core 204 can be provided for other reasons. In some embodiments, in response to the interrupt, central core 204 may provide an interrupt to a target core (or hardware thread) using an inter-processor interrupt (IPI) to request processing of portion(s) of the received network protocol unit. An operating system (OS) executed by central core 204 may be programmed to interrupt any combination of cores or hardware threads using one or more IPIs. The core or thread that receives the IPI may treat the IPI as a device interrupt such as by invoking an interrupt handler. The target core (or thread) may choose to drop, redirect, or combine interrupts based on decisions it makes about I/O traffic.
One or more target cores may perform protocol processing tasks ordinarily performed by the central core, including but not limited to: (1) data link, network, and transport layer protocol processing, including but not limited to (a) determining which protocols are used by the network protocol unit, (b) determining whether the network protocol unit properly follows protocol specifications, (c) tracking the state of network transmissions (e.g., updating TCP sequence numbers), (d) transmitting responses to a transmitter of network protocol units (e.g., sending TCP acknowledgements), and/or (e) arranging data contained in network protocol units (e.g., reassembling data in TCP packets); (2) scheduling operation of an application which is waiting for data from the network; (3) routing network protocol units to another location; (4) filtering unwanted network protocol units; and/or (5) freeing for other uses the memory storing the network protocol units once processing is complete.
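As a non-limiting illustration of items (c) through (e) above, a target core might arrange received segments by sequence number and derive the next expected sequence number, which could then be carried in an acknowledgement. The sketch below is deliberately simplified: the function name and the `(seq, data)` tuple format are assumptions, and overlapping segments and 32-bit sequence number wraparound are not handled.

```python
def reassemble(segments, initial_seq):
    """Arrange (seq, data) segments into contiguous data.

    Returns (data, next_expected_seq). Segments that arrive out of order are
    placed by sequence number, exact duplicates are ignored, and segments
    beyond a gap are left out until the gap is filled.
    """
    pending = {}
    for seq, data in segments:
        pending.setdefault(seq, data)      # keep the first copy of a duplicate

    assembled = bytearray()
    next_seq = initial_seq
    while next_seq in pending:
        data = pending.pop(next_seq)
        assembled += data
        next_seq += len(data)              # track state: advance the sequence number
    return bytes(assembled), next_seq
```

The second returned value corresponds to the acknowledgement number a receiver would report for the contiguous data received so far.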
In some embodiments, using IPIs to act as device interrupts leaves central core 204 free to implement new functionality for network interface 206 while reducing changes in the interrupt service routine (ISR) of the device driver of the target core. Because device drivers are typically equipped to use ISRs, it can be more convenient to use IPIs to emulate device interrupts. Changes (e.g., re-coding efforts) to the device driver's interrupt service routine can be reduced at least because the same routine services IPIs as well as device interrupts.
In some embodiments, the combination of central core 204 and network interface 206 allows network interface resources to be available to system resources and vice versa. For example, target cores may have full access to network interface resources by access to host memory used by the combination. Not only may a combination of central core 204 and network interface 206 allow full accessibility to network interface resources, but it may also allow extensibility (up to the limits imposed by the system and platform, and not limited by any implementation of network interface 206). Extensibility may be an ability to add new features to an existing program with minimal disruption or change of existing code. For example, by copying only descriptors to target cores and not copying payload to target cores, extensibility may be achieved. An off-the-shelf implementation of network interface 206 may appear to other components as a fully programmable, resource-rich network interface.
A target core or hardware thread may execute emulated network interface ISR 306. Emulated network interface ISR 306 may operate in response to receipt of an IPI from a central core or thread associated with one or more network interfaces. For example, emulated network interface ISR 306 may treat an IPI from a central core as an interrupt request. For example, emulated network interface ISR 306 may treat any IPI as an interrupt request. Interrupt requests for all devices may be mapped to interrupt vectors. Each vector may be assigned to a function which calls an interrupt service routine (ISR) to process the interrupt request.
In some embodiments, to allow the device driver's ISR to handle IPIs from another core, a device interrupt request may be assigned to identify the logical device and an ISR may be dynamically assigned for this interrupt request by the device driver. Thus, at least two types of interrupts and their respective ISRs may be functionally equivalent to the original device interrupt and its ISR, but the IPI may now act as a proxy to trigger data processing in place of the original device interrupt.
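The mapping of interrupt requests to vectors, and the dynamic assignment of an ISR for the logical device, may be pictured with the following minimal sketch. The `InterruptController` class, the vector numbers, and the `nic_isr` routine are hypothetical names introduced only for this example; an actual platform would use its own interrupt controller and vector allocation.

```python
class InterruptController:
    """Toy per-core vector table mapping a vector number to an ISR."""

    def __init__(self):
        self.vectors = {}

    def register_isr(self, vector, isr):
        """Dynamically assign an ISR to an interrupt vector."""
        self.vectors[vector] = isr

    def raise_interrupt(self, vector, source):
        """Invoke the ISR mapped to the vector; unmapped vectors are ignored."""
        isr = self.vectors.get(vector)
        if isr is None:
            return None
        return isr(source)


NIC_VECTOR = 0x41   # hypothetical vector for the original device interrupt
IPI_VECTOR = 0xF0   # hypothetical vector for the IPI from the central core

handled = []        # record of interrupt sources seen by the ISR


def nic_isr(source):
    """Driver ISR; it does not distinguish an IPI from a device interrupt."""
    handled.append(source)
    return "processed"


core = InterruptController()
core.register_isr(NIC_VECTOR, nic_isr)   # original device interrupt
core.register_isr(IPI_VECTOR, nic_isr)   # same ISR assigned for the logical device
```

Because both vectors resolve to the same routine in this sketch, raising either interrupt triggers identical processing, which is the sense in which the IPI acts as a proxy for the device interrupt.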
For example, emulated network interface ISR 306 may perform an interrupt service routine to process a descriptor in response to receipt of an IPI from a central core or thread associated with one or more network interfaces. IPI logic 304 may request copying of descriptors into the secondary ring. However, other operations may be performed in response to receipt of an IPI. Emulated network interface ISR 306 may process the descriptor as if it came from a network interface. Emulated network interface ISR 306 may provide the descriptor and data to the upper driver interface (I/F). The upper driver interface may process the descriptor in the same manner as if it had come from the network interface directly. The upper driver interface may be an interface to virtual machine monitor (VMM) logic, an operating system (OS), or other logic. The target core or thread may execute one or more applications (shown as “Apps”). For example, an application may utilize data received in one or more network protocol units.
Memory associated with each target core (or hardware thread) may store an associated secondary descriptor ring. A central core (or hardware thread) associated with the network interface may manage storage of descriptors into each secondary descriptor ring. Data from received network protocol units can be stored in main memory accessible to the network interface. The target core may receive an IPI from the central core associated with the network interface and, in response, read a specified descriptor from an associated secondary descriptor ring. Based on descriptors in the associated secondary descriptor ring, the target core can copy data to memory associated with the target core and access such data.
In block 620, the network interface may issue a device interrupt to a general purpose core to inform the core of receipt of at least one network protocol unit.
In block 630, the general purpose core may decide which target core is to process the received network protocol unit. For example, the decision may be made in part using receive side scaling techniques, although other techniques may be used. To assign a received network protocol unit to a target core, a descriptor associated with the received network protocol unit may be assigned to a secondary descriptor ring associated with the target core. The portion of the network protocol unit that is to be processed by the target core may be stored in a memory region associated with the general purpose core.
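One way the decision of block 630 might be made is sketched below. This is only an illustration in the spirit of receive side scaling: a CRC-32 over the connection 4-tuple is used here as a stand-in for whatever hash function an implementation employs, and the function names and indirection table contents are assumptions made for the example.

```python
import socket
import struct
import zlib


def flow_hash(src_ip, dst_ip, src_port, dst_port):
    """Hash the 4-tuple of a flow (CRC-32 is a stand-in hash for illustration)."""
    key = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!HH", src_port, dst_port)
    )
    return zlib.crc32(key)


def select_target_core(flow, indirection_table):
    """Index a table of core identifiers with the flow hash.

    Units of the same flow always map to the same target core, so that the
    per-flow state stays on one core.
    """
    return indirection_table[flow_hash(*flow) % len(indirection_table)]
```

The descriptor of the received network protocol unit would then be placed on the secondary descriptor ring of the core that `select_target_core` returns.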
In block 640, the general purpose core may issue an inter-processor interrupt to a target core to indicate availability of a received network protocol unit. Logic executed or available to the target core may invoke an interrupt handler in response to the inter-processor interrupt.
In block 650, the target core may request copying of the portion of the network protocol unit from the memory region associated with the general purpose core to a memory associated with the target core. A descriptor in the secondary descriptor ring associated with the target core may identify the storage location of the portion of the network protocol unit.
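The sequence of blocks 620 through 650 may be summarized by the following simplified simulation. It is a sketch under stated assumptions: the class and method names are hypothetical, a round-robin choice stands in for the target selection of block 630, a direct method call stands in for the inter-processor interrupt, and the memory region is a plain byte array with no bounds handling.

```python
from collections import deque


class TargetCore:
    def __init__(self, core_id):
        self.core_id = core_id
        self.secondary_ring = deque()   # descriptors as (address, length) pairs
        self.local_memory = {}          # memory associated with this target core

    def handle_ipi(self, region):
        """Interrupt handler invoked for an IPI (blocks 640 and 650)."""
        while self.secondary_ring:
            addr, length = self.secondary_ring.popleft()
            # The descriptor identifies where the portion is stored; copy it locally.
            self.local_memory[addr] = bytes(region[addr:addr + length])


class GeneralPurposeCore:
    def __init__(self, targets, region_size=4096):
        self.targets = targets
        self.region = bytearray(region_size)  # region associated with this core
        self.write_offset = 0
        self.received = 0

    def handle_device_interrupt(self, unit):
        """Invoked when the network interface signals a received unit (block 620)."""
        addr = self.write_offset
        self.region[addr:addr + len(unit)] = unit
        self.write_offset += len(unit)

        # Block 630: choose a target (round-robin is a placeholder policy).
        target = self.targets[self.received % len(self.targets)]
        self.received += 1
        target.secondary_ring.append((addr, len(unit)))

        # Block 640: a direct call stands in for the inter-processor interrupt.
        target.handle_ipi(self.region)
        return target.core_id
```

In this sketch only the descriptor travels through the secondary ring; the payload remains in the general purpose core's region until the target core copies it.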
Embodiments of the present invention may be implemented as any or a combination of: one or more microchips or integrated circuits interconnected using a motherboard, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The term “logic” may include, by way of example, software or hardware and/or combinations of software and hardware.
Embodiments of the present invention may be provided, for example, as a computer program product which may include one or more machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines carrying out operations in accordance with embodiments of the present invention. A machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (Compact Disc-Read Only Memories), and magneto-optical disks, ROMs (Read Only Memories), RAMs (Random Access Memories), EPROMs (Erasable Programmable Read Only Memories), EEPROMs (Electrically Erasable Programmable Read Only Memories), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
Moreover, embodiments of the present invention may also be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of one or more data signals embodied in and/or modulated by a carrier wave or other propagation medium via a communication link (e.g., a modem and/or network connection). Accordingly, as used herein, a machine-readable medium may, but is not required to, comprise such a carrier wave.
The drawings and the foregoing description give examples of the present invention. Although depicted as a number of disparate functional items, those skilled in the art will appreciate that one or more of such elements may well be combined into single functional elements. Alternatively, certain elements may be split into multiple functional elements. Elements from one embodiment may be added to another embodiment. For example, orders of processes described herein may be changed and are not limited to the manner described herein. Moreover, the actions of any flow diagram need not be implemented in the order shown; nor do all of the acts necessarily need to be performed. Also, those acts that are not dependent on other acts may be performed in parallel with the other acts. The scope of the present invention, however, is by no means limited by these specific examples. Numerous variations, whether explicitly given in the specification or not, such as differences in structure, dimension, and use of material, are possible. The scope of the invention is at least as broad as given by the following claims.