Specific embodiments of the invention will now be described in detail with reference to the accompanying figures. Like elements in the various figures are denoted by like reference numerals for consistency.
In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a more thorough understanding of the invention. However, it will be apparent to one of ordinary skill in the art that the invention may be practiced without these specific details. In other instances, well-known features have not been described in detail to avoid unnecessarily complicating the description.
In general, embodiments of the invention relate to a method and system for processing packets. More specifically, embodiments of the invention relate to a method and system for classifying packets using multi-level classification. In one embodiment of the invention, the result of the classification is the routing of packets to the appropriate hardware receive ring (HRR) and/or software receive ring (SRR).
In one embodiment of the invention, the hardware classifier (104) is configured to analyze the incoming network traffic, typically in the form of packets, received from the network (not shown). Further, in one embodiment of the invention, the hardware classifier (104) implements multi-level classification as described below in
In one embodiment of the invention, the host (100) may include the following components: a device driver (107), a software ring (108), one or more virtual network interface cards (VNICs) (114A, 114B, 114C, 114D), one or more virtual network stacks (VNSs) (116A, 116B, 116C, 116D), and one or more packet destinations (118) (e.g., containers and/or services). Each of the aforementioned components is described below.
In one embodiment of the invention, the device driver (107) provides an interface between the HRRs (106A, 106B, 106C) and the host (100). More specifically, the device driver (107) exposes the HRRs (106A, 106B, 106C) to the host (100) such that the host (100) (or, more specifically, a process executing in the host (100)) may obtain packets from the HRRs (106A, 106B, 106C).
In one embodiment of the invention, the software ring (108) includes a software classifier (110) and a number of software receive rings (SRRs) (e.g., SRR A (112A), SRR B (112B)). In one embodiment of the invention, the software classifier (110) has the same functionality as the hardware classifier (104). However, instead of sending the classified packets to a HRR (106A, 106B, 106C), the software classifier (110) forwards classified packets to one of the SRRs (112A, 112B). The SRRs (112A, 112B) are configured to temporarily store packets after they have been classified by the software classifier (110). In one embodiment of the invention, the software ring (108) resides in a Media Access Control (MAC) layer of the host (100).
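By way of illustration only, the software ring described above may be represented by a data structure similar to the following sketch. The sketch is hypothetical; the identifiers (e.g., sw_ring_t, srr_t, sw_ring_deliver) are illustrative assumptions and do not correspond to elements of the figures.

    /* Hypothetical sketch of a software ring: a software classifier
     * plus a set of software receive rings (SRRs).                  */
    #include <stddef.h>

    typedef struct packet packet_t;            /* opaque packet type      */

    typedef struct srr {                       /* software receive ring   */
        packet_t *slots[256];                  /* temporary packet store  */
        size_t    head, tail;
    } srr_t;

    typedef struct sw_ring {
        size_t (*classify)(const packet_t *);  /* maps a packet to an SRR */
        srr_t  *srrs[8];                       /* SRR A, SRR B, ...       */
        size_t  nsrrs;
    } sw_ring_t;

    /* Classify an incoming packet and store it on the matching SRR
     * until the associated VNIC/VSQ requests it.                    */
    static void
    sw_ring_deliver(sw_ring_t *ring, packet_t *pkt)
    {
        srr_t *srr = ring->srrs[ring->classify(pkt) % ring->nsrrs];

        srr->slots[srr->tail % 256] = pkt;
        srr->tail++;
    }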
In one embodiment of the invention, each of the virtual network interface cards (VNICs) (114A, 114B, 114C, 114D) is associated with either a SRR (112A, 112B) or a HRR (106A, 106B, 106C). The VNICs (114A, 114B, 114C, 114D) provide an abstraction layer between the NIC (102) and the various packet destinations (118) executing on the host (100). More specifically, each VNIC (114A, 114B, 114C, 114D) operates like a NIC (102). For example, in one embodiment of the invention, each VNIC (114A, 114B, 114C, 114D) is associated with one or more Internet Protocol (IP) addresses, one or more Media Access Control (MAC) addresses, optionally, one or more ports, and is optionally configured to handle one or more protocol types. Thus, while the host (100) may be operatively connected to a single NIC (102), packet destinations (118) (e.g., containers and/or services) executing on the host (100) operate as if the host (100) is bound to multiple NICs. In one embodiment of the invention, the VNICs (114A, 114B, 114C, 114D) reside in a Media Access Control (MAC) layer of the host (100).
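As a purely illustrative and non-limiting sketch, the per-VNIC attributes enumerated above (one or more MAC addresses, one or more IP addresses, optional ports, and optional protocol types) might be grouped as follows; the structure and field names are hypothetical.

    /* Hypothetical sketch of the attributes associated with a VNIC. */
    #include <stdint.h>

    #define VNIC_MAX_ADDRS 4

    typedef struct vnic {
        uint8_t  mac_addrs[VNIC_MAX_ADDRS][6]; /* one or more MAC addresses  */
        uint32_t ip_addrs[VNIC_MAX_ADDRS];     /* one or more IP addresses   */
        uint16_t ports[VNIC_MAX_ADDRS];        /* optional: ports handled    */
        uint8_t  protocols[VNIC_MAX_ADDRS];    /* optional: protocol types   */
        void    *recv_ring;                    /* HRR or SRR bound to VNIC   */
        void    *vns_or_vm;                    /* VNS (or VM interface)      */
    } vnic_t;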
Each of the VNICs (114A, 114B, 114C, 114D) is operatively connected to a corresponding virtual network stack (VNS) (116A, 116B, 116C, 116D). In one embodiment of the invention, each VNS (116A, 116B, 116C, 116D) includes functionality to process packets in accordance with various protocols used to send and receive packets (e.g., Transmission Control Protocol (TCP), Internet Protocol (IP), User Datagram Protocol (UDP), etc.). Further, each VNS (116A, 116B, 116C, 116D) may also include functionality, as needed, to perform additional processing on the incoming and outgoing packets. This additional processing may include, but is not limited to, cryptographic processing, firewall routing, etc.
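For purposes of illustration only, the per-protocol processing performed by a VNS might be organized as a table of handler functions, as in the following hypothetical sketch (the identifiers are illustrative assumptions):

    /* Hypothetical sketch of a virtual network stack (VNS) as a set of
     * protocol handlers applied to inbound and outbound packets.      */
    typedef struct packet packet_t;
    typedef struct vns    vns_t;

    struct vns {
        void (*ip_handler)(vns_t *, packet_t *);    /* network layer (IP)    */
        void (*tcp_handler)(vns_t *, packet_t *);   /* transport layer (TCP) */
        void (*udp_handler)(vns_t *, packet_t *);   /* transport layer (UDP) */
        void (*extra_handler)(vns_t *, packet_t *); /* optional additional   */
                                                    /* processing (e.g.,     */
                                                    /* cryptographic,        */
                                                    /* firewall routing)     */
    };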
In one embodiment of the invention, each VNS (116A, 116B, 116C, 116D) includes network layer and transport layer functionality. In one embodiment of the invention, network layer functionality corresponds to functionality to manage packet addressing and delivery on a network (e.g., functionality to support IP, Address Resolution Protocol (ARP), Internet Control Message Protocol (ICMP), etc.). In one embodiment of the invention, transport layer functionality corresponds to functionality to manage the transfer of packets on the network (e.g., functionality to support TCP, UDP, Stream Control Transmission Protocol (SCTP), etc.). The structure and functionality of the VNSs (116A, 116B, 116C, 116D) are discussed in
As discussed above, the host (100) includes one or more packet destinations (118). In one embodiment of the invention, the packet destination(s) (118) corresponds to any process (or group of processes) executing on the host that is configured to send and/or receive network traffic. Further, the packet destination(s) (118) does not include an internal network stack (i.e., there is no network stack within the packet destination(s)).
Examples of packet destinations (118) include, but are not limited to, containers, services (e.g., web server), etc. As shown in
In one embodiment of the invention, each VNS (116A, 116B, 116C, 116D) is associated with a bandwidth allocation. Those skilled in the art will appreciate that if there is only one VNS (116A, 116B, 116C, 116D) bound to the packet destination (118), then the bandwidth allocation of the VNS (116A, 116B, 116C, 116D) corresponds to the bandwidth allocated to the packet destination (118). In one embodiment of the invention, the bandwidth allocation corresponds to the number of packets (or amount of data, e.g., megabytes per second) the packet destination (118) may receive in a given time interval. The bandwidth allocation for a given packet destination (118) is enforced by the VNS (116A, 116B, 116C, 116D) operating in polling mode (discussed in
In one embodiment of the invention, the VNIC (114A, 114B, 114C, 114D) may be bound to a virtual machine (not shown) (e.g., Xen Domain) instead of a VNS (116A, 116B, 116C, 116D). In such cases, the VNIC (114A, 114B, 114C, 114D) is bound to an interface (e.g., a Xen interface), where the interface enables the VNIC (114A, 114B, 114C, 114D) to communicate with the virtual machine. In one embodiment of the invention, the aforementioned virtual machine includes its own network stack and its own operating system (OS) instance, which may be different from the OS executing on the host.
Sub-flow table 1 (206) includes a number of sub-flow table entries (SFTEs) (216, 218, 220) (discussed below in
Sub-flow table 2 (208) includes a number of sub-flow table entries (SFTEs) (222, 224). Each of the SFTEs in sub-flow table 2 (208) is associated with a HRR. For example, SFTE 4 (222) is associated with HRR 5 (234) and SFTE 5 (224) is associated with HRR 6 (236). Thus, if a packet matches SFTE 4 (222) (discussed below in
Those skilled in the art will appreciate that while the hardware classifier (202) shown in
In one embodiment of the invention, the hardware classifier (202) shown in
In one embodiment of the invention, the flow corresponds to a route a packet may take from the time it is received by the NIC until the packet reaches its destination. In one embodiment of the invention, the name (302) corresponds to an administrator-chosen name used to identify the flow with which the FTE (300) is associated.
The flow description (304) corresponds to a criterion (or criteria), where a packet must satisfy the criterion (or criteria) to be associated with the flow. The criterion (or criteria) may correspond to a level two (L2) routing criterion, a level three (L3) routing criterion, a level four (L4) routing criterion, any other criterion (or criteria) that may be used to differentiate packets, or any combination thereof. Further, there may be multiple criteria from one or more routing levels. For example, the flow description (304) may include one L2 routing criterion and two L3 routing criteria.
In one embodiment of the invention, the L2 routing criterion may include, but is not limited to, a Media Access Control (MAC) address corresponding to the source of the packet, a MAC address corresponding to the destination of the packet, and a virtual local area network (VLAN) tag.
In one embodiment of the invention, the L3 routing criterion may include, but is not limited to, an Internet Protocol (IP) address corresponding to the source of the packet, an IP address corresponding to the destination of the packet, an IPv6 Flow ID, and a protocol number (i.e., the 8-bit number in the “Protocol” field of the IPv4 header or the “Next Header” field in the IPv6 header). In one embodiment of the invention, the L4 routing criterion may include, but is not limited to, a TCP port number.
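By way of illustration only, a flow description combining L2, L3, and L4 criteria might be represented as in the following hypothetical sketch, in which a zeroed field is treated as unspecified (i.e., not used to differentiate packets):

    /* Hypothetical sketch of a flow description combining L2, L3, and
     * L4 criteria; a zeroed field means "do not match on this field". */
    #include <stdint.h>

    typedef struct flow_desc {
        /* L2 criteria */
        uint8_t  src_mac[6];
        uint8_t  dst_mac[6];
        uint16_t vlan_tag;
        /* L3 criteria */
        uint32_t src_ip;
        uint32_t dst_ip;
        uint32_t ipv6_flow_id;
        uint8_t  protocol;      /* IPv4 "Protocol" / IPv6 "Next Header" */
        /* L4 criteria */
        uint16_t tcp_port;
    } flow_desc_t;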
In one embodiment of the invention, the packet matching function (306) corresponds to a function that takes a packet header (or appropriate portion(s) thereof) and the flow description (304) as input and determines whether the corresponding packet (i.e., the packet associated with the aforementioned packet header) satisfies the criterion (or criteria) listed in the flow description.
For example, if the flow description required a destination IP address of 10.1.1.5 and the packet has a destination IP address of 10.1.2.3, then the packet matching function would return a result indicating that the packet did not match the criterion in the flow description.
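A minimal, hypothetical sketch of such a packet matching function is shown below; for brevity, only a destination IP address criterion is shown, and the type and function names are illustrative assumptions:

    /* Hypothetical sketch of a packet matching function: given a packet
     * header and a flow description, report whether the packet satisfies
     * the flow's criteria.  Only a destination IP criterion is shown.    */
    #include <stdbool.h>
    #include <stdint.h>

    typedef struct pkt_hdr   { uint32_t dst_ip; } pkt_hdr_t;
    typedef struct flow_desc { uint32_t dst_ip;   /* 0 means "any" */ } flow_desc_t;

    static bool
    pkt_match(const pkt_hdr_t *hdr, const flow_desc_t *fd)
    {
        if (fd->dst_ip != 0 && fd->dst_ip != hdr->dst_ip)
            return false;   /* e.g., 10.1.1.5 required but 10.1.2.3 seen */
        return true;        /* all specified criteria satisfied          */
    }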
In one embodiment of the invention, the target cookie (308) identifies a VNIC (e.g., 114A, 114B, 114C, 114D in
In one embodiment of the invention, the acceptor function (310) corresponds to a function that, when executed, sends the packet to a VNIC (i.e., the VNIC identified by the target cookie (308)). In one embodiment of the invention, the acceptor function (310) takes a packet and the target cookie (308) as input. Further, the acceptor function (310), during execution, sends the packet to the VNIC identified by the target cookie (308).
In one embodiment of the invention, if the FTE (300) points to a sub-flow table, then the acceptor function (310) and the target cookie (308) may not be present in the FTE. Rather, the FTE may include a pointer (or some other data structure) to direct the classifier to the sub-flow table associated with the FTE (300).
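Purely as an illustrative sketch, the fields of an FTE described above might be laid out as follows; in this hypothetical layout, a leaf entry carries the target cookie and acceptor function, while a non-leaf entry instead carries a pointer to a sub-flow table:

    /* Hypothetical sketch of a flow table entry (FTE). */
    #include <stdbool.h>

    typedef struct packet    packet_t;
    typedef struct vnic      vnic_t;
    typedef struct flow_desc flow_desc_t;
    typedef struct flow_tbl  flow_tbl_t;

    typedef struct fte {
        const char  *name;                 /* administrator-chosen name   */
        flow_desc_t *flow_desc;            /* criteria defining the flow  */
        bool  (*match)(const packet_t *, const flow_desc_t *);
        /* Leaf entry: deliver the packet toward a VNIC. */
        vnic_t *target_cookie;             /* identifies the VNIC         */
        void  (*acceptor)(packet_t *, vnic_t *);
        /* Non-leaf entry: continue classification at the next level. */
        flow_tbl_t *sub_flow_table;        /* NULL for a leaf entry       */
    } fte_t;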
In one embodiment, the IP layer (402) is configured to receive packets from the VNIC associated with the VNS (400) (e.g., VNS D (116D) receives packets from VNIC D (114D) in
Continuing with the discussion of
In one embodiment of the invention, the transport layer (406) is configured to process inbound and outbound packets in accordance with Transmission Control Protocol (TCP), User Datagram Protocol (UDP), or both UDP and TCP.
In one embodiment of the invention, the outbound VSQ (408) is a queue data structure configured to receive packets from the packet destination (e.g., 118) with which the VNS (400) is associated. Further, the outbound VSQ (408) is configured to store packets prior to sending the received packets to the transport layer (406). In one embodiment of the invention, the outbound VSQ (408) is also configured to control the flow of packets from the packet destination (e.g., 118) associated with the VNS (400). In one embodiment of the invention, the outbound VSQ (408) (or a related process) is configured to block an application from sending packets to the outbound VSQ (408) if the packet destination (e.g., 118) is attempting to issue packets at a higher rate than the outbound bandwidth allocated to the packet destination (e.g., 118). Further, the outbound VSQ (408) (or a related process) is configured to notify the packet destination (e.g., 118) when it is no longer blocked from issuing packets to the VNS (400).
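A minimal sketch of the outbound flow control described above is shown below; the byte-budget accounting, the identifiers, and the use of a condition variable are illustrative assumptions only:

    /* Hypothetical sketch of outbound flow control: a sender that exceeds
     * its per-interval byte budget blocks until the budget is replenished
     * and blocked senders are notified.                                   */
    #include <pthread.h>
    #include <stddef.h>

    typedef struct out_vsq {
        pthread_mutex_t lock;
        pthread_cond_t  unblocked;
        size_t          budget;        /* bytes still permitted this interval */
    } out_vsq_t;

    /* Called before a packet destination hands a packet to the VSQ. */
    static void
    out_vsq_charge(out_vsq_t *q, size_t pkt_len)
    {
        pthread_mutex_lock(&q->lock);
        while (q->budget < pkt_len)                      /* over allocation: */
            pthread_cond_wait(&q->unblocked, &q->lock);  /* block the sender */
        q->budget -= pkt_len;
        pthread_mutex_unlock(&q->lock);
    }

    /* Called periodically to replenish the budget and notify senders. */
    static void
    out_vsq_replenish(out_vsq_t *q, size_t bytes_per_interval)
    {
        pthread_mutex_lock(&q->lock);
        q->budget = bytes_per_interval;
        pthread_cond_broadcast(&q->unblocked);
        pthread_mutex_unlock(&q->lock);
    }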
In one embodiment of the invention, the inbound VSQ (404) and outbound VSQ (408) are each configured to enforce the manner in which packets are processed. Specifically, the inbound VSQ (404) and outbound VSQ (408) may be configured to enforce the packet processing requirements imposed by the transport layer (406). For example, TCP requires serial processing of packets. Thus, the inbound VSQ (404) and outbound VSQ (408) may require all threads accessing the inbound VSQ (404) and outbound VSQ (408) to conform to a mutual exclusion policy. In one embodiment of the invention, the mutual exclusion policy requires that only one thread may access the VSQ at a time. Thus, if two threads are attempting to access a given VSQ, one thread must wait until the other thread has finished accessing the VSQ.
Alternatively, if the transport layer (406) only supports UDP, then the inbound VSQ (404) and outbound VSQ (408) may be configured to allow concurrent access. Said another way, two or more threads may concurrently access the VSQ. In one embodiment of the invention, if the transport layer (406) is configured to process both TCP and UDP packets, then the inbound VSQ (404) and outbound VSQ (408) are configured to conform to the more stringent standard (e.g., TCP if the transport layer supports both TCP and UDP).
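The access policies described above may be illustrated by the following hypothetical sketch, which guards a VSQ with a mutex only when serial (TCP-style) processing is required and otherwise permits concurrent access:

    /* Hypothetical sketch: a VSQ enforces mutual exclusion when the
     * transport layer requires serial processing (TCP), and permits
     * concurrent access when only UDP is processed.                 */
    #include <pthread.h>
    #include <stdbool.h>

    typedef struct packet packet_t;

    typedef struct vsq {
        bool            serial;    /* true if TCP (or both TCP and UDP) */
        pthread_mutex_t lock;      /* used only when serial is true     */
    } vsq_t;

    static void
    vsq_process(vsq_t *q, packet_t *pkt, void (*handler)(packet_t *))
    {
        if (q->serial)
            pthread_mutex_lock(&q->lock);    /* one thread at a time    */

        handler(pkt);                        /* transport-layer work    */

        if (q->serial)
            pthread_mutex_unlock(&q->lock);  /* admit the next thread   */
    }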
In one embodiment of the invention, the inbound VSQ (404) and the outbound VSQ (408) are implemented as a single bi-directional VSQ. In such cases, the bi-directional VSQ includes a single set of configuration parameters (discussed above) to enforce the manner in which packets are processed. Further, the enforcement of the configuration parameters is performed on a VSQ-basis (as opposed to a per-direction basis).
Once the number of VNICs to be created has been determined, the number of hardware receive rings (HRRs) on the NIC is assessed (Step 505). VNICs are subsequently created in the host, where the number of VNICs created corresponds to the number of VNICs determined in Step 503 (Step 507). Next, a determination is made about whether there are fewer HRRs than VNICs on the host (Step 509). If there are fewer HRRs than VNICs on the host, then a software ring is created in the host and subsequently associated with one of the HRRs (Step 511).
A set of software receive rings (SRRs) is then created within the software ring (Step 513). The VNICs are then bound to the SRRs (Step 515). More specifically, the VNICs that cannot be bound to the HRRs are bound to the SRRs. The remaining VNICs are bound to the HRRs (Step 517). Those skilled in the art will appreciate that steps in
In one embodiment of the invention, a VNIC is preferably bound to an HRR if an HRR is available and the hardware classifier in the NIC is configured to perform the level of classification required by the host. In such cases, one HRR is bound to a software ring and the other HRRs are bound to VNICs. In one embodiment of the invention, each of the aforementioned VNICs is associated with a virtual network stack (VNS). Further, each VNS is associated with a bandwidth allocation.
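The setup sequence described above may be summarized, purely as a hypothetical sketch, by the following routine; the helper functions and identifiers are illustrative assumptions rather than elements of the figures:

    /* Hypothetical sketch of binding VNICs to receive rings: if there are
     * fewer HRRs than VNICs, one HRR feeds a software ring whose SRRs
     * absorb the surplus VNICs; the remaining VNICs bind directly to HRRs.
     * Assumes at least one HRR and at most 64 VNICs for brevity.          */
    #include <stddef.h>

    typedef struct vnic    vnic_t;
    typedef struct hrr     hrr_t;
    typedef struct sw_ring sw_ring_t;

    extern vnic_t    *create_vnic(void);
    extern sw_ring_t *create_sw_ring(hrr_t *feeder);
    extern void       bind_vnic_to_hrr(vnic_t *, hrr_t *);
    extern void       bind_vnic_to_srr(vnic_t *, sw_ring_t *, size_t srr_idx);

    static void
    setup_vnics(hrr_t **hrrs, size_t nhrrs, size_t nvnics)
    {
        vnic_t *vnics[64];
        size_t  i;

        for (i = 0; i < nvnics; i++)                 /* create the VNICs    */
            vnics[i] = create_vnic();

        if (nvnics <= nhrrs) {                       /* enough HRRs: bind   */
            for (i = 0; i < nvnics; i++)             /* each VNIC directly  */
                bind_vnic_to_hrr(vnics[i], hrrs[i]);
            return;
        }

        /* Fewer HRRs than VNICs: dedicate one HRR to a software ring. */
        sw_ring_t *sring = create_sw_ring(hrrs[0]);

        for (i = 0; i + 1 < nhrrs; i++)              /* bind as many VNICs  */
            bind_vnic_to_hrr(vnics[i], hrrs[i + 1]); /* to HRRs as possible */
        for (; i < nvnics; i++)                      /* surplus VNICs are   */
            bind_vnic_to_srr(vnics[i], sring, i - (nhrrs - 1)); /* on SRRs  */
    }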
As stated above, software rings can be arbitrarily created on top of HRRs or SRRs. As a result, different structures involving software rings can be created to handle the same number of VNICs using the method shown in
Continuing with the discussion in
If the receive ring is not associated with a software receive ring, then, at this stage, the processing of the packets differs depending on the mode in which the virtual serialization queue (VSQ) (which is bound to the HRR or connected to the SRR) is operating (Step 612). The aforementioned VSQ is associated with a VNS bound to a VNIC, where the VNIC is associated with the receive ring (HRR or SRR).
Continuing with the discussion of
Those skilled in the art will appreciate that the receive rings store a finite number of packets. Thus, if the receive rings receive packets at a faster rate than the rate at which the corresponding VSQ requests the packets, the receive rings will eventually fill completely with packets and packets received after this point are dropped until packets on the receive rings are requested and processed. In one embodiment of the invention, the rate at which packets are requested from the receive ring (SRR or HRR) and the number of packets requested is determined by the bandwidth allocation of the VNS bound to the receive ring.
Alternatively, if the VSQ is operating in interrupt mode, then an interrupt is issued to a processor (i.e., a processor bound to the VSQ that is bound to the VNS associated with the receive ring) (Step 614). In one embodiment of the invention, if the receive ring is an SRR bound to a VNIC, then the interrupt issued in Step 614 is a software interrupt, as opposed to the hardware interrupt that is generated when an HRR is bound to a VNIC. The packets are then sent to the VNIC (Step 616).
In one embodiment of the invention, if the VSQ is operating in polling mode, then the VSQ, which includes a copy of the appropriate acceptor function, uses the acceptor function to obtain the packet from the receive ring and place it in the appropriate VNIC. Alternatively, if the VSQ is operating in interrupt mode, then the device driver (or NIC) executes the acceptor function to send the packet from the receive ring to the appropriate VNIC.
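A hypothetical sketch of the two dispatch modes follows; the per-call packet budget (standing in for the bandwidth allocation discussed above), the identifiers, and the helper functions are illustrative assumptions only:

    /* Hypothetical sketch of receive-ring dispatch: in polling mode the
     * VSQ pulls packets from the ring, limited by a packet budget derived
     * from the VNS bandwidth allocation; in interrupt mode an interrupt
     * (hardware for an HRR, software for an SRR) triggers delivery.      */
    #include <stddef.h>

    typedef struct packet packet_t;
    typedef struct ring   ring_t;                    /* an HRR or an SRR    */
    typedef struct vnic   vnic_t;

    typedef enum { VSQ_POLLING, VSQ_INTERRUPT } vsq_mode_t;

    extern size_t ring_dequeue(ring_t *, packet_t **pkts, size_t max);
    extern void   vnic_accept(vnic_t *, packet_t *); /* acceptor function   */
    extern void   raise_interrupt(ring_t *);         /* HW or SW interrupt  */

    static void
    dispatch(ring_t *ring, vnic_t *vnic, vsq_mode_t mode, size_t pkt_budget)
    {
        if (mode == VSQ_POLLING) {
            packet_t *pkts[32];
            size_t    n, i, want;

            /* Pull at most pkt_budget packets; packets left on a full
             * ring are dropped by the ring until they are requested.   */
            while (pkt_budget > 0) {
                want = pkt_budget < 32 ? pkt_budget : 32;
                n = ring_dequeue(ring, pkts, want);
                if (n == 0)
                    break;
                for (i = 0; i < n; i++)
                    vnic_accept(vnic, pkts[i]);
                pkt_budget -= n;
            }
        } else {
            raise_interrupt(ring);                   /* packets are then    */
        }                                            /* sent to the VNIC    */
    }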
The VNIC subsequently forwards the packets to the appropriate VNS (Step 618), where the packets are processed and then sent to the packet destination (Step 620).
Once a packet is received, a determination is made about whether the classifier is using a direct matching function for the current level of classification (Step 700). In one embodiment of the invention, the direct matching function corresponds to a function such as a hashing function, where at most one flow table entry (FTE) (or sub-flow table entry (SFTE)) matches the result of the direct matching function. The direct matching function uses the packet header (or a portion thereof) as input to obtain its result.
In one embodiment of the invention, the current level of classification corresponds to the flow table (or sub-flow table) the classifier is currently using to classify the packet. For example, returning to
Continuing with the discussion of
At Step 710, the acceptor function and target cookie are obtained from the FTE. The packet, target cookie, and acceptor function are then forwarded to the appropriate RR (HRR or SRR) (Step 712). At this stage, the packet is forwarded to the VNIC specified in the target cookie using either interrupt mode or polling mode, as discussed above.
Returning to Step 700, if the classifier is not using a direct matching function for the current level of classification, then a FTE is obtained (Step 701), where the FTE is located in the flow table the classifier is currently using to classify the packets. For example, returning to
Continuing with the discussion of
In one embodiment of the invention, if the process enters Step 700 from Step 704, then all subsequent references to an FTE correspond to a sub-flow table entry (SFTE).
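The overall classification procedure described above may be illustrated by the following hypothetical sketch, which applies a direct matching (hash-style) lookup when one is configured for the current level, otherwise scans the entries in order, and descends into a sub-flow table when a matching entry points to one. For simplicity, the sketch invokes the acceptor function directly rather than forwarding the packet, target cookie, and acceptor function to a receive ring as described above; all identifiers are illustrative assumptions.

    /* Hypothetical sketch of multi-level classification: at each level,
     * use a direct matching function if one is configured, otherwise scan
     * the entries in order; descend when an entry points to a sub-flow
     * table, and deliver via the acceptor function otherwise.            */
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct packet   packet_t;
    typedef struct vnic     vnic_t;
    typedef struct fte      fte_t;
    typedef struct flow_tbl flow_tbl_t;

    struct flow_tbl {
        fte_t **entries;
        size_t  nentries;
        /* Optional direct matching (e.g., hashing) function: returns at
         * most one entry, or NULL if nothing matches.                   */
        fte_t *(*direct_match)(const flow_tbl_t *, const packet_t *);
    };

    struct fte {
        bool (*match)(const packet_t *, const void *flow_desc);
        const void *flow_desc;
        flow_tbl_t *sub_flow_table;  /* non-NULL: next classification level   */
        vnic_t     *target_cookie;   /* leaf: identifies the destination VNIC */
        void (*acceptor)(packet_t *, vnic_t *);
    };

    static bool
    classify(const flow_tbl_t *tbl, packet_t *pkt)
    {
        const fte_t *hit = NULL;
        size_t i;

        if (tbl->direct_match != NULL) {             /* direct matching      */
            hit = tbl->direct_match(tbl, pkt);
        } else {
            for (i = 0; i < tbl->nentries && hit == NULL; i++)
                if (tbl->entries[i]->match(pkt, tbl->entries[i]->flow_desc))
                    hit = tbl->entries[i];
        }

        if (hit == NULL)
            return false;                            /* no matching entry    */
        if (hit->sub_flow_table != NULL)
            return classify(hit->sub_flow_table, pkt);   /* next level       */

        hit->acceptor(pkt, hit->target_cookie);      /* deliver toward VNIC  */
        return true;
    }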
An embodiment of the invention may be implemented on virtually any type of computer regardless of the platform being used. For example, as shown in
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.
The present application contains subject matter that may be related to the subject matter in the following U.S. applications filed on Apr. 22, 2005, and assigned to the assignee of the present application: “Method and Apparatus for Managing and Accounting for Bandwidth Utilization Within A Computing System” with U.S. application Ser. No. 11/112,367 (Attorney Docket No. 03226/643001; SUN050681); “Method and Apparatus for Consolidating Available Computing Resources on Different Computing Devices” with U.S. application Ser. No. 11/112,368 (Attorney Docket No. 03226/644001; SUN050682); “Assigning Higher Priority to Transactions Based on Subscription Level” with U.S. application Ser. No. 11/112,947 (Attorney Docket No. 03226/645001; SUN050589); “Method and Apparatus for Dynamically Isolating Affected Services Under Denial of Service Attack” with U.S. application Ser. No. 11/112,158 (Attorney Docket No. 03226/646001; SUN050587); “Method and Apparatus for Improving User Experience for Legitimate Traffic of a Service Impacted by Denial of Service Attack” with U.S. application Ser. No. 11/112,629 (Attorney Docket No. 03226/647001; SUN050590); “Method and Apparatus for Limiting Denial of Service Attack by Limiting Traffic for Hosts” with U.S. application Ser. No. 11/112,328 (Attorney Docket No. 03226/648001; SUN050591); “Hardware-Based Network Interface Per-Ring Resource Accounting” with U.S. application Ser. No. 11/112,222 (Attorney Docket No. 03226/649001; SUN050593); “Dynamic Hardware Classification Engine Updating for a Network Interface” with U.S. application Ser. No. 11/112,934 (Attorney Docket No. 03226/650001; SUN050592); “Network Interface Card Resource Mapping to Virtual Network Interface Cards” with U.S. application Ser. No. 11/112,063 (Attorney Docket No. 03226/651001; SUN050588); “Network Interface Decryption and Classification Technique” with U.S. application Ser. No. 11/112,436 (Attorney Docket No. 03226/652001; SUN050596); “Method and Apparatus for Enforcing Resource Utilization of a Container” with U.S. application Ser. No. 11/112,910 (Attorney Docket No. 03226/653001; SUN050595); “Method and Apparatus for Enforcing Packet Destination Specific Priority Using Threads” with U.S. application Ser. No. 11/112,584 (Attorney Docket No. 03226/654001; SUN050597); “Method and Apparatus for Processing Network Traffic Associated with Specific Protocols” with U.S. application Ser. No. 11/112,228 (Attorney Docket No. 03226/655001; SUN050598). The present application contains subject matter that may be related to the subject matter in the following U.S. applications filed on Oct. 21, 2005, and assigned to the assignee of the present application: “Method and Apparatus for Defending Against Denial of Service Attacks” with U.S. application Ser. No. 11/255,366 (Attorney Docket No. 03226/688001; SUN050966); “Router Based Defense Against Denial of Service Attacks Using Dynamic Feedback from Attacked Host” with U.S. application Ser. No. 11/256,254 (Attorney Docket No. 03226/689001; SUN050969); and “Method and Apparatus for Monitoring Packets at High Data Rates” with U.S. application Ser. No. 11/226,790 (Attorney Docket No. 03226/690001; SUN050972). The present application contains subject matter that may be related to the subject matter in the following U.S. applications filed on Jun. 30, 2006, and assigned to the assignee of the present application: “Network Interface Card Virtualization Based On Hardware Resources and Software Rings” with U.S. Application Serial No. TBD (Attorney Docket No. 
03226/870001; SUN061020); “Method and System for Controlling Virtual Machine Bandwidth” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/871001; SUN061021); “Virtual Switch” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/873001; SUN061023); “System and Method for Virtual Network Interface Cards Based on Internet Protocol Addresses” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/874001; SUN061024); “Virtual Network Interface Card Loopback Fastpath” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/876001; SUN061027); “Bridging Network Components” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/877001; SUN061028); “Reflecting the Bandwidth Assigned to a Virtual Network Interface Card Through Its Link Speed” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/878001; SUN061029); “Method and Apparatus for Containing a Denial of Service Attack Using Hardware Resources on a Virtual Network Interface Card” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/879001; SUN061033); “Virtual Network Interface Cards with VLAN Functionality” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/882001; SUN061037); “Method and Apparatus for Dynamic Assignment of Network Interface Card Resources” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/883001; SUN061038); “Generalized Serialization Queue Framework for Protocol Processing” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/884001; SUN061039); “Serialization Queue Framework for Transmitting Packets” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/885001; SUN061040). The present application contains subject matter that may be related to the subject matter in the following U.S. applications filed on Jul. 20, 2006, and assigned to the assignee of the present application: “Low Impact Network Debugging” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/829001; SUN060545); “Reflecting Bandwidth and Priority in Network Attached Storage I/O” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/830001; SUN060587); “Priority and Bandwidth Specification at Mount Time of NAS Device Volume” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/831001; SUN060588); “Notifying Network Applications of Receive Overflow Conditions” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/869001; SUN060913); “Host Operating System Bypass for Packets Destined for a Virtual Machine” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/872001; SUN061022); “Method and System for Automatically Reflecting Hardware Resource Allocation Modifications” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/881001; SUN061036); “Multiple Virtual Network Stack Instances Using Virtual Network Interface Cards” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/888001; SUN061041); “Method and System for Network Configuration for Containers” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/889001; SUN061044); “Network Memory Pools for Packet Destinations and Virtual Machines” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/890001; SUN061062); “Method and System for Network Configuration for Virtual Machines” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/893001; SUN061171); “Multiple Virtual Network Stack Instances” with U.S. Application Serial No. TBD (Attorney Docket No. 
03226/896001; SUN061198); and “Shared and Separate Network Stack Instances” with U.S. Application Serial No. TBD (Attorney Docket No. 03226/898001; SUN061200).