A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention is generally related to computer systems and software such as middleware, and is particularly related to supporting a middleware machine environment.
The interconnection network plays a beneficial role in the next generation of super computers, clusters, and data centers. High performance network technology, such as the InfiniBand (IB) technology, is replacing proprietary or low-performance solutions in the high performance computing domain, where high bandwidth and low latency are the key requirements. For example, IB installations are used in supercomputers such as Los Alamos National Laboratory's Roadrunner, Texas Advanced Computing Center's Ranger, and Forschungszentrum Juelich's JuRoPa.
IB was first standardized in October 2000 as a merge of two older technologies called Future I/O and Next Generation I/O. Due to its low latency, high bandwidth, and efficient utilization of host-side processing resources, it has been gaining acceptance within the High Performance Computing (HPC) community as a solution to build large and scalable computer clusters. The de facto system software for IB is OpenFabrics Enterprise Distribution (OFED), which is developed by dedicated professionals and maintained by the OpenFabrics Alliance. OFED is open source and is available for both GNU/Linux and Microsoft Windows.
Described herein are systems and methods for supporting packet direct forwarding in a middleware machine environment. The middleware machine environment comprises one or more external ports on at least one network switch instance, wherein each external port can receive one or more data packets from an external network. Furthermore, the middleware machine environment comprises a plurality of host channel adapter (HCA) ports on one or more host servers, wherein each said HCA port is associated with a said host server, and each said host server can support one or more virtual machines that operate to process the one or more data packets. The at least one network switch operates to send a packet received at an external port to a designated HCA port associated with the external port. An external switch in the external network can send the data packet to the particular external port based on a packet distribution algorithm.
Described herein is a system and method for providing a middleware machine or similar platform. In accordance with an embodiment of the invention, the system comprises a combination of high performance hardware, e.g. 64-bit processor technology, high performance large memory, and redundant InfiniBand and Ethernet networking, together with an application server or middleware environment, such as WebLogic Suite, to provide a complete Java EE application server complex which includes a massively parallel in-memory grid, that can be provisioned quickly, and can scale on demand. In accordance with an embodiment, the system can be deployed as a full, half, or quarter rack, or other configuration, that provides an application server grid, storage area network, and InfiniBand (IB) network. The middleware machine software can provide application server, middleware and other functionality such as, for example, WebLogic Server, JRockit or Hotspot JVM, Oracle Linux or Solaris, and Oracle VM. In accordance with an embodiment, the system can include a plurality of compute nodes, IB switch gateway, and storage nodes or units, communicating with one another via an IB network. When implemented as a rack configuration, unused portions of the rack can be left empty or occupied by fillers.
In accordance with an embodiment of the invention, referred to herein as “Sun Oracle Exalogic” or “Exalogic”, the system is an easy-to-deploy solution for hosting middleware or application server software, such as the Oracle Middleware SW suite, or WebLogic. As described herein, in accordance with an embodiment the system is a “grid in a box” that comprises one or more servers, storage units, an IB fabric for storage networking, and all the other components required to host a middleware application. Significant performance can be delivered for all types of middleware applications by leveraging a massively parallel grid architecture using, e.g. Real Application Clusters and Exalogic Open storage. The system delivers improved performance with linear I/O scalability, is simple to use and manage, and delivers mission-critical availability and reliability.
Additionally, the host servers provide a plurality of virtual interfaces, such as virtual network interface cards (vNICs) 121-128, for receiving data packets from the external network via the gateway instances A-B 102-103. The gateway instances 102-103 can define and maintain one or more virtual hubs (vHUBs) 111-113, each of which defines a logical layer 2 (L2) link on the IB fabric side that contains vNICs associated with the same gateway instance. Furthermore, the vNICs and the hosts that belong to the same vHUB can communicate with each other without involving the associated gateway instance.
As shown in
A vNIC in the IB fabric can be uniquely identified using a virtual Ethernet interface (VIF), which includes a combination of a VLAN ID and a MAC address. Also, when the VIFs are used concurrently in the same vHub in a gateway instance, different MAC addresses are used for the different VIFs. Additionally, the system can perform an address translation from an Ethernet layer 2 MAC address to an IB layer 2 address that uses local identifier (LID)/global identifier (GID) and queue pair number (QPN).
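For illustration only, the VIF identification and address translation described above can be sketched as follows. The class names `VHub` and `IbAddress`, the method names, and the choice to omit the GID are assumptions made for this sketch and do not appear in the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class IbAddress:
    """IB layer 2 address: LID plus queue pair number (GID omitted for brevity)."""
    lid: int
    qpn: int


class VHub:
    """Hypothetical model of one vHUB whose vNICs are keyed by VIF (VLAN ID, MAC)."""

    def __init__(self) -> None:
        self._vifs: Dict[Tuple[int, str], IbAddress] = {}
        self._macs: Set[str] = set()

    def add_vif(self, vlan_id: int, mac: str, addr: IbAddress) -> None:
        # VIFs used concurrently in the same vHUB must have different MAC addresses.
        mac = mac.lower()
        if mac in self._macs:
            raise ValueError(f"MAC {mac} is already in use in this vHUB")
        self._macs.add(mac)
        self._vifs[(vlan_id, mac)] = addr

    def translate(self, vlan_id: int, mac: str) -> Optional[IbAddress]:
        # Ethernet layer 2 address -> IB layer 2 address, or None if unknown.
        return self._vifs.get((vlan_id, mac.lower()))
```

A caller would register each vNIC once with `add_vif` and then call `translate` for each incoming Ethernet destination address; a `None` result means no translation is known for that VIF.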
Furthermore, the gateway instance 201 can include a hardware vNIC context table 232, which contains various entries or hardware vNIC contexts. The hardware vNIC context table 232 can be stored in a memory of the gateway instance 201. When a host driver is sending packets to the external Ethernet via the IB fabric and the gateway 201, this hardware vNIC context table 232 can be used to verify that the correct source address information is used by the correct host. The hardware context table 232 can also be used to look up the correct host HCA port address on the IB fabric and QPN within that HCA, when packets are received by the gateway from the external Ethernet. Additionally, the hardware vNIC contexts can be used to directly steer packets for a specific logical vNIC to a dedicated receive queue in the designated host context/memory.
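The two uses of the hardware vNIC context table described above, source verification for outgoing packets and destination lookup for incoming packets, can be sketched as below. This is a minimal software model under stated assumptions: the names `VnicContext` and `HwVnicContextTable`, the fixed `capacity` parameter, and the use of a LID to represent the host HCA port address are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class VnicContext:
    """One hypothetical hardware vNIC context entry."""
    vlan_id: int
    mac: str
    hca_port_lid: int  # host HCA port address on the IB fabric (assumed to be a LID)
    qpn: int           # queue pair number within that HCA


class HwVnicContextTable:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumed fixed table size
        self._entries: Dict[Tuple[int, str], VnicContext] = {}

    def allocate(self, ctx: VnicContext) -> None:
        key = (ctx.vlan_id, ctx.mac)
        if key not in self._entries and len(self._entries) >= self.capacity:
            raise RuntimeError("hardware vNIC context table is full")
        self._entries[key] = ctx

    def verify_source(self, sender_lid: int, vlan_id: int, src_mac: str) -> bool:
        # Host -> external Ethernet: the source address must belong to the sending host.
        ctx = self._entries.get((vlan_id, src_mac))
        return ctx is not None and ctx.hca_port_lid == sender_lid

    def lookup_destination(self, vlan_id: int, dst_mac: str) -> Optional[Tuple[int, int]]:
        # External Ethernet -> host: find the HCA port address and QPN, if a context exists.
        ctx = self._entries.get((vlan_id, dst_mac))
        return None if ctx is None else (ctx.hca_port_lid, ctx.qpn)
```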
The gateway instance 201, which can be hardware itself or software running on top of a hardware switch, allows the use of network managed vNIC allocation. The management interface 203 on the gateway instance 201, e.g. a NM2-GW service processor, can be used to allocate hardware vNIC contexts on behalf of specific host (HCA) ports.
A single vNIC in the IB fabric may or may not be allocated with a hardware vNIC context recorded in the hardware vNIC context table 232. In the example as shown in
A flooding mechanism can be used to scale the number of logical vNICs beyond the size of the gateway HW vNIC context table. Using the flood-based vNICs, the system allows the same number of receive queues on the host(s) to receive packets for a large number of logical vNICs. Furthermore, using a flooding mechanism, the system allows schemes where hardware vNIC contexts can be established in the hardware context table 232 after initial packet traffic from the external Ethernet has been received.
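By way of example, the following sketch models how a flooding mechanism can serve logical vNICs that have no hardware context, and how a context may be established after initial traffic. The function names, the representation of hosts as a dictionary of locally configured VIFs, and the policy of installing a context as soon as a single receiving host has been observed are all assumptions of this sketch, not details given in the disclosure.

```python
from typing import Dict, List, Set, Tuple

Vif = Tuple[int, str]  # (VLAN ID, MAC address)


def gateway_targets(contexts: Dict[Vif, str], flood_hosts: List[str], vif: Vif) -> List[str]:
    """Hosts that the gateway sends a packet to for the given destination VIF."""
    if vif in contexts:
        return [contexts[vif]]   # hardware context hit: single destination
    return list(flood_hosts)     # no context: flood to every participating host


def deliver(contexts: Dict[Vif, str], hosts: Dict[str, Set[Vif]],
            vif: Vif, capacity: int) -> List[str]:
    """Return the hosts that accept the packet; may establish a context afterwards."""
    targets = gateway_targets(contexts, list(hosts), vif)
    # Each flooded host filters on its own logical vNICs.
    receivers = [h for h in targets if vif in hosts[h]]
    # Assumed policy: after initial traffic, install a context if the table has room.
    if vif not in contexts and len(contexts) < capacity and len(receivers) == 1:
        contexts[vif] = receivers[0]
    return receivers
```

In this model the number of logical vNICs is bounded only by the host-side VIF sets, while `capacity` bounds the number of entries that bypass flooding.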
The network switch 302 (or switches) can include the above one or more external ports 321-322, each of which can receive one or more data packets from the external network 304. Furthermore, the IB fabric 301 can include one or more host servers, e.g. host servers A-B 311-312, each of which can support one or more virtual machines for processing the received data packets. For example, host server A 311 supports VM A 341 and VM B 342, and host server B 312 supports VM C 343.
Additionally, the network switch 302 can maintain one or more virtual hubs, e.g. vHUB A 303 (with a unique VLAN ID). The vHUB A 303 can include various vNICs a-c 331-333, each of which is assigned with a MAC address a-c 351-353. Here, each MAC/VLAN ID combination represents a logical vNIC a-c 331-333 associated with a specific HCA port, e.g. HCA port A-C 361-363.
Furthermore, the external switch 310 in the external network 304 can direct a data packet to a particular external port based on a packet distribution algorithm 320. Then, the network switch 302 can send packets received at different external ports to different designated HCA ports. As shown in
In accordance with an embodiment of the invention, the external network 304, which communicates with the IB fabric 301, can be an Ethernet network, such as a 10G Ethernet network. Additionally, the network switch 302 can forward an incoming data packet, e.g. received at an external port 321, based on an evaluation of virtual machine specific quality of service/service level agreement (QoS/SLA).
The network switch 402 (or switches) can include one or more external ports 406a-h, each of which can receive one or more data packets from the external network 404 and be associated with a different designated HCA port 407a-h. Furthermore, the network switch 402 allows the external switch 410 in the external network 404 to send a data packet to a particular external port 406a-h on the network switch 402, e.g. based on a packet distribution algorithm 420. Additionally, the allocation of destination addresses, such as MAC and IP addresses, for the virtual machines on various servers (e.g. host servers A-H 411-418) can correspond to, or be matched with, the packet distribution algorithm 420 of the external switch 410.
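One way to picture address allocation that is matched with a packet distribution algorithm is sketched below. The disclosure does not specify the algorithm; this sketch assumes, purely for illustration, that the external switch selects an external port as the destination MAC value modulo the number of ports. The function names and the locally administered base address `02:00:00:00:00:00` are likewise hypothetical.

```python
from typing import Set


def mac_to_int(mac: str) -> int:
    return int(mac.replace(":", ""), 16)


def int_to_mac(value: int) -> str:
    digits = f"{value:012x}"
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def distribute(dst_mac: str, num_ports: int) -> int:
    """Assumed distribution rule of the external switch: MAC value modulo port count."""
    return mac_to_int(dst_mac) % num_ports


def allocate_mac(target_port: int, num_ports: int, used: Set[str],
                 base: str = "02:00:00:00:00:00") -> str:
    """Pick an unused MAC that the assumed rule maps to target_port."""
    candidate = mac_to_int(base)
    while True:
        mac = int_to_mac(candidate)
        if mac not in used and distribute(mac, num_ports) == target_port:
            used.add(mac)
            return mac
        candidate += 1
```

Under this assumption, a virtual machine on the host server behind the designated HCA port of external port index `i` would be given an address from `allocate_mac(i, 8, used)`, so that packets addressed to it arrive at the external port whose designated HCA port is on its own host.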
Furthermore, an incoming data packet received at a host server, e.g. host server A 411 associated with the designated HCA port 407a for the external port 406a, may be sent to another host server, e.g. host server C 413. Then, the virtual machines on host server C 413 can process the packet.
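The host-to-host step described above can be sketched as follows. This is a simplified model only: the `HostServer` class, the way a host discovers which peer owns a destination MAC address (a direct inspection of each peer's `local_macs`), and the single-hop relay limit are assumptions of this sketch and are not stated in the disclosure.

```python
from typing import Dict, Iterable, List, Optional


class HostServer:
    """Hypothetical host server that processes or relays incoming packets."""

    def __init__(self, name: str, local_macs: Iterable[str]) -> None:
        self.name = name
        self.local_macs = set(local_macs)  # MACs of the VMs on this host
        self.processed: List[bytes] = []

    def receive(self, dst_mac: str, payload: bytes,
                peers: Dict[str, "HostServer"], relayed: bool = False) -> Optional[str]:
        """Return the name of the host whose VM processed the packet, or None."""
        if dst_mac in self.local_macs:
            self.processed.append(payload)
            return self.name
        if relayed:
            return None  # assumed single-hop limit to avoid forwarding loops
        for peer in peers.values():
            if peer is not self and dst_mac in peer.local_macs:
                return peer.receive(dst_mac, payload, peers, relayed=True)
        return None
```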
Additionally, a constant stream of data packets can be sent to each external port 406a-h on the network switch 402. The incoming data packets can be flood based, or more specifically be based on direct forwarding, when there is no hardware context available in the hardware vNIC context table 409. The hardware context table 409, which contains a plurality of hardware context entries, can be used to forward an incoming data packet with hardware context to a target HCA port 407a-h, when it is appropriate. Here, the hardware context entries in the hardware vNIC context table 409 can be used to look up the correct host HCA port address on the IB fabric 401 and QPN within that HCA, when packets are received from the external network 404.
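The forwarding decision at the network switch can be sketched as a two-step lookup: use a hardware context entry when one exists, and otherwise forward directly to the designated HCA port of the receiving external port. The names `Target` and `forward`, the dictionary representations, and the use of `None` as the QPN for direct forwarding (standing in for a default receive queue) are assumptions of this sketch.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Target(NamedTuple):
    hca_port: str
    qpn: Optional[int]   # None here stands for an assumed default receive queue
    via_context: bool    # True if a hardware context entry was used


def forward(ext_port: str, vlan_id: int, dst_mac: str,
            context_table: Dict[Tuple[int, str], Tuple[str, int]],
            designated: Dict[str, str]) -> Target:
    """Choose the HCA port for a packet received at ext_port."""
    entry = context_table.get((vlan_id, dst_mac))
    if entry is not None:
        hca_port, qpn = entry
        return Target(hca_port, qpn, True)
    # No hardware context: direct forwarding to the designated HCA port.
    return Target(designated[ext_port], None, False)
```

For example, with `designated = {"406a": "407a"}` and an empty context table, every packet arriving at external port `406a` is sent to HCA port `407a` regardless of its destination address.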
The present invention may be conveniently implemented using one or more conventional general purpose or specialized digital computer, computing device, machine, or microprocessor, including one or more processors, memory and/or computer readable storage media programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.
In some embodiments, the present invention includes a computer program product which is a storage medium or computer readable medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.
This application claims priority to U.S. Provisional Patent Application No. 61/506,557, entitled “SYSTEM AND METHOD FOR USING UNICAST AND MULTICAST FLOODING MECHANISMS TO PROVIDE EoIB GATEWAY vNICs” filed Jul. 11, 2011, which application is herein incorporated by reference. The application is related to the following patent applications, which are hereby incorporated by reference in their entirety: U.S. patent application Ser. No. 13/546,217, entitled “SYSTEM AND METHOD FOR USING A MULTICAST GROUP TO SUPPORT A FLOODING MECHANISM IN A MIDDLEWARE MACHINE ENVIRONMENT”, filed Jul. 11, 2012, which is now U.S. Pat. No. 9,054,886, issued on Jun. 9, 2015; U.S. patent application Ser. No. 13/546,236, entitled “SYSTEM AND METHOD FOR USING A PACKET PROCESS PROXY TO SUPPORT A FLOODING MECHANISM IN A MIDDLEWARE MACHINE ENVIRONMENT”, filed Jul. 11, 2012; U.S. patent application Ser. No. 13/546,261, entitled “SYSTEM AND METHOD FOR SUPPORTING A SCALABLE FLOODING MECHANISM IN A MIDDLEWARE MACHINE ENVIRONMENT”, filed Jul. 11, 2012; and U.S. patent application Ser. No. 13/546,405, entitled “SYSTEM AND METHOD FOR SUPPORTING A VIRTUAL MACHINE MIGRATION IN A MIDDLEWARE MACHINE ENVIRONMENT”, filed Jul. 11, 2012, which is now U.S. Pat. No. 8,874,742, issued on Oct. 28, 2014.
Number | Date | Country | |
---|---|---|---|
20130016731 A1 | Jan 2013 | US |
Number | Date | Country | |
---|---|---|---|
61506557 | Jul 2011 | US |