The present application claims priority from Japanese Patent Application No. 2013-230052 filed on Nov. 6, 2013, the content of which is hereby incorporated by reference into this application.
The present invention relates to a network system and a network relay device, for example, a network system and a network relay device effectively applied to transmission and reception of a frame between virtual machines (VM) in the virtual extensible local area network (VXLAN).
As a network technology, VXLAN adaptable to the increase of tenants and applications has been known.
Also, as described in Japanese Patent Application Laid-Open Publication No. 2012-249050 (Patent Document 1), a network system with a two-stage configuration made up of a lower layer and an upper layer has been known.
In the network system made up of a lower layer and an upper layer, the lower layer is composed of a plurality of lower switches to which terminal devices are connected. Also, the upper layer is composed of a plurality of upper switches connected to each of the plurality of lower switches. In this case, transmission and reception of a frame between terminal devices connected to different lower switches are performed through the upper switch.
Patent Document 1 describes that a link aggregation group is set to a plurality of ports connected to different upper switches in a lower switch. In this case, when the lower switch transmits a received frame to an upper switch, the lower switch transmits the frame so as to be distributed to a plurality of ports belonging to the link aggregation group. In this manner, the transmission and reception of the frame between the terminal devices connected to different lower switches are performed in a distributed manner among the plurality of upper switches, and the wider bandwidth can be achieved.
Meanwhile, in VXLAN, a broadcast frame or a multicast frame from a VM is transmitted by using IP multicast. IGMP (Internet Group Management Protocol) is used to manage joining and leaving the multicast group in each VXLAN segment. There are three message types of IGMP frames: a membership query, a membership report and a leave group. MLD (Multicast Listener Discovery) is another protocol for managing the multicast group. In the present specification, a frame of a protocol which manages the multicast group, such as an IGMP frame or an MLD frame, is referred to as a multicast management frame.
When VXLAN is applied to the network system disclosed in Patent Document 1, the multicast management frame used in VXLAN is transmitted so as to be distributed among the plurality of ports belonging to a link aggregation group. In this case, there exists an upper switch to which the multicast management frame is not transferred, and that upper switch cannot register the source MAC (Media Access Control) address contained in the frame. As a result, frames later addressed to that MAC address are flooded, which causes a shortage of bandwidth in the network system.
Patent Document 1 does not take account of the transmission of the multicast management frame.
An object of the present invention is to provide a network system and a network relay device capable of suppressing the occurrence of the flooding.
The above and other objects and novel characteristics of the present invention will be apparent from the description of the present specification and the accompanying drawings.
The following is a brief description of an outline of the typical embodiment of the invention disclosed in the present application.
A network system according to this embodiment includes: a plurality of fabric switches; and a plurality of port switches connected to the plurality of fabric switches through a plurality of links. The port switch sets a link aggregation group to the plurality of links connected to the plurality of fabric switches, and when the port switch receives a multicast management frame, it transfers the frame to each of the plurality of fabric switches. The fabric switch has a table in which information of a port which has received a frame and a source MAC address of a received frame are registered, and when the fabric switch receives the multicast management frame, it registers information of a port which has received the frame and a source MAC address of the frame in the table.
Also, a network relay device of this embodiment is connected to a plurality of other network relay devices through a plurality of links, sets a link aggregation group to the plurality of links connected to the plurality of other network relay devices, and when receiving a multicast management frame, transfers the frame to the plurality of other network relay devices.
According to one embodiment, it is possible to provide a network system and a network relay device capable of suppressing the occurrence of the flooding.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. Note that components having the same function are denoted by the same reference characters throughout the drawings for describing the embodiments, and the repetitive description thereof will be omitted.
In
The terminal unit 14 has an IP (Internet Protocol) address and a MAC address by which the terminal unit 14 itself is specified, and virtual MAC addresses for specifying the virtual machines VM[1,1] to VM[2,n] are given to the respective virtual machines VM[1,1] to VM[2,n].
In the following description, the case where a frame is transmitted from the virtual machine VM[1,1] in the information processing device 13a to the virtual machine VM[2,1] in the information processing device 13m is taken as an example. A frame (MAC frame) whose destination is the virtual MAC address of VM[2,1] and whose source is the virtual MAC address of VM[1,1] is transmitted from the virtual machine VM[1,1] to the terminal unit 14. The terminal unit 14 in the information processing device 13a encapsulates the frame by adding a header compliant with the VXLAN standard. Concretely, the added header has the IP address and the MAC address of the terminal unit 14 in the information processing device 13a as the source information, and the IP address and the MAC address of the terminal unit 14 in the information processing device 13m as the destination information. The terminal unit 14 then transmits the resulting encapsulated frame to the box-type fabric system 12a.
The box-type fabric system 12a transmits the encapsulated frame to the information processing device 13m based on the MAC address of the terminal unit 14 in the information processing device 13m to be a destination contained in the received encapsulated frame.
The terminal unit 14 of the information processing device 13m deletes the header compliant with the VXLAN standard from the received encapsulated frame. Then, the terminal unit 14 of the information processing device 13m transmits the frame to the virtual machine VM[2,1] based on the virtual MAC address of the virtual machine VM[2,1] contained as the destination MAC address in the frame.
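The encapsulation and decapsulation performed by the terminal units 14 can be illustrated with a minimal Python sketch. The class and field names are hypothetical and chosen only for illustration. The sketch models just the outer source and destination addresses mentioned above and omits the remaining fields of an actual VXLAN header (for example, the UDP header and the VXLAN network identifier).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InnerFrame:
    """MAC frame exchanged between virtual machines (hypothetical model)."""
    dst_mac: str   # virtual MAC address of the destination VM
    src_mac: str   # virtual MAC address of the source VM
    payload: bytes


@dataclass(frozen=True)
class EncapsulatedFrame:
    """Inner frame plus the outer addresses added by a terminal unit."""
    outer_dst_mac: str
    outer_src_mac: str
    outer_dst_ip: str
    outer_src_ip: str
    inner: InnerFrame


@dataclass(frozen=True)
class TerminalUnit:
    """Terminal unit 14, identified by its own MAC address and IP address."""
    mac: str
    ip: str

    def encapsulate(self, frame: InnerFrame, remote: "TerminalUnit") -> EncapsulatedFrame:
        # Source information: this terminal unit. Destination: the remote one.
        return EncapsulatedFrame(remote.mac, self.mac, remote.ip, self.ip, frame)

    def decapsulate(self, encapsulated: EncapsulatedFrame) -> InnerFrame:
        # Removing the outer header leaves the original VM-to-VM frame.
        return encapsulated.inner


unit_13a = TerminalUnit(mac="00:00:5e:00:53:0a", ip="192.0.2.1")
unit_13m = TerminalUnit(mac="00:00:5e:00:53:0d", ip="192.0.2.13")
vm_frame = InnerFrame(dst_mac="vm-2-1", src_mac="vm-1-1", payload=b"data")
on_wire = unit_13a.encapsulate(vm_frame, unit_13m)
delivered = unit_13m.decapsulate(on_wire)
```

In this sketch, the box-type fabric system would see only the outer addresses of `on_wire`, which is why it learns the MAC addresses of the terminal units rather than the virtual MAC addresses.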
Thereafter, for example, when a frame is transmitted from the virtual machine VM[2,1] to the virtual machine VM[1,1], the reverse of the process described above is performed. In this case, a frame has previously been transmitted from the virtual machine VM[1,1] to the virtual machine VM[2,1] through the box-type fabric system 12a, so the box-type fabric system 12a has already learned the MAC address of the terminal unit 14 in the information processing device 13a. Therefore, the occurrence of flooding can be suppressed, and the information processing device 13a can be specified from the received encapsulated frame.
The frame transmission between specific information processing devices 13a and 13m has been described. In other cases, for example, a frame is transmitted from the virtual machine VM[1,1] of the information processing device 13a to virtual machines in a plurality of information processing devices 13b to 13m belonging to the same VXLAN segment. In this case, the virtual machine VM[1,1] in the information processing device 13a transmits the frame by broadcast or multicast. The broadcast or multicast frame transmitted by the virtual machine VM[1,1] is transferred by IP multicast between the terminal units 14 of the information processing devices 13a to 13m. IGMP is used as the protocol for joining and leaving the multicast group of each VXLAN segment.
As a system which relays the transmission and reception of a frame, a so-called chassis-type switching device has been known. In the embodiment described in this specification, a system which relays a frame by combining a plurality of box-type switching devices is used instead of the chassis-type switching device. The relay system configured by combining a plurality of box-type switching devices is referred to as a box-type fabric system in this specification.
The box-type fabric system 12a shown as a representative example includes a plurality of box-type switching devices (hereinafter, referred to as port switch) 21a, 21b, . . . , and 21m and a plurality of box-type switching devices (hereinafter, referred to as fabric switch) 20a, 20b, . . . , and 20m functioning to relay a frame between the port switches 21a, 21b, . . . , and 21m.
The port switch 21a has a plurality of ports, and some of the ports are set as user ports Pu[1], . . . , and Pu[k] which are connected to the information processing devices 13a, . . . , and 13d serving as users and perform transmission and reception of a frame to and from the corresponding information processing devices. Also, other ports of the plurality of ports are set as link ports Pf[1], Pf[2], . . . , and Pf[m] connected to the plurality of fabric switches 20a, 20b, . . . , and 20m through the plurality of links 23a, 23b and 23c. In this embodiment, the link aggregation group (hereinafter, referred to as LAG) 22a is set to the plurality of links 23a, 23b and 23c mentioned above.
When the port switch 21a relays a frame received at specific user ports Pu[1], . . . and Pu[k] to another port switch, the port switch 21a relays the received frame so as to be properly distributed within the plurality of links 23a, 23b and 23c set as the LAG. Though described later with reference to
Each of the port switches 21b to 21m has the same configuration as that of the port switch 21a. More specifically, each of the port switches 21b to 21m also has a plurality of ports, some of the plurality of ports are set as a plurality of user ports Pu[1] to Pu[k], and the information processing devices 13e to 13m are connected to the respective user ports Pu[1] to Pu[k]. Also, link ports Pf[1] to Pf[m] connected to the fabric switches 20a to 20m through the plurality of links 24a to 24c and 25a to 25c are set to the respective port switches 21b to 21m. LAG 22b is set to the plurality of links 24a to 24c, and LAG 22c is set to the plurality of links 25a to 25c.
In this manner, like the port switch 21a, the frame received at the user ports of the port switches 21b to 21m is relayed so as to be properly distributed within the links 24a to 24c and the links 25a to 25c in the LAG 22b and the LAG 22c. Since the frame is relayed so as to be properly distributed within the LAG, the port switch and the fabric switch are physically connected in a one-to-multiple manner, and the frame is logically transferred in a one-to-one manner.
The fabric switch 20a has link ports Pp[1], Pp[2] . . . , and Pp[n] connected to the link ports Pf[1] of the port switches 21a, 21b, . . . , and 21m through the links 23a, 24a, . . . , and 25a. The fabric switch 20a relays the frame received at a specific link port to another specific link port. For example, it relays the frame received at the link port Pp[1] through the link 23a to the link port Pp[2].
In this manner, the frame received at the link port Pp[1] of the fabric switch through the link 23a is relayed to the link port Pf[1] of the port switch 21b. At this time, for example, the port switch 21b transfers the frame received at the link port Pf[1] to the user port Pu[1]. Consequently, the frame transmitted from the information processing device 13a is relayed to the information processing device 13e. Since the fabric switches 20b to 20m have the same configuration as that of the fabric switch 20a, the description thereof will be omitted.
At the time of relaying the frame, the source of the frame is specified by the source IP address and MAC address contained in the frame. Also, the destination of the frame is specified by the destination IP address and MAC address contained in the frame. In this embodiment, the IP address and the MAC address are the IP address and the MAC address of the terminal unit 14 provided in each of the information processing devices 13a to 13m as described above.
In
More specifically, as shown in
Similarly, the link port Pf[2] (see
The description returns to
Next, one example of the distribution rule in the LAG will be described. In this embodiment, a distribution identifier is obtained by calculation based on the source information and the destination information of the frame. The source IP address contained in the transmitted frame is used as the source information, and the destination IP address contained in the frame is used as the destination information. The bitwise exclusive OR of the source IP address and the destination IP address is calculated, the resulting value is divided by a predetermined value (for example, 32), and the remainder of the division serves as the distribution identifier.
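The calculation described above can be sketched in Python as follows. The function name and the example addresses are assumptions made for illustration. The standard `ipaddress` module is used only to convert a dotted address into an integer.

```python
import ipaddress


def distribution_identifier(src_ip: str, dst_ip: str, divisor: int = 32) -> int:
    """Return (source IP XOR destination IP) modulo the predetermined value."""
    src = int(ipaddress.ip_address(src_ip))
    dst = int(ipaddress.ip_address(dst_ip))
    return (src ^ dst) % divisor


# 10.0.0.1 XOR 10.0.0.34 = 35, and 35 mod 32 = 3.
example_id = distribution_identifier("10.0.0.1", "10.0.0.34")
```

Because the exclusive OR is symmetric, a frame and its reply between the same pair of addresses yield the same distribution identifier in this sketch.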
Hence, the remainder value falls within the range from 0 to 31 depending on the source IP address and the destination IP address. The remainder values 0 to 31 are assigned to the link port numbers. In the example shown in
When the port switch 21a receives a frame at the user ports Pu[1] to Pu[k] and transmits it to the fabric switch, it performs the above-described operation by using the source IP address and the destination IP address contained in the frame, thereby obtaining the remainder value corresponding to the distribution identifier. Then, it obtains the link port number corresponding to the obtained remainder value from the table shown in
In
At the time of relaying the frame as described above, the frame processing unit 43 performs the process with reference to the information stored in the table unit 40. In the LAG table 41 provided in the table unit 40, the table shown in
When a frame is received at the user port Pu[1] or Pu[k] and transferred to the fabric switch, the hash calculating unit 45 included in the frame processing unit 43 performs the operation described with reference to
In this embodiment, the IGMP information table 42 is provided in the table unit 40. The message types contained in IGMP frames are stored in the IGMP information table 42. As described above, the stored message types are the membership report, the membership query and the leave group.
First, in the box-type fabric system 12a, the frame relay process is started in the step S50. A frame transmitted from any one of the information processing devices 13a to 13e is received at any one of the user ports Pu[1] to Pu[k].
The port switch 21a recognizes which of the user ports Pu[1] to Pu[k] has received the frame in the step S51. When the frame is received at any one of the user ports Pu[1] to Pu[k], it is determined whether or not the received frame is an IGMP frame in the step S52. The message types contained in IGMP frames are stored in advance in the IGMP information table 42 provided in the port switch 21a. In the step S52, the frame processing unit 43 compares the message types stored in the IGMP information table 42 with the message type stored in the region 49 of the received frame.
When the received frame does not contain the message type stored in the IGMP information table 42, it is determined that the frame is not an IGMP frame, and the step S54 is next executed. Meanwhile, when the received frame contains the message type stored in the IGMP information table 42, it is determined that the frame is an IGMP frame, and the step S53 is next executed.
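The determination in the step S52 can be sketched as a table lookup. The numeric type codes below are those defined for IGMP by the IETF (IP protocol number 2; message types 0x11, 0x12, 0x16, 0x17 and 0x22) and do not appear in this specification. The additional check of the IP protocol number and all names are assumptions made for illustration.

```python
IP_PROTOCOL_IGMP = 2  # IP protocol number assigned to IGMP

# Hypothetical contents of the IGMP information table 42.
IGMP_INFORMATION_TABLE = {
    0x11: "membership query",
    0x12: "membership report (IGMPv1)",
    0x16: "membership report (IGMPv2)",
    0x17: "leave group",
    0x22: "membership report (IGMPv3)",
}


def is_igmp_frame(ip_protocol: int, message_type: int) -> bool:
    """Step S52: True when the message type of the frame is found in the table."""
    return ip_protocol == IP_PROTOCOL_IGMP and message_type in IGMP_INFORMATION_TABLE
```

A frame for which this function returns True would proceed to the step S53, and any other frame to the step S54.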
In the step S54, it is determined whether or not the destination MAC address contained in the received frame has been registered in the FDB table 50 (determination by FDB). When it matches the MAC address registered in the FDB table 50, that is, it hits and the matched MAC address corresponds to the LAG number, the step S55 is next executed as an upper hit. Meanwhile, when it hits and the matched MAC address corresponds to the port number other than the LAG number, the step S57 is next executed.
In the step S57, the received frame is transferred to the link port having the port number obtained from the FDB table 50. Thereafter, the step S59 is executed.
Meanwhile, when the upper hit is determined in the step S54, the source IP address and the destination IP address contained in the received frame are transferred to the hash calculating unit 45, and the operation described with reference to
If the destination MAC address contained in the received frame does not match any MAC address registered in the FDB table 50 in the step S54, that is, it mishits, the flooding is performed (step S58). In the step S58, the received frame is transferred to all of the ports other than the port which has received the frame. At this time, for the ports to which the LAG number is set, the step S55 and the step S56 are executed so that the port corresponding to the distribution identifier is selected, and the received frame is transferred only to the selected port.
In this manner, the received frame hits when the destination MAC address contained therein has been registered in the FDB table 50, and it is transferred to the link port having the port number registered in the FDB table 50 or the link port in the LAG number. On the other hand, the received frame mishits when the destination MAC address contained therein is not registered in the FDB table 50, and the flooding is performed (step S58).
Meanwhile, when the received frame is the IGMP frame, the frame processing unit 43 transfers the received frame to all of the link ports Pf[1], Pf[2], . . . , and Pf[m] in the step S53. More specifically, in this case, the received frame is transferred to all of the fabric switches connected to the port switch 21a. At this time, for example, the LAG distribution control unit 44 stops its operation. In the step S53, after the received frame is transferred to all of the fabric switches, the frame relay process ends in the step S59.
In the transfer of the IGMP frame to all of the fabric switches, when one link corresponds to a plurality of physical link ports as shown in
As described above, when the received frame is an IGMP frame, the received IGMP frame is transferred from the port switch which has received the frame to all of the fabric switches connected thereto. Consequently, the source MAC address and the destination MAC address contained in the received frame are transferred to all of the fabric switches.
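The relay decision of the port switch in the steps S52 to S58 can be summarized by the following Python sketch. The class, the string "LAG" standing for the LAG number in the FDB table, and the rule that maps a distribution identifier to a link port (identifier modulo the number of link ports) are hypothetical. The actual assignment is given by the LAG table 41.

```python
from dataclasses import dataclass


@dataclass
class PortSwitch:
    user_ports: list  # e.g. ["Pu[1]", "Pu[2]"]
    link_ports: list  # members of the LAG, e.g. ["Pf[1]", "Pf[2]", "Pf[3]"]
    fdb: dict         # destination MAC address -> port name, or "LAG"

    def egress_ports(self, ingress, dst_mac, is_igmp, distribution_id):
        """Return the ports to which a frame received at a user port is sent."""
        if is_igmp:
            # Step S53: copy the frame to every fabric switch, bypassing the LAG rule.
            return list(self.link_ports)
        # Steps S55 and S56: pick one LAG member (hypothetical assignment rule).
        lag_member = self.link_ports[distribution_id % len(self.link_ports)]
        entry = self.fdb.get(dst_mac)  # Step S54: determination by FDB
        if entry == "LAG":
            return [lag_member]
        if entry is not None:
            return [entry]  # Step S57: the port registered in the FDB table
        # Step S58: flooding to the other user ports and one selected LAG member.
        return [p for p in self.user_ports if p != ingress] + [lag_member]


switch_21a = PortSwitch(
    user_ports=["Pu[1]", "Pu[2]"],
    link_ports=["Pf[1]", "Pf[2]", "Pf[3]"],
    fdb={"mac-remote": "LAG", "mac-local": "Pu[2]"},
)
```

Only the IGMP branch sends more than one copy into the LAG. Every other branch keeps the one-to-one logical transfer described earlier.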
Next, the configuration of the fabric switch will be described with reference to
In
The frame processing unit 61 is connected to the link ports Pp[1], Pp[2], . . . , and Pp[n] and the FDB table 60. When a frame is received at any one of the link ports Pp[1], Pp[2], . . . , and Pp[n], the frame processing unit 61 relays the frame to the link ports other than the link port which has received the frame, under the control of the control unit 62. At this time, the frame processing unit 61 transfers the source MAC address contained in the received frame and the number of the link port which has received the frame to the FDB table 60, where they are registered as a pair.
For example, when the source MAC address of the received frame is Abc and the link port which has received the frame is the link port Pp[1], as shown in
The pair of the MAC address and the port number stored in the FDB table 60 is used, for example, when relaying a new frame. More specifically, when a new frame is received, if a MAC address matching the destination MAC address contained in the frame is registered in the FDB table 60, the number of the link port to which the new frame is to be transferred can be obtained by searching the FDB table 60. In this manner, the frame can be efficiently relayed. In other words, a MAC address and the corresponding port number are learned by receiving a frame. When a frame is received and no MAC address matching the destination MAC address contained in the frame is registered in the FDB table 60, flooding is likely to be performed.
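The learning and lookup behavior of the FDB table 60 can be sketched as follows. The class and method names are hypothetical, and the FDB table is modeled as a plain dictionary from a MAC address to a link port.

```python
class FabricSwitch:
    """Minimal model of a fabric switch with an FDB table (hypothetical names)."""

    def __init__(self, link_ports):
        self.link_ports = list(link_ports)
        self.fdb = {}  # source MAC address -> link port that received it

    def receive(self, ingress_port, src_mac, dst_mac):
        """Learn the source MAC address, then return the egress link ports."""
        self.fdb[src_mac] = ingress_port  # register the pair in the FDB table
        egress = self.fdb.get(dst_mac)
        if egress is not None:
            return [egress]  # destination already learned: relay to one port
        # Destination unknown: flood to every link port except the ingress port.
        return [p for p in self.link_ports if p != ingress_port]


switch_20a = FabricSwitch(["Pp[1]", "Pp[2]", "Pp[3]"])
first = switch_20a.receive("Pp[1]", src_mac="Abc", dst_mac="Xyz")   # flooded
reply = switch_20a.receive("Pp[2]", src_mac="Xyz", dst_mac="Abc")   # single port
```

In this sketch the first frame is flooded because "Xyz" has not been learned, while the reply is relayed only to the link port Pp[1] because "Abc" was registered when the first frame arrived.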
As described above, in this embodiment, when a port switch receives an IGMP frame, the port switch relays the IGMP frame to all of the fabric switches connected to the port switch, and the source MAC address of the IGMP frame and the number of the port which has received the frame are registered in the FDB table 60 in all of the fabric switches. Consequently, the occurrence of flooding caused by the MAC address not being registered in the FDB table 60 can be suppressed, and an efficient network relay device and network system can be provided.
Next, the operation relating to this embodiment will be described with reference to
In
In
First, when the port switch 21a (first port switch) receives an IGMP frame at the user port Pu[1], it transfers the frame to all of the fabric switches as described above. In
The fabric switch 20a transfers the IGMP frame from the link port Pp[2] (fourth port) to the link port Pf[1] (third port) of the port switch 21b (second port switch). At this time, the fabric switch 20b also transfers the received frame from the link port Pp[2] (fourth port) to the link port Pf[2] (third port) of the port switch 21b. The port switch 21b (second port switch) receives the frame at each of the link ports Pf[1] and Pf[2] (third port). In this case, the port switch 21b processes the IGMP frame received at the link port having a smaller number (specific port) as a valid frame, and transfers it to the information processing device 13e. On the other hand, for example, the IGMP frame received at the other link port is discarded (× mark).
Concretely, this can be achieved by determining whether or not the frame received at the link ports Pf[1] and Pf[2] is the IGMP frame by using the message type of IGMP stored in the IGMP information table 42 provided in the port switch 21b, and then validating the frame received at a specific link port in accordance with the result of the determination. More specifically, when the received frame is the IGMP frame, the link port Pf[1] having a smaller number is set as a specific link port, and the frame received at that link port is validated and the frame received at the other link port Pf[2] is discarded.
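The handling of duplicate IGMP frames at the second port switch can be sketched as follows. The function name and the integer representation of the link port numbers are assumptions made for illustration.

```python
def handle_frame_on_link_port(ingress_port, is_igmp, lag_link_ports):
    """Return "forward" or "discard" for a frame received at a link port.

    ingress_port   -- number of the link port that received the frame
    is_igmp        -- result of the IGMP determination for the frame
    lag_link_ports -- numbers of the link ports belonging to the LAG
    """
    if not is_igmp:
        return "forward"  # ordinary frames arrive as a single copy
    specific_port = min(lag_link_ports)  # the link port having the smallest number
    return "forward" if ingress_port == specific_port else "discard"


# Copies of one IGMP frame arrive at Pf[1] and Pf[2] of the port switch 21b.
results = [handle_frame_on_link_port(p, True, [1, 2]) for p in (1, 2)]
```

With this rule, exactly one copy of each IGMP frame reaches the user port, regardless of how many fabric switches relayed it.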
As described above, in the case where the MAC address of the information processing device 13a is registered in each of the fabric switches 20a and 20b, when the frame is transmitted from the information processing device 13m to the information processing device 13a, the fabric switch 20b can relay the frame as shown by the one-dot dashed lines by using the registered MAC address of the information processing device 13a. More specifically, the fabric switch 20b can detect the matching between the destination MAC address of the newly relayed frame and the MAC address of the information processing device 13a registered in the FDB table 60 and transmit the new frame from the link port Pp[1] corresponding to the MAC address of the information processing device 13a.
In
As a network protocol, there exists UDP (User Datagram Protocol). In the frame transfer using UDP, the frame transfer in one direction is performed in some cases. For example, in
In
Also, the method of calculating the distribution identifier is not limited to that described in this embodiment. For example, it is also possible to obtain the distribution identifier by the operation using either the information of source or the information of destination.
In this embodiment, the case where the IGMP frame is used as the multicast management frame has been described, but the multicast management frame is not limited to the IGMP frame. As the multicast management frame, for example, there is a frame of a protocol such as MLD. In this case, the MLD frame is determined as a multicast management frame in a port switch, and the MLD frame is transferred to all of the fabric switches connected to the port switch.
The fabric switches 20a to 20m and the port switches 21a to 21m shown in
In the foregoing, the invention made by the inventor of the present invention has been concretely described based on the embodiment. However, the present invention is not limited to the foregoing embodiment and various modifications and alterations can be made within the scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
2013-230052 | Nov 2013 | JP | national |