1. Field of the Invention
The present invention relates generally to storage area networks.
2. Description of the Related Art
Storage area networks (SANs) are becoming extremely large. Some of the drivers behind this increase in size include server virtualization and mobility. With the advent of virtual machines (VMs), the number of connected virtual host devices has increased dramatically, to the point of reaching scaling limits of the SAN. In a Fibre Channel fabric, one factor limiting the scale of the fabric is the least capable or powerful switch in the fabric. This is because of the distributed services that exist in a Fibre Channel network, such as the name server, zoning and routing capabilities. In a Fibre Channel network, each switch knows all of the connected node devices and computes routes between all of the node devices. Because of the information maintained in the name server for each of the node devices and the time required to compute the very large routing database, in many cases a small or less powerful switch limits the size of the fabric. It would be desirable to alleviate many of the conditions that cause this smallest or least powerful switch to be a limiting factor, so that larger fabrics can be developed.
In a Fibre Channel fabric and its included switches according to the present invention, the scale of the fabric has been decoupled from the scale capabilities of each switch. A first change is that only the directly attached node devices are included in the name server database of a particular switch. A second change is that only needed connections, such as those from hosts to disks, i.e., initiators to targets, are generally maintained in the routing database. To assist in this development of limited routes, when a switch is initially connected to the network it is configured as either a server switch, a storage switch or a core switch, as this affects the routing entries that are necessary. This configuration also determines the change notifications that must be provided by the switch. For example, a server switch only provides local device state updates to storage switches that are connected to a zoned, online storage device. A storage switch, however, provides local device state updates to all server switches as a means of keeping the server switches aware of the presence of the storage devices.
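The following Python sketch is illustrative only and is not part of the claimed embodiments; the names SwitchRole and should_notify are hypothetical. It summarizes how a switch's configured role could determine which peer switches receive a local device state update.

```python
# Hypothetical sketch of how a switch's configured role could determine which
# peer switches receive a local device state update (e.g., an SW_RSCN).
# SwitchRole and should_notify are illustrative names, not actual firmware
# interfaces.

from enum import Enum

class SwitchRole(Enum):
    SERVER = "server"    # switch attaching hosts/initiators
    STORAGE = "storage"  # switch attaching disks/targets
    CORE = "core"        # interconnect-only switch

def should_notify(local_role, peer_role, peer_has_zoned_online_storage):
    """Return True if a local device state update should be sent to the peer.

    A server switch notifies only storage switches that have a zoned, online
    storage device; a storage switch notifies all server switches so they
    remain aware of available storage."""
    if local_role is SwitchRole.SERVER:
        return peer_role is SwitchRole.STORAGE and peer_has_zoned_online_storage
    if local_role is SwitchRole.STORAGE:
        return peer_role is SwitchRole.SERVER
    return False  # core switches attach no node devices and send no updates

# Example: a server switch deciding whether to notify two storage switches.
print(should_notify(SwitchRole.SERVER, SwitchRole.STORAGE, True))   # True
print(should_notify(SwitchRole.SERVER, SwitchRole.STORAGE, False))  # False
```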
In certain cases there must be transfers between like type devices, i.e., between two devices connected to server switches or between two devices connected to storage switches. Examples include host to host communications, such as a vMotion transfer of a virtual machine between servers; disk to tape device communications in a backup; and disk to disk communications in a data migration. These cases are preferably developed based on the zoning information.
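The following illustrative sketch, using hypothetical Zone and Device structures and example world wide names, shows how zone membership can distinguish these like-type, or horizontal, cases from ordinary host to disk zones.

```python
# Hypothetical representation of zone entries, illustrating that most zones
# pair an initiator (host) with a target (storage), while a few zones pair
# like-type devices (host/host for a VM migration, disk/tape for a backup).
# The Zone and Device classes and the WWN values are illustrative only.

from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    wwn: str        # world wide name of the node device
    dev_type: str   # "host", "disk" or "tape"

@dataclass(frozen=True)
class Zone:
    name: str
    members: tuple  # tuple of Device

def is_like_type_zone(zone):
    """True when a zone joins only hosts or only storage-side devices,
    indicating a 'horizontal' route that must be added on an exception basis."""
    types = {m.dev_type for m in zone.members}
    return types <= {"host"} or types <= {"disk", "tape"}

vmotion_zone = Zone("zone_vmotion", (Device("10:00:00:01", "host"),
                                     Device("10:00:00:02", "host")))
backup_zone = Zone("zone_backup", (Device("20:00:00:01", "disk"),
                                   Device("20:00:00:09", "tape")))
normal_zone = Zone("zone_a", (Device("10:00:00:01", "host"),
                              Device("20:00:00:01", "disk")))

print([is_like_type_zone(z) for z in (vmotion_zone, backup_zone, normal_zone)])
# [True, True, False]
```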
By reducing the number of name server entries and the number of routing entries, the capabilities of each particular switch are dissociated from the scale of the fabric and the number of attached nodes. The scalability limits are now set per server switch or per storage switch rather than for the fabric as a whole. This in turn allows greater scalability of the fabric as a whole by increasing the scalability of the individual switches and allowing the fabric scale to be based on the sum of the switch limits rather than on the limits of the weakest or least capable switch.
The present invention has other advantages and features which will be more readily apparent from the following detailed description of the invention and the appended claims, when taken in conjunction with the accompanying drawings, in which:
As can be seen from these simplistic entries, each switch includes many different name server entries, one for each node device attached anywhere in the fabric, even though the vast majority of the nodes are not connected to that particular switch. Similarly, the route table contains numerous entries for paths that will never be utilized, such as, for switch 102A, the various entries between the hosts 104A-C.
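The following hypothetical comparison, using arbitrary small device counts and assuming for illustration a route entry per device pair in the prior art and only initiator to target routes in the reduced approach, indicates the difference in table sizes.

```python
# Hypothetical figures, purely for illustration of the scaling difference: in
# the prior art a switch stores a name server entry for every fabric device
# and computes routes between all node devices, whereas the described approach
# keeps only local name server entries and host-to-disk (initiator-to-target)
# routes.

def prior_art_entries(hosts, disks):
    total = hosts + disks
    name_server = total                       # one entry per fabric device
    routes = total * (total - 1) // 2         # routes between all device pairs
    return name_server, routes

def reduced_entries(local_devices, hosts, disks):
    name_server = local_devices               # only directly attached devices
    routes = hosts * disks                    # only initiator-to-target pairs
    return name_server, routes

print(prior_art_entries(hosts=8, disks=2))                 # (10, 45)
print(reduced_entries(local_devices=3, hosts=8, disks=2))  # (3, 16)
```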
In normal operation in a conventional SAN, hosts or servers only communicate with disks or targets and do not communicate with other servers or hosts. Therefore the inclusion of all of those server to server entries in the route database, and the time taken to compute those entries, is unnecessary and burdensome to the processor in the switch. Similarly, maintaining all of the unneeded name server database entries is burdensome on the switch processor.
Device state updates, using SW_RSCNs (switch registered state change notifications) for example, are sent only from server switches to storage switches, such as switches 102C and 102D, that have zoned, online storage devices. If a connected node device such as host 104A queries the switch 102A for node devices not connected to the switch 102A, then switch 102A can query the other switches 102B-D in the fabric 108 as described in U.S. Pat. No. 7,474,152, entitled “Caching Remote Switch Information in a Fibre Channel Switch,” which is hereby incorporated by reference. Operation of storage switches 102C and 102D is slightly different in that each of the storage switches must have route entries to each of the other switches, i.e., the other domains, to allow for delivery of change notifications to the server switches 102A and 102B. This is the case even if there are no servers zoned into or online with any storage devices connected to the storage switch.
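The following sketch is illustrative only; the NameServer and PeerStub classes and their lookup methods are hypothetical and merely suggest how a query for a device that is not locally attached could be satisfied by querying the other switches and caching the result.

```python
# Minimal hypothetical sketch of a server switch answering a name server query
# for a device that is not locally attached: the local database holds only
# directly attached devices, so the switch queries the other switches in the
# fabric and caches the response, along the lines of the remote-caching
# approach referenced above.

class PeerStub:
    """Stand-in for a remote switch that answers only from its local database."""
    def __init__(self, local_db):
        self.local_db = local_db
    def lookup(self, port_id):
        return self.local_db.get(port_id)

class NameServer:
    def __init__(self, local_db, peer_switches):
        self.local_db = local_db      # {port_id: record}, local devices only
        self.peers = peer_switches    # remote switches offering lookup()
        self.remote_cache = {}        # cached answers from other switches

    def lookup(self, port_id):
        if port_id in self.local_db:
            return self.local_db[port_id]
        if port_id in self.remote_cache:
            return self.remote_cache[port_id]
        for peer in self.peers:       # ask the rest of the fabric
            record = peer.lookup(port_id)
            if record is not None:
                self.remote_cache[port_id] = record
                return record
        return None                   # unknown device

peer = PeerStub({"0x020100": {"wwn": "20:00:00:01", "type": "disk"}})
ns = NameServer(local_db={"0x010100": {"wwn": "10:00:00:01", "type": "host"}},
                peer_switches=[peer])
print(ns.lookup("0x010100"))   # answered locally
print(ns.lookup("0x020100"))   # fetched from the peer and cached
```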
As can be seen, the name server and routing tables according to the present invention are significantly smaller and therefore take significantly less time to develop and maintain as compared to the name server and route tables according to the prior art. By reducing the size and maintenance overhead significantly, more devices can be added to the fabric 108, and a given set of switches will scale to a much larger number of devices, as switch processor capability is one of the limiting factors because of the number of name server and route table entries that must be maintained. This allows the fabric to scale to much larger levels for a given set of switches or switch processor capabilities than would otherwise have been possible according to the prior art.
As discussed above, there are certain instances where hosts must communicate with each other and/or storage devices must communicate with each other. An illustrative example is a backup operation, where a disk device and a tape device are zoned together so that the needed routing table entries are developed between them.
Virtual machine movement using mechanisms such as vMotion can similarly result in communications between servers. Similar to the above backup operations, the two relevant servers would be zoned together and the resulting routing table entries would be developed.
In the preferred embodiment, the name server entries and route table entries develop automatically for the server and storage designated switches.
Developing the non-standard routes and instances, such as the illustrated tape device backup configurations or vMotion instances, is preferably done on an exception basis by a particular switch parsing zone database entries, as shown in step 906, to determine if there are any devices included in a zone which have this horizontal routing, i.e., routing other than storage to server. If such a zone database entry is indicated, such as zones 110F or 110G, then the relevant switches include the needed routing table entries. Alternatives to zone database parsing can be used, such as FCP probing; WWN decoding, based on vendor decoding and then device type; and device registration. After the parsing, the switch commences operation as shown in step 908.
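The following illustrative sketch, with a hypothetical zone database representation and example entries, suggests how the exception-basis parsing of step 906 could identify zones requiring horizontal routes.

```python
# Hedged sketch of the exception-basis pass: scan the zone database, find any
# zone whose members require 'horizontal' routing (host/host or storage/storage
# rather than server-to-storage), and emit the extra route pairs to install.
# The zone representation, WWN values and add-route step are illustrative only.

def horizontal_route_pairs(zone_db, device_types):
    """zone_db: {zone_name: [wwn, ...]}; device_types: {wwn: 'host'|'disk'|'tape'}."""
    pairs = []
    storage_side = {"disk", "tape"}
    for zone_name, members in zone_db.items():
        types = {device_types[m] for m in members}
        if types <= {"host"} or types <= storage_side:
            # like-type zone: add a route entry for every pair of members
            for i, a in enumerate(members):
                for b in members[i + 1:]:
                    pairs.append((zone_name, a, b))
    return pairs

zone_db = {
    "zone_vmotion": ["10:00:00:01", "10:00:00:02"],   # host-to-host
    "zone_backup": ["20:00:00:01", "20:00:00:09"],    # disk-to-tape
    "zone_a": ["10:00:00:01", "20:00:00:01"],         # normal host-to-disk
}
device_types = {"10:00:00:01": "host", "10:00:00:02": "host",
                "20:00:00:01": "disk", "20:00:00:09": "tape"}
print(horizontal_route_pairs(zone_db, device_types))
# [('zone_vmotion', '10:00:00:01', '10:00:00:02'),
#  ('zone_backup', '20:00:00:01', '20:00:00:09')]
```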
Table 1 illustrates various parameters and quantities according to the prior art and according to the present invention to provide a quantitative illustration of the increase in possible network size according to the present invention.
The comparison is done using scalability limits for both approaches for current typical switches. A server switch sees local devices and devices on all storage switches while a storage switch sees local devices and only servers zoned with local devices. For the comparison there are four server switches and four storage switches, with the server switches all directly connected to each of the storage switches. Another underlying assumption is that each switch has a maximum of 6000 name server entries.
Reviewing Table 1, it is assumed that there are a maximum of 4000 server devices per switch. This number can be readily obtained using virtual machines on each physical server or using pass through or Access Gateway switches. Another assumption is that there are eight server devices per storage device. This is based on typical historical information. Yet another assumption is that there is a maximum of 512 storage devices per switch. With these assumptions, this results in 5333 server devices per fabric according to the prior art. This number is developed from the 6000 device limit for the name server in combination with the eight to one server to storage ratio. This then results in 667 storage devices per fabric according to the prior art. As can be seen, these numbers, 5333 and 667, are not significantly greater than the maximum number per individual switch, which indicates the scalability concerns of the prior art. According to the preferred embodiment there can be 16,000 server devices per fabric, assuming the four server switches. This is because there can be 4000 server devices per switch and four switches. The number of storage devices per fabric will be 2000, again based on the four storage switches. The number of devices seen by a server switch or storage switch in the prior art was 6000. Again, this is the maximum number of devices in the fabric based on the name server database sizes. In the preferred embodiment each server switch still sees 6000 devices, but that is the 4000 devices for the particular server switch and the 2000 storage devices per fabric, as it is assumed that each server switch will see each storage device.
As the servers will be different for each server switch, the 4000 servers per switch will be additive, resulting in the 16,000 servers in the fabric. As the name server can handle 6000 entries, this leaves space for 2000 storage units, 500 for each storage switch. The number of devices actually seen by a storage switch is smaller, as it only sees the local storage devices, such as the 512, and the server devices which are zoned into the local storage devices. For purposes of illustration it is assumed to be 4500 devices seen per storage switch in the preferred embodiments. While in the prior art there was a maximum of 6000 devices in the entire fabric, according to the preferred embodiment that maximum is 18,000 devices, which is developed from the 16,000 devices for the four server switches and the 2000 devices for the four storage switches.
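The following sketch simply reproduces the arithmetic described above using the stated assumptions (four server switches, four storage switches, a 6000 entry name server limit per switch, 4000 server devices per server switch and an eight to one server to storage ratio); the variable names are illustrative.

```python
# Worked version of the Table 1 arithmetic, for illustration only, using the
# figures stated in the comparison above.

NAME_SERVER_LIMIT = 6000
SERVER_SWITCHES = 4
STORAGE_SWITCHES = 4
SERVERS_PER_SWITCH = 4000
SERVER_TO_STORAGE_RATIO = 8

# Prior art: every switch sees every device, so the whole fabric is capped by
# one switch's name server limit, split 8:1 between servers and storage.
prior_servers = NAME_SERVER_LIMIT * SERVER_TO_STORAGE_RATIO // (SERVER_TO_STORAGE_RATIO + 1)
prior_storage = NAME_SERVER_LIMIT // (SERVER_TO_STORAGE_RATIO + 1)
print(prior_servers, prior_storage)   # 5333 666; the table rounds these to 5333 and 667

# Preferred embodiment: each server switch sees only its own 4000 servers plus
# the fabric's storage devices, so server counts add across switches.
fabric_servers = SERVERS_PER_SWITCH * SERVER_SWITCHES              # 16000
fabric_storage = NAME_SERVER_LIMIT - SERVERS_PER_SWITCH            # 2000 storage devices
storage_per_storage_switch = fabric_storage // STORAGE_SWITCHES    # 500
fabric_total = fabric_servers + fabric_storage                     # 18000
print(fabric_servers, fabric_storage, storage_per_storage_switch, fabric_total)
```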
In the prior art 32,000 zones would be programmed into a server switch and 4000 into a storage switch, based on the assumption of one zone for each storage device. In the preferred embodiments there would be 4000 zones on each switch. According to the prior art there are 27,000 unused routes programmed into either a server or storage switch, while in the preferred embodiment there are no unused routes. As can be seen from a review of Table 1, significantly more server and storage devices can be present in a particular fabric when the improvements of the preferred embodiments according to the present invention are employed.
The switch ASIC 1095 has four basic modules, port groups 1035, a frame data storage system 1030, a control subsystem 1025 and a system interface 1040. The port groups 1035 perform the lowest level of packet transmission and reception. Generally, frames are received from a media interface 1080 and provided to the frame data storage system 1030. Further, frames are received from the frame data storage system 1030 and provided to the media interface 1080 for transmission out of port 1082. The frame data storage system 1030 includes a set of transmit/receive FIFOs 1032, which interface with the port groups 1035, and a frame memory 1034, which stores the received frames and frames to be transmitted. The frame data storage system 1030 provides initial portions of each frame, typically the frame header and a payload header for FCP frames, to the control subsystem 1025. The control subsystem 1025 has the translate 1026, router 1027, filter 1028 and queuing 1029 blocks. The translate block 1026 examines the frame header and performs any necessary address translations, such as those that happen when a frame is redirected as described herein. There can be various embodiments of the translation block 1026, with examples of translation operation provided in U.S. Pat. No. 7,752,361 and U.S. Pat. No. 7,120,728, both of which are incorporated herein by reference in their entirety. Those examples also provide examples of the control/data path splitting of operations. The router block 1027 examines the frame header and selects the desired output port for the frame. The filter block 1028 examines the frame header, and the payload header in some cases, to determine if the frame should be transmitted. In the preferred embodiment of the present invention, hard zoning is accomplished using the filter block 1028. The queuing block 1029 schedules the frames for transmission based on various factors including quality of service, priority and the like.
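The following sketch is a simplified, hypothetical model of the control subsystem pipeline rather than the ASIC implementation; the ControlSubsystem class and the stand-in translate, router, filter and queuing steps are illustrative only.

```python
# Hedged model of the control subsystem stages described above: the frame
# header is translated, a route (output port) is selected, the filter applies
# hard zoning, and the queuing stage schedules the frame for transmission.
# All names and stand-in steps are hypothetical.

class ControlSubsystem:
    def __init__(self, translate, router, zone_filter, queuing):
        self.translate = translate        # header/address translation step
        self.router = router              # output port selection step
        self.zone_filter = zone_filter    # hard-zoning permit/deny step
        self.queuing = queuing            # scheduling step

    def process(self, frame_header):
        header = self.translate(frame_header)
        out_port = self.router(header)
        if not self.zone_filter(header):
            return None                   # frame dropped by hard zoning
        return self.queuing(header, out_port)

# Example wiring with trivial stand-in steps.
ctrl = ControlSubsystem(
    translate=lambda h: h,
    router=lambda h: h["d_id"] % 4,                      # pick one of 4 ports
    zone_filter=lambda h: (h["s_id"], h["d_id"]) in {(1, 2)},
    queuing=lambda h, port: (port, h),
)
print(ctrl.process({"s_id": 1, "d_id": 2}))   # scheduled: (2, {'s_id': 1, 'd_id': 2})
print(ctrl.process({"s_id": 3, "d_id": 2}))   # None, blocked by hard zoning
```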
Therefore, by designating the switches as server, storage or core switches; eliminating routes that are not between servers and storage, except where added on an exception basis; and maintaining only locally connected devices in the name server database, the processing demands on a particular switch are significantly reduced. As the processing demands are significantly reduced, the fabric can grow larger for any given set of switches or switch performance capabilities.
The above description is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this disclosure. The scope of the invention should therefore be determined not with reference to the above description, but instead with reference to the appended claims along with their full scope of equivalents.