This invention relates generally to computer networks such as storage area networks, and more particularly to the hardware, firmware and/or software of one or more switches particularly as created by modular components.
A computer storage area network (SAN) may be implemented as a high-speed, special purpose network that interconnects one or more of a variety of different data storage devices with associated data servers on behalf of an often large network of users. Typically, a storage area network is part of or otherwise connected to an overall network of computing resources for an enterprise. The storage area network may be clustered in close geographical proximity to other computing resources, such as mainframe computers, or it may alternatively or additionally extend to remote locations for various storage purposes, whether for routine storage or for situational backup or archival storage, using wide area network carrier technologies.
SANs or like networks can be complex systems with many interconnected computers, switches and storage devices. Often many switches are used in a SAN or a like network for connecting the various computing resources; such switches may also be configured in an interwoven fashion, also known as a fabric.
Various limitations in switch hardware and switch architecture have been encountered. These include size and scalability limits, such as interconnectability limits imposed by conventional chassis size. In more detail, a chassis size issue can be attributed to certain hardware limits, some conventional devices currently providing for only a maximum number of switch devices to be connected therein. These limits may be based upon physical hardware issues within a constrained chassis arrangement, for example, issues related to the provision of appropriate minimum power and/or cooling to the switches disposed or to be disposed within a particular chassis.
In one configuration, switches are assembled in a chassis using a selection of blade components. Individual blade components are fitted into slots in the chassis and connected to a chassis backplane for interconnectivity. For example, line card blades, switch blades, and other blade components can be inserted into a chassis to provide a scalable and customizable storage network switch configuration. Typically, the blades are controlled by shared control processors (e.g., one active and one backup), powered by one or more shared power supplies through the backplane, and cooled by a shared set of cooling fan trays.
However, adding blades in a chassis presents significant limitations. A chassis has a limited number of slots, and a SAN administrator may not have an open slot in which to add a further switch blade. Even with an available slot, an additional switch blade adds additional risk to the core switch system, reducing the overall mean-time-between-failures (MTBF). More devices mean more failure potential or lessened reliability. Further, some switch blades may tend to run hotter than other switch blades and therefore require placement in the better-cooled slots in the chassis. If such slots are already occupied by other blades, addition of an intelligent service blade can disrupt service as the other blades are moved around in the chassis. A chassis backplane also has power and signaling constraints that can restrict the scalability of a switch system.
Moreover, conventional switch systems present challenges in fault detection, isolation and recovery. Some such challenges may be direct results of scaling to larger systems whereby larger systems inherently include larger volumes of devices and greater complexities in interconnections therebetween, these larger volumes and greater complexities creating more difficulties in identifying and/or finding faults. Furthermore, conventional switch systems can typically suffer from difficulties in isolating the hardware having a fault within a system and recovering as a system to continue to provide communications and switching services despite the fault, particularly during any replacement or repair operations thereon.
Implementations described and claimed herein address these problems by providing improvements in methods, systems, hardware and/or architecture of computer or communication network systems. Briefly stated, a primary hardware improvement is in the provision of a discrete switch device, namely, a ported switch device that provides user ports and basic switching, and which is adapted to be operable as a basic switch system in an independent standalone fashion as well as being adapted to be operable in conjunction with a discrete ported or non-ported switch device. In further detail, provided here is a method, system or switch device, the switch device being one of a ported and a non-ported switch device, both of which being adapted to provide switching functions and the ported switch device providing user ports for connection to external devices, the non-ported switch device not including such external device connection ports. Moreover, either of the ported or non-ported switch devices hereof includes a housing containing an ASIC creating a switching system within the switch device; the housing further including a plurality of extender ports communicating with the ASIC and being connectable to themselves in loopback fashion or to one or more ported or non-ported switch devices, whereby the extender ports operate on a discrete protocol from standard ports. The ported switch device further includes a plurality of standard ports connectable to one or more external computer network devices and is adapted to be operable as a switch system in an independent standalone mode as well as being adapted to be operable in conjunction with a discrete non-ported switch device. Moreover, identification communication may be provided via the extender ports to enable the determination of any operable connection or absence of connection of the extender ports either to themselves in loopback fashion or to one or more discrete ported or non-ported switch devices. 
Such an identification communication may also or alternatively be involved in providing a health determination.
Alternatively, the present invention may involve a method for managing a switch system containing one or more ported switch devices and zero or more non-ported switch devices, the method including discovering one or more ported or non-ported switch devices via any connections extant therebetween; and operating the switch system; wherein the discovering operation includes or is associated with a health determination; and wherein the operating of the switch system includes one of operating one of the one or more ported switch devices in an independent standalone mode and operating the one or more ported switch devices in conjunction with the zero or more non-ported switch devices.
The technology hereof increases the flexibility of use of one or more switch devices as well as improving the management of a switch system including the creation, reconfiguration and maintenance of a switch system.
Other implementations are also described and recited herein.
In the drawings:
One or more switches may be used in a network hereof, as for example the plurality of switches 112, 114, 116, 118 and 120 shown in the SAN 104 in
Note, though only one fabric 105 is shown and described, many fabrics may be used in a SAN, as can many combinations and permutations of switches and switch connections. Commonly, such networks may be run on any of a variety of protocols such as the protocol known as Fibre Channel. These fabrics may also include a long-distance connection mechanism (not shown) such as asynchronous transfer mode (ATM) and/or Internet Protocol (IP) connections that enable sites to be separated by arbitrary distances.
Herein, the switches and/or the switching functions thereof are addressed as these reside within overall switch devices, particularly switch devices which have adaptabilities for operation in alternative or simultaneous discrete modes. Such adaptabilities may be in the form of intelligence or other capabilities within the switch device to selectively operate in either or both of two discrete modes. Moreover, each of the switch devices, e.g., each of switch devices 112-120 can be provided in a modular form for operability in the alternative modes, the modular form providing for standalone independent operation, as well as a stackable or rackable module or device configuration for interconnected operability as described further below.
In some implementations, a management client 222 may be connected to the director switch device 220 via an Ethernet connection. Other connection mechanisms and/or systems such as a typical serial connection or in-band management connection may alternatively be used if such a management client is connected to a switch device. The management client 222 may then provide user control and monitoring of various aspects of the switch device and other attached devices, including without limitation, zoning, security, firmware, routing, addressing, etc. The management client 222 can send or receive a management request to or from any or all switches, and the director switch device 220 will perform whatever portion of the requested management function it is capable of performing (if any) and forward instructions to the attached switch device possessing the referenced port for additional activity, if necessary.
An intelligent switch device according hereto provides user ports and basic switching. Such a switch device will also be referred to as a ported switch device herein. As introduced above, in one implementation, a single ported switch device may operate as a stand-alone switch. In an alternative implementation, multiple ported switch devices may be interconnected via extender ports to provide a switch system with a larger number of user ports. Interconnection by extender ports avoids consumption of the device's user ports and therefore enhances the scalability of the switch system. As described further below, another device particularly useful with a ported switch device hereof is a switch device without standard or conventional ports and is thus referred to as an unported or a non-ported switch device herein. Such a non-ported switch device provides non-blocking interconnection with ported switch devices and other types of devices or modules via extender ports which are typically non-standard or non-conventional ports. Use of such non-standard extender ports may provide non-standard high performance relative to what may be provided by a standard port protocol (e.g., Fibre Channel) which would have a blocking interconnection. Such non-standard ports may be used in a variety of connection schemes; whether in loopback connections of a device to itself, whether between ported switch devices (also referred to as a stackable configuration) or between ported and unported switch devices (also referred to as a rackable configuration). Though not typical, connections may in some alternatives be made between and amongst ported switch devices as well as between and/or amongst unported switch devices.
A view with switch devices 312-320 like the switch devices 212-218 of
In more detail,
In an implementation hereof, the ported switch devices 312-320 can connect to each other as well as to the un-ported switch devices 322 via cabling to extender ports (which are discrete and different from the standard user ports 311 shown in
An exemplary front and back connection scheme is shown in
A further optional switch service device 325 is shown also in
In any or all of the examples of
The making of the ported switch device operational in either a standalone mode or in the interconnected mode involves an adaptation of a ported switch device such that it will perform auto- or self-discovery. Typically, self-discovery involves the ability of a switch device to determine what devices, if any, it may be connected to so it will then know how to operate. In particular, discovery messages may be sent and/or received and negotiations can take place via the connections, particularly via the soft backplane connections (see cables 421B in
Reaching these determinations and/or these altered operational states may be implemented through use of one or more components within the ported switch device.
Each ASIC provides, among other functions, a switched or switchable datapath between a subset of the user ports 511 and the extender ports 513. For a stand-alone ported switch device, its extender ports can be cabled together with loopback cables (in an implementation hereof, each of the extender ports may be connected with a respective loopback cable to another extender port). For a stacked configuration, the extender ports of the ported devices are cabled together. For a racked configuration, the extender ports of the ported devices and the non-ported switch devices are cabled together. In one implementation, the extender ports are cabled using four parallel bi-directional optical fiber or high-speed copper links, although other configurations are contemplated.
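By way of illustration only, the three cabling arrangements just described (loopback, stacked and racked) might be distinguished in firmware or software along the following lines. This is a minimal sketch, not an implementation taken from this description: the names `DeviceType`, `Mode` and `classify_mode`, and the representation of each extender port's far end as a `(device_id, DeviceType)` pair or `None`, are hypothetical and chosen only for the example.

```python
from enum import Enum


class DeviceType(Enum):
    PORTED = "ported"
    NON_PORTED = "non-ported"


class Mode(Enum):
    STANDALONE = "standalone"  # extender ports looped back (or unconnected)
    STACKED = "stacked"        # cabled only to other ported switch devices
    RACKED = "racked"          # cabled to at least one non-ported switch device


def classify_mode(local_id, neighbors):
    """Classify the operating mode of a ported switch device.

    neighbors holds one entry per extender port: a (device_id, DeviceType)
    pair for whatever answered on that port, or None if nothing answered.
    This data shape is an assumption made for the sketch.
    """
    # A loopback cable leads back to the local device, so it is not "remote".
    remote = [n for n in neighbors if n is not None and n[0] != local_id]
    if not remote:
        return Mode.STANDALONE
    if any(dev_type is DeviceType.NON_PORTED for _, dev_type in remote):
        return Mode.RACKED
    return Mode.STACKED
```

For instance, a device "A" whose extender ports all report device "A" as the far end would be classified as standalone, while the same device seeing a non-ported device on any extender port would be classified as racked.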
Each processor may also have an embedded port through which it can access the switching system. The switching system views the embedded ports no differently than the front standard user ports, such that frames received at any front port on any ported switch device may be routed in hardware to the embedded port of any ported switch device processor on any ported switch device. Frames sent from the embedded port of any ported switch device may be transmitted out any user port, or may be received at an embedded port of any other ported switch device processor. Processors of different ASICs of the same ported switch device, as well as processors of different ported switch devices, can communicate through the switching system with any other processor in the switch system.
In contrast, as shown in
Communication between ported and non-ported devices of
It should be understood that the hardware architectures illustrated in
Individual devices can include one or more subsystems, which are driven by firmware, hardware and/or software executed by individual processors in the switch. In one implementation, each flash memory in a device stores a full set of possible firmware components for all supported subsystems. Alternatively, firmware, hardware and/or software components can be distributed differently to different devices. In either configuration, each processor is assigned zero or more subsystems, such that a processor may load the firmware or software components for the assigned subsystems from flash memory and execute these components. In one implementation, a subsystem is cohesive in that it is designed for a specific function, and includes one or more independently-scheduled tasks. A subsystem need make no assumptions about its relative location (e.g., by which processor or which device its firmware or software is executed), although it can assume that another subsystem with which it interacts might be located on a different processor or device. A subsystem may also span multiple processors. For example, a Fibre Channel Name Server subsystem may execute on multiple processors in a switch. Subsystems may be independently loadable at initialization or run time and may communicate with each other by sending and receiving messages, which contributes to their location-independence. Furthermore, within a given processor's execution state, multiple subsystems can access a common set of global functions via a function call.
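The location-independence that message passing affords can be illustrated with a short sketch. The `MessageRouter` class, its `register` and `send` methods, and the processor labels are hypothetical names assumed only for this example; the sketch merely shows that a sender addresses a subsystem by name and need not know which processor currently hosts it.

```python
class MessageRouter:
    """Toy message router: senders address subsystems by name only."""

    def __init__(self):
        # subsystem name -> (hosting processor label, message handler)
        self._locations = {}

    def register(self, name, processor_id, handler):
        # Registering again under the same name models relocating the subsystem.
        self._locations[name] = (processor_id, handler)

    def host_of(self, name):
        return self._locations[name][0]

    def send(self, dest, payload):
        # The sender never sees the hosting processor, only the reply.
        _processor_id, handler = self._locations[dest]
        return handler(payload)


def name_server(msg):
    # Stand-in handler; a real subsystem would do actual work here.
    return ("name_server", msg)


router = MessageRouter()
router.register("name_server", "proc-0", name_server)
reply_before = router.send("name_server", {"op": "lookup"})

router.register("name_server", "proc-3", name_server)  # subsystem moved
reply_after = router.send("name_server", {"op": "lookup"})
```

In this sketch the sending code is identical before and after the subsystem is moved from one processor to another, which is the property the message-based design is intended to provide.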
As introduced above and described in more detail below, an identification communication or discovery operation 702 of the more generally identified method 700 of managing a switch system in a computer network, see
As introduced above, the connections of the ported and/or unported switch devices via the extender port (XP) links can carry device-to-device control information, as for example an identification communication, in combination with user Fibre Channel and Ethernet data between ported switch devices and non-ported switch devices. The discovery operation 702 may thereby involve the sending of an identification communication whether of the actual identification information of a device, and/or of sending a query to the device cabled to each of a device's extender ports and the receiving of identification information from the remote device, including for example a device ID, a device serial number, and/or a device type.
The transmission of user frames or packets may depend on the proper configuration, by for example embedded software, for forwarding tables implemented as content addressable memories (CAMs) and “cell spraying masks”, which indicate how the parallel lanes of the XP links are connected. Before the CAMs and masks can be properly programmed, subsystems executing in different devices discover one another, per operation 702, e.g., and determine how the XP links are attached. In one implementation, discovery is accomplished using single cell commands (SCCs), which are messages segmented into units of no more than a single cell and transmitted serially over a single lane of a single extender port, point-to-point. The SCCs may be identification communications, for example.
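As a hedged illustration of segmenting a message into cell-sized units for serial transmission over a single lane, consider the following sketch. The cell payload size of 64 bytes, the `(sequence, is_last, payload)` tuple layout, and the function names `segment` and `reassemble` are assumptions made for the example and are not specified in this description.

```python
CELL_PAYLOAD_BYTES = 64  # assumed value; the actual cell size is not given here


def segment(message, cell_size=CELL_PAYLOAD_BYTES):
    """Split a message into units no larger than one cell.

    Returns a list of (sequence_number, is_last, payload) tuples. This
    layout is hypothetical and chosen only for illustration.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    chunks = [message[i:i + cell_size]
              for i in range(0, len(message), cell_size)] or [b""]
    last = len(chunks) - 1
    return [(seq, seq == last, chunk) for seq, chunk in enumerate(chunks)]


def reassemble(cells):
    """Rebuild the original message from its cells, in sequence order."""
    ordered = sorted(cells, key=lambda cell: cell[0])
    return b"".join(payload for _seq, _is_last, payload in ordered)
```

Under these assumptions a 150-byte message would be carried as three units (64, 64 and 22 bytes), each of which fits within a single cell and can be sent point-to-point over one lane of one extender port.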
Devices may thus discover one another by the exchange of SCCs sent from each lane of each extender port. Following a successful handshake, e.g., after a successful exchange of SCCs, each device adds to its map of XP links that connect it with other devices. In the case of ported switch devices where there are two processor pairs, each processor pair can communicate via the PCI bus to which they are both connected; however, intra-device discovery may nevertheless be accomplished via the extender ports. Even so, in an alternative implementation, processors within the same device could use internal communication links for intra-device discovery.
In one stage of discovery, termed “self-” or “intra-device” discovery, a single processor in the device 530 will assume the role of device manager. The processor will query its counterpart on the same device to discover the other's presence, capabilities and/or health during intra-device discovery. Another stage is termed “inter-device” discovery, in which processors on different devices exchange information. Each processor sends and receives SCCs via each connected extender port to obtain the device ID and device serial number of the device on the other end of the cable.
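The inter-device stage, in which a device queries whatever is cabled to each of its extender ports and records the identification returned, might be sketched as follows. This is an illustrative simulation only: the `Identity` record, the `cabling` and `identities` dictionaries standing in for physical cables and remote responders, and the `discover` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    device_id: str
    serial: str
    device_type: str  # "ported" or "non-ported"


def discover(local, cabling, identities):
    """Build the local device's map of extender port links.

    cabling maps (device_id, port) to the (device_id, port) at the far end
    of the cable. identities maps a device_id to the Identity it would
    return when queried; a missing entry models a device that does not
    respond. Both structures are assumptions made for this sketch.
    """
    link_map = {}
    for (dev, port), (peer_dev, _peer_port) in cabling.items():
        if dev != local.device_id:
            continue
        peer = identities.get(peer_dev)  # None: no reply to the query
        if peer is not None:
            link_map[port] = peer        # successful handshake on this port
    return link_map
```

A port whose far end reports the local device's own identity corresponds to a loopback cable, and a port with no entry in the resulting map corresponds to a link on which no handshake completed.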
The discovery process 702 may be complete in itself, or may include sub-processes such as recognition of the connected devices, if any; it may include or be included in an initialization or handshaking operation between devices. There may be negotiations between devices and/or there may be agreement or disagreement involved as well. For example, there may be agreement or disagreement between two ported switch devices about the connection or recognition (or about some other part of the discovery) operation. There may be confirmation and/or verification operation(s), or there may be separate establishment operations. Or, any or all of these steps may be implicit within the discovery process itself, i.e., where a discovery request is sent by one device to another, there may be an implicit determination of the connection based upon the response or lack thereof. Thus, the discovery operation may itself establish to the satisfaction of the respective devices what the connection of devices is and how it is accomplished so that operation of the switch system may commence.
As introduced above, the discovery process of the extender port connection(s) may be implemented by software, firmware or hardware (purely by logic gates) or a mixture of software, firmware and/or hardware as for example hardware with software assist. The SCC handshake procedure described above may be one form of software or firmware implementation. Otherwise, an automated or automatic health and topology detection system implemented in hardware or firmware may be as follows.
Each end of each extender port (XP) link may be configured with an identification tag (or ID) which identifies its location in the system. For some implementations, this ID may contain board slot number, ASIC device number, the link number and/or a software version identification. The software version identification may be useful to check for compatibility of the software and/or firmware for upgradeability and/or to determine whether the software and/or firmware of a relative two or more switch devices may be compatible for interconnectability. The identification tag may be sent, as for example an identification communication, by the ASIC (as for example by a transmitter portion thereof, if included) upon initial linkup. Each receiver is configured to be able to receive the transmitted ID from the remote side of the link. When received from the link, the received ID may be placed in a special register at the receiver. This register may be called the Remote ID register. Both ends of the link transmit their identification tag and receive from the remote side its identification tag and then place the received tag in its Remote ID register. To determine the topology of the system, i.e., to perform the discovery operation, the firmware (or hardware implementation with or without a software assist) can read the Remote IDs from all links. If the devices then agree that they have a legal connection, then they agree to form an interconnected single switch system. As described further below, this scheme may also be used for health monitoring as well, particularly if the ID tags are configured to be transmitted at continuing intervals after linkup.
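The exchange of identification tags at linkup and the subsequent reading of the Remote ID registers might be modeled as in the sketch below. The field names of `IdTag`, the `LinkEnd` class, and in particular the rule used in `is_legal` (a remote tag is present and the software versions match) are assumptions made for illustration; this description does not define what constitutes a legal connection in such detail.

```python
from typing import NamedTuple, Optional


class IdTag(NamedTuple):
    slot: int        # board slot number
    asic: int        # ASIC device number
    link: int        # link number
    sw_version: int  # software version identification (integer is assumed)


class LinkEnd:
    """One end of an extender port link."""

    def __init__(self, local_id: IdTag):
        self.local_id = local_id
        self.remote_id: Optional[IdTag] = None  # models the Remote ID register

    def receive(self, tag: IdTag) -> None:
        self.remote_id = tag


def link_up(a: LinkEnd, b: LinkEnd) -> None:
    """Both ends transmit their own tag and latch the tag they receive."""
    a.receive(b.local_id)
    b.receive(a.local_id)


def read_topology(ends):
    """Read the Remote ID of every local link: local link number -> remote tag."""
    return {end.local_id.link: end.remote_id for end in ends}


def is_legal(end: LinkEnd) -> bool:
    # Assumed rule for the sketch: a remote tag arrived and versions match.
    return (end.remote_id is not None
            and end.remote_id.sw_version == end.local_id.sw_version)
```

Reading the registers of all local link ends in this way yields a table of which remote slot, ASIC and link sits at the far end of each local link, which is the information needed to determine the system topology.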
Once the discovery operation 702 has been completed, the operation 704 of the switch system may then be achieved (see
As introduced above, the scheme of identification communication over the XP links may also be used for/in various forms of health monitoring as well.
Firstly, an initialization handshake procedure such as that described wherein one or more identification communications may be had between ASICs (e.g., ASICs within the same device or on separate devices) and/or between devices may also involve a health determination, not merely an exchange of identification information. See e.g.,
As a second health determination alternative, the respective transmitters in each device (ported or unported) may be configured or otherwise instructed to re-send an identification communication or tag periodically. Each receiving device can then, upon receiving the identification tag, compare the newly arrived tag to the previous tag value stored within its Remote ID register. If the ID tag received differs from the one stored in the Remote ID register, firmware, hardware or software may be informed that a connection or topology change has occurred. Such a change could represent or be interpreted as a health issue, e.g., a failed or disconnected connection, or a failure at or an illegal remote device.
Similarly, the firmware, software or hardware may be initialized with or have initialized the remote ID prior to an actual first linkup or interconnection to provide validation of the expected interconnection or expected system topology. If, upon first linkup, the topology does not match the expected topology, the Remote ID can flag the difference.
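The comparison of a newly arrived tag against the stored Remote ID, including the case in which the register is pre-initialized with an expected value before first linkup, can be sketched as follows. The dictionary used as a register, the function name `on_tag_received`, and the choice to overwrite the stored value after a mismatch are assumptions of this example rather than details given in this description.

```python
def on_tag_received(register, new_tag):
    """Compare an arriving ID tag with the stored Remote ID.

    register is a dict with an optional "remote_id" entry standing in for
    the Remote ID register. Returns True when a connection or topology
    change should be reported. Overwriting the stored value after a
    mismatch is an assumption made for this sketch.
    """
    previous = register.get("remote_id")
    register["remote_id"] = new_tag
    return previous is not None and previous != new_tag


# Register pre-loaded with the tag expected at first linkup (hypothetical values).
expected = {"remote_id": ("slot-2", "asic-1", "link-9")}
mismatch_at_linkup = on_tag_received(expected, ("slot-5", "asic-0", "link-3"))
```

Here `mismatch_at_linkup` is true because the first tag received does not match the pre-initialized expectation, while a register that starts empty would simply latch the first tag without reporting a change.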
Moreover, in a situation where all transmitters are disposed to periodically resend their identification tag, if an identification tag is not received periodically on a particular link, the receiving device firmware, software or hardware may be so informed of the missing heartbeat event. This event can, as mentioned above, indicate that the remote transmitter may be disconnected from the link, or have otherwise failed (e.g., hardware degradation to failure or powered off). The rate of retransmission of these ID communications or tags may be based on the speed at which link issues would need or be preferred to be discovered. For some implementations, the rate of retransmission may be once every 100 milliseconds. In at least one substantially conventional switch system, this rate can ensure accurate topology and health monitoring and yet not impose any significant bandwidth overhead.
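A missing-heartbeat check built around the 100 millisecond resend interval mentioned above might look like the following sketch. The tolerance of three missed intervals before a link is reported, the `HeartbeatMonitor` class, and the use of caller-supplied integer millisecond timestamps are assumptions made for illustration.

```python
RESEND_INTERVAL_MS = 100      # resend rate given in the text for some implementations
MISSED_INTERVALS_ALLOWED = 3  # assumed tolerance; not specified in this description


class HeartbeatMonitor:
    """Tracks when an ID tag was last received on each link."""

    def __init__(self, links, now_ms):
        self.last_seen_ms = {link: now_ms for link in links}

    def tag_received(self, link, now_ms):
        self.last_seen_ms[link] = now_ms

    def missing(self, now_ms):
        """Return the links whose tag has been absent longer than the tolerance."""
        limit = RESEND_INTERVAL_MS * MISSED_INTERVALS_ALLOWED
        return sorted(link for link, seen in self.last_seen_ms.items()
                      if now_ms - seen > limit)
```

A shorter tolerance detects a disconnected or failed transmitter sooner at the cost of more false alarms from occasional lost tags; the value chosen here is only a placeholder for that trade-off.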
As a further option hereof, the modularized ported and/or unported switch devices hereof may provide for isolation and/or recovery after detection of some fault. Thus, upon the determination of a health event, as for example after detection of a link down event, provided hereby may be a system (with hardware, firmware and/or software) and/or a procedure for re-routing data, as for example by re-routing the partial data packets or data cells thereof. This is shown generally in
As a first option when an extender port connection or link fails or has failed, this link will effectively be taken out of service, at least insofar as the transmitter may be configured or re-configured to stop sending cells via this lost link. This may effectively isolate the fault. Then in recovery from the fault, the software, firmware and/or hardware of the devices which detected the fault/lost link may be so informed and may then begin the process of reconfiguring the device itself to avoid or circumvent the lost link, a sort of dynamic re-routing. The software, firmware and/or hardware may further be configured to or be configurable to communicate to other parts of the switch system to also avoid the lost link.
In a more particular implementation, it may occur that when an extender port connection or link is lost in a switch system such as that described above, data cells may have already been trapped in the switch device having already been sent from the source ASIC and may thus be unable to traverse the down link. If the cells are not able to reach their destination, the data packets or frames they belong to will be corrupted. The corruption can lead to overall network instability so it is important to avoid any cell loss if possible.
Recovery in such a situation may include having software, firmware and/or hardware detect whether any cells are still awaiting transmission to/through the downed link. If any cells are awaiting transmission, the cells may then be removed from the transmission queue of the downed link and sent to the target ASIC using a different link. These data cells will be labeled with a special identifier indicating that they were re-routed from the downed link so that the destination ASIC can interpret them correctly. It is possible that the data cells will have to be routed through one or more other ASICs prior to reaching the destination ASIC and/or through one or more other switch devices before they may actually be able to gain access to their desired destination ASIC. The easiest route will usually be to use another link (i.e., a redundant link if such is so connected) which connects the source switch ASIC to the destination ASIC. However, if a direct link does not exist, the cell or cells will be sent to another ASIC or another switch device (ported or unported) which does have access to, directly or indirectly, the destination ASIC.
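The recovery steps just described (take the failed link out of service, drain its transmission queue, label the drained cells, and resend them over a redundant or indirect link) can be sketched as below. The data structures `links`, `queues` and `neighbors`, the `rerouted_from` label, and the policy of taking the first suitable candidate are hypothetical choices made only for this example.

```python
from collections import deque


def reroute(links, queues, neighbors, down_link):
    """Move cells queued on a failed link onto an alternative link.

    links:     {link_id: {"to": asic_name, "up": bool}}
    queues:    {link_id: deque of cell dicts}
    neighbors: {asic_name: set of ASICs that ASIC can forward to}
    All three structures are assumptions made for this sketch. Returns the
    link chosen, or None if no alternative path exists (cells stay queued).
    """
    links[down_link]["up"] = False  # isolate the fault: stop using the link
    target = links[down_link]["to"]
    up = [l for l, info in links.items() if info["up"]]
    # Prefer a redundant link that reaches the destination ASIC directly.
    direct = [l for l in up if links[l]["to"] == target]
    # Otherwise use a link to an ASIC that can forward to the destination.
    indirect = [l for l in up if target in neighbors.get(links[l]["to"], set())]
    candidates = direct or indirect
    if not candidates:
        return None
    alt = candidates[0]
    while queues[down_link]:
        cell = queues[down_link].popleft()
        cell["rerouted_from"] = down_link  # identifier for the destination ASIC
        queues[alt].append(cell)
    return alt
```

Draining the queue in order and appending to the alternative queue preserves the relative order of the re-routed cells, and the added label lets the destination recognize that they arrived over a different link than originally intended.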
Two further alternative implementations for recovery may also be used. These two involve intelligence in or added to the data flow itself. In particular, this involves what is termed here as a redundant assembly identification operation wherein either upon request or on a regular schedule, the source ASIC is associated with or otherwise causes the inclusion within the data stream one or more identifier error correction cells to provide information for the re-assembly of the data cells into appropriate data packets or frames even if one or more links have been broken. Note these error correction cells may be generated upon demand after a link breakage is detected, or may be generated substantially constantly throughout a data transmission period as a preventative before detection of a broken link.
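This description does not specify the content of the error correction cells. Purely as an illustration of how an extra cell in the data stream can allow recovery when a link is lost, the sketch below substitutes a simple XOR parity scheme, which is an assumed technique and not one stated here; it further assumes equal-length cells and the loss of exactly one cell per protected group. The function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_cell(cells):
    """Build one extra cell that is the XOR of all data cells in a group."""
    return reduce(xor_bytes, cells)


def recover_missing(received, parity):
    """Reconstruct the single missing cell (marked None) in a group."""
    if sum(cell is None for cell in received) != 1:
        raise ValueError("this sketch recovers exactly one missing cell")
    present = [cell for cell in received if cell is not None]
    # XOR-ing the parity with every surviving cell leaves the missing one.
    return reduce(xor_bytes, present, parity)
```

In such a scheme the parity cell could be generated continuously as a preventative, or only on demand after a link breakage is detected, mirroring the two timing options described above.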
Modular architectures according hereto may provide for one or more of high performance, scalability, configuration flexibility, and near-linear cost scaling (pay as you grow). Such results may come from streamlining the switch device building blocks to common modules or blocks which can be used to optionally build or create all ranges of switches, from standalone switches to stackable switches to rackable directors (thus, small, medium, large, and very large options). These modules or blocks also provide for late binding of product family configurations to react to market and customer needs, as for example providing a low cost-of-entry to customers that want to start at the smallest or most economical configuration possible to save the initial deployment cost/budget, and yet also provide near-linear pay-as-you-grow scaling and upgrades to meet a variety of on-demand growth needs of customer applications. Avoiding the cost of one or more chassis reduces the cost bumps that currently occur. Also, the modules or building blocks hereof can provide a cost-efficient bill of materials (BOM) and manufacturing with consolidation of components and efficient streamlining of test flow.
The embodiments of the invention described herein may be implemented as logical steps in one or more computer or computer-related systems. The logical operations of the present invention are implemented (1) as a sequence of processor-implemented steps executing in one or more computer systems and (2) as interconnected machine or circuit modules within one or more computer systems. The implementation is a matter of choice, dependent on the performance requirements of the computer system implementing the invention. Accordingly, the logical operations making up the embodiments of the invention described herein are referred to variously as operations, steps, objects, or modules. Furthermore, it should be understood that logical operations may be performed in any order, unless explicitly claimed otherwise or a specific order is inherently necessitated by the claim language. In some implementations, articles of manufacture are provided as computer program products. One implementation of a computer program product provides a computer program storage medium readable by a computer system and encoding a computer program. Another implementation of a computer program product may be provided in a computer data signal embodied in a carrier wave or other communication media by a computing system and encoding the computer program.
The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended. Furthermore, structural features of the different embodiments may be combined in yet another embodiment without departing from the recited claims.