1. The Field of the Invention
The present invention relates to systems and methods for pattern-based correlation of non-translative network segments. More particularly, the present invention provides for a causal correlation to be determined, using pattern-based methods of identifying typical cause-and-effect network activity, between network activities occurring in network segments that operate under differing network protocols.
2. The Relevant Technology
Computer and data communications networks continue to develop and expand due to declining costs, improved performance of computer and networking equipment, and increasing demand for communication bandwidth. Communications networks, including for example, wide area networks (“WANs”), local area networks (“LANs”), and storage area networks (“SANs”) allow increased productivity and utilization of distributed computers or stations through the sharing of resources, the transfer of voice and data, and the processing of voice, data, and related information at the most efficient locations. Moreover, as organizations have recognized the economic benefits of using communications networks, network applications such as electronic mail, voice and data transfer, host access, and shared and distributed databases are increasingly used as a means to increase user productivity. This increased demand, together with the growing number of distributed computing resources, has resulted in a rapid expansion of the number of installed networks.
In a protocol-homogeneous networking environment, with a sufficiently detailed understanding of the networking protocols in use, a network engineer can correlate a network request from a particular endpoint, to particular traffic patterns along the transit path, through various traffic control points such as switches or routers, and to the one or more target destinations for that original network request. For example, in the case of a TCP/IP network, depending on how the Address Resolution Protocol (ARP) is used, the source and destination MAC (physical) addresses are available in the network transmission itself. And so as the packet traverses across a network topology, it can be correlated to the packet which traversed a previous segment of the topology. At a higher level, using IP addresses and test pings, there are utilities which discover and display network segments, such as “traceroutes,” illustrating this point.
As the demand for networks has grown, however, network technology has grown to include many different physical configurations. As an example, an enterprise may employ a communications system that uses five different data communications protocols, which set forth the rules for accessing the network and the communications primitives amongst the resources on the network, each adapted for a particular situation. Such protocols may include: a first protocol for a high speed, inexpensive short-haul connection on the computer motherboard; a second high-bandwidth protocol for data center transmissions across for example fiber optic cables; a third protocol that is suited for efficiently transmitting information across the enterprise local area network (“LAN”) across for example electrical cables; a fourth protocol adapted for high bandwidth, long haul applications across for example fiber optic cables or microwave links; and, finally, a fifth transmission protocol suited for data transmission to high performance disk drive storage systems at a storage area network (“SAN”) across for example fiber optic cables. Thus, the typical communications system comprises a patchwork of different subsystems and associated communications protocols. More specific examples include: TCP/IP, Gigabit Ethernet, Asynchronous Transfer Mode (“ATM”), Synchronous Optical Network (“SONET”), Fiber Distributed Data Interface (“FDDI”), Fibre Channel, and InfiniBand networks. These and the many other types of networks that have been developed typically utilize different cabling systems, different bandwidths and typically transmit data at different speeds.
In a non-homogeneous network, many network topologies consist of segments which have different physical media or different underlying protocols. However, through encapsulation, tunneling, or protocols-on-top-of-protocols, one can identify a common software protocol through the entire topology. For example, it is common to interconnect ATM networks running a layered TCP/IP Point to Point Protocol (“PPP”) on top of them, to a router which then connects to a native TCP/IP network on Ethernet. In this way the ATM and Ethernet networks share a homogeneous TCP/IP protocol across them.
If the network is not homogeneous at some protocol level, correlation of network traffic across these segments is challenging. For example, a mixed data network utilizing the TCP/IP protocol and a Storage Area Network (“SAN”) utilizing Fibre Channel (“FC”) protocols can be problematic. Traffic on the TCP/IP network destined to cause a resultant conversation with the data storage subsystem connected to the SAN would be translated by software and firmware in intermediate servers into FC-based SAN protocol. The addressing scheme, the state transitions, timing, and routing/switching conventions in SANs are completely different from those in TCP/IP systems, and thus there is no straightforward way to correlate packets or activity on the SAN network with the TCP/IP network. We call these “non-translative” network segments because there is no way to directly translate traffic and traffic patterns in one network segment into traffic and traffic patterns in another.
As communication networks have increased in number, size and complexity, therefore, they have become more likely to develop a variety of problems that are increasingly difficult to diagnose and resolve. Moreover, the demands for network operational reliability and increased network capacity, for example, emphasize the need for adequate diagnostic and remedial systems, methods and devices.
Exemplary causes of network performance problems include the transmission of unnecessarily small frames of information, inefficient or incorrect routing of information, improper network configuration, and superfluous network traffic, to name just a few. Such problems are aggravated by the fact that many networks are continually changing and evolving due to growth, reconfiguration, and introduction of new network topologies and protocols, as well as the use of new interconnection devices and software applications.
Consequently, as high speed data communications mature, many designs increasingly focus on reliability and performance issues. In particular, communications systems have been designed to respond to a variety of network errors and problems, thereby minimizing the occurrence of network failures and downtimes. In addition, equipment, systems and methods have been developed that allow for the testing and monitoring of the ability of a communications system to respond to and deal with specific types of error conditions on a network. In general, such equipment, systems, and methods provide the ability to selectively alter channel data, including the introduction of errors into channel data paths.
Using network analysis tools, network administrators can identify and resolve various types of network problems. In some situations, network problems may be resolved by sampling a portion of the data transmitted across the network or by performing a statistical analysis on portions of the transmitted data. Other solutions require the collection of all data that traverses the network during a given time period. Collecting all of the data into a capture enables a network administrator to perform a detailed analysis on the collected data.
Implementation of this functionality on non-translative networks, however, requires that a causal relationship be identified between the data captured by way of the various links. As a result, in networks having non-translative network segments, there is a need for systems and methods to precisely correlate traffic amongst the segments. It would therefore represent an advance in the art of networked communications systems to enable the correlation of traffic between non-translative segments in computing networks.
The present invention provides methods and systems to correlate two or more connected but non-translative computer and/or storage networks. Conventionally, it has been impossible to understand a cause and effect relationship between non-translative networks because of the difficulties in operating with differing protocols. The present invention derives such cause and effect relationships by creating special traffic packets, patterns, and sets of patterns, injecting them into the various network segments at nodes, and then listening via trace captures in the various network segments at other nodes. A comparison of the traced network activity to the generated network activity allows for the formation of correlation rules which can be used to recognize similar patterns caused by the same activities in the future.
Accordingly, a first example embodiment of the invention is a method for correlating non-translative network segments in a multi-protocol communications system. The method generally includes: providing at least two connected nodes within a network, wherein a first node is in a non-translative network segment with respect to a second node; at the first node, generating and injecting a defined network pattern into network traffic and recording precisely the time stamp of the network pattern injection; at the second node, listening to network traffic, taking a copy of the traffic passing by as a trace, and adding precise time stamp information to the trace; correlating the generated defined network pattern to the traced traffic; and from the correlation of the generated defined network pattern to the traced traffic, deriving protocol cause and effect correlation rules.
Another example embodiment of the invention is a method for correlating non-translative network segments in a multi-protocol communications system. This method generally includes: providing a plurality of connected nodes within a network, wherein a first node is in a non-translative network segment with respect to a second node; providing pattern matching data which indicates protocol cause and effect correlation rules; at each of the plurality of nodes, listening to network traffic, taking a copy, as a trace, of the traffic passing by; applying a run-time process to the traced traffic using the stored pattern matching tables to recognize correlations; and from the recognized correlations, deriving the causality, in a first network segment, of a network activity that is detected in a second network segment that is non-translative with the first network segment.
These and other objects and features of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.
To further clarify the above and other advantages and features of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
The present invention provides a way to correlate two or more connected but non-translative computer and/or storage networks. As used herein, the term “non-translative networks” refers to networks which do not have a common protocol across them. Conventionally, it has been impossible to understand a cause and effect relationship between non-translative networks. The present invention derives such a traffic relationship by creating special traffic packets, patterns, and sets of patterns, injecting them into the various network segments at nodes, and then listening via trace captures in the various network segments at other nodes. A comparison of the traced network activity to the generated network activity allows for the formation of correlation rules which can be used to recognize similar patterns caused by the same activities in the future.
As used herein, the term “node” refers to a point in a communications network where two or more communication paths come together in a device, such as by way of example only, a switch, a server, a network analyzer, a computer, or an external device such as a network probe.
The invention takes advantage of the cause and effect relationship in traffic patterns across non-translative network segments. These patterns are typically discernible initially only if a single application is the cause of the pattern. In other words, given a set of networking cause patterns {M} from one network segment (e.g., a Windows network filesystem on a TCP/IP LAN), one can derive, for each cause pattern in {M}, typical response patterns {N} from the other network segment (e.g., a SAN). Thus a set of {M:N} and {N:M} patterns can be correlated. These patterns can then be used to derive correlation rules that can be used to identify the sources of network activity, particularly problems.
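By way of illustration only, the {M:N} and {N:M} pattern sets might be held as two lookup tables built from observed cause/response pairs. The following is a minimal sketch in Python; the function name `build_pattern_maps` and the pattern labels are hypothetical and are not taken from any embodiment described herein.

```python
from collections import defaultdict


def build_pattern_maps(observations):
    """Build {M:N} and {N:M} lookup tables from observed pattern pairs.

    `observations` is an iterable of (cause_pattern, response_pattern)
    pairs. The labels used in the example below are invented.
    """
    m_to_n = defaultdict(set)  # cause pattern -> response patterns seen
    n_to_m = defaultdict(set)  # response pattern -> candidate cause patterns
    for cause, response in observations:
        m_to_n[cause].add(response)
        n_to_m[response].add(cause)
    return dict(m_to_n), dict(n_to_m)


# Hypothetical observations: LAN-side causes and SAN-side responses.
observations = [
    ("lan:open_file", "san:read_block_burst"),
    ("lan:save_file", "san:write_block_burst"),
    ("lan:list_dir", "san:read_block_burst"),
]
m_to_n, n_to_m = build_pattern_maps(observations)

# A SAN-side response pattern may map back to more than one LAN-side cause.
print(sorted(n_to_m["san:read_block_burst"]))
```

In this sketch a response pattern that maps back to several causes is kept ambiguous rather than resolved, which is consistent with the observation above that patterns are most easily discerned when a single application is active.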
For example, filesystem protocols are often the most relevant to a network analysis, including those of Windows LAN and NFS for UNIX LAN. Depending on the types of operations that are of interest, a developer can determine how to simulate, through generation, the basic network traffic from the LAN side at the TCP or UDP level, for the filesystem operations. Network traffic is then traced in other sections of the network after the simulation is initiated and patterns of network activity are recognized. The network patterns can then be reduced to protocol cause and effect correlation rules, which allow for the identification of network activity such as: list, mount, read, seek, write, open, close, delete, and the like.
Reference will now be made to the drawings to describe various aspects of exemplary embodiments of the invention. It is to be understood that the drawings are diagrammatic and schematic representations of such exemplary embodiments, and are not limiting of the present invention, nor are they necessarily drawn to scale.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be obvious, however, to one skilled in the art that the present invention may be practiced without these specific details. In other instances, well-known aspects of network systems have not been described in particular detail in order to avoid unnecessarily obscuring the present invention.
With reference to
In addition, the non-translative network 100 as depicted includes network probes 106, external server 108, and computer 110. More particularly, each of SAN network 102 and LAN network 104 may have varying degrees of “granularity,” meaning they can have numerous parts and components from many manufacturers, thus complicating the networks and making the task of isolating problems more difficult. As generally depicted, such network parts or components may include, by way of example only, servers, routers, mass storage devices, probes, switches, network analyzers, and other computing devices known in the art or developed hereafter. As a result, the number of parts or components a packet travels through from one end of a network to another may vary greatly within various embodiments of the invention.
In one embodiment, the computer 110 is a network analyzer or similar apparatus for monitoring network data traffic in the communications network 102 in order to detect and diagnose problem conditions existing in the network, such as problem conditions existing between network components or links between components. In various embodiments of the invention, methods as disclosed herein may be coordinated and/or executed by computer 110.
In addition, network probes 106 are inserted external devices that serve to capture traces of network traffic. In one embodiment of the invention, each network segment that is to be correlated is attached to such a probe to capture traces within that network segment.
In preferred embodiments of the invention, there are also generators at one or both ends of the network topology to be correlated. Although the precise definition of “generator” is not critical to the invention, at a minimum a generator will be operable, manually and/or automatically, to generate packets and/or network traffic patterns to inject into the network traffic. Probes and generators will also preferably be equipped with some mechanism to record a “time stamp” indicating the time at which a given piece of network traffic was either injected into the network or recorded as a trace.
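As a non-limiting sketch of the time stamp mechanism, a generator and a probe might each attach a time stamp to the records they produce. The Python record types and function names below are hypothetical. The sketch also assumes that both devices read a common clock, which in a real deployment would require clock synchronization that is not shown here.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class InjectionRecord:
    node: str          # node at which the pattern was injected
    pattern: str       # label of the injected stimulus pattern
    timestamp_ns: int  # time of injection, in nanoseconds


@dataclass(frozen=True)
class TraceRecord:
    node: str          # node at which the traffic was captured
    payload: bytes     # copy of the captured traffic
    timestamp_ns: int  # time of capture, in nanoseconds


def stamp_injection(node, pattern, clock=time.monotonic_ns):
    """Record an injection together with the time at which it occurred."""
    return InjectionRecord(node, pattern, clock())


def stamp_trace(node, payload, clock=time.monotonic_ns):
    """Record a captured piece of traffic together with its capture time."""
    return TraceRecord(node, payload, clock())


# Example with a fake clock so that the result is reproducible.
ticks = iter([100, 250])
fake_clock = lambda: next(ticks)
injection = stamp_injection("lan-generator", "open_file", fake_clock)
trace = stamp_trace("san-probe", b"\x00", fake_clock)
print(trace.timestamp_ns - injection.timestamp_ns)  # delay in nanoseconds
```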
As seen in
There is a cause and effect relationship in activity in each network. According to the invention this cause and effect relationship can be tracked through pattern recognition across non-translative network segments which are working on the same problem. In other words, activity on one network can cause activity on the other network in a recognizable pattern. Each activity in a first network segment will have a respective patterned response it induces at another network segment, and vice versa. According to the invention, these patterned responses can be identified and used to correlate activity across non-translative network segments, thereby helping to identify the source of network problems.
Referring now to
Next, network traffic in known stimulus patterns is generated and injected into network traffic, as indicated by block 404. This is preferably performed when the network is “quiet” in that other network traffic is avoided so that network activity can be precisely recorded. It should be noted that the generated and injected stimulus patterns preferably correspond to designated activities, for example: open file, save file, access Internet web site, etc. Thus, the generated and injected stimulus pattern will provide a footprint for how that pattern affects network activity throughout the network. Ideally, the entire process will be repeated, varying only this step, to inject different stimulus patterns and thereby detect and store the response patterns caused by a number of network activities.
Network traffic is next recorded as traces with precise time stamp information, as indicated by block 406. In other words, the corresponding network patterns caused by an initial activity are measured at downstream locations in the network. The process of injection and trace recording can be performed bi-directionally on the topology, e.g., generated from both ends and captured/traced from both ends. In addition, the process can be initiated and executed with any desired degree of manual operation or automation.
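The repeated inject-then-trace procedure can be sketched, purely for illustration, as a loop over stimulus activities. In the Python sketch below, `collect_footprints` and the `FakeNetwork` stand-in are hypothetical, and the simulated responses and microsecond delays are invented; an actual embodiment would drive real generators and probes.

```python
def collect_footprints(activities, inject, capture):
    """Inject each stimulus on a quiet network and record what follows.

    `inject(activity)` returns the injection time stamp and `capture()`
    returns a list of (time stamp, event) pairs seen since the last call.
    Events are stored as offsets relative to the injection time.
    """
    footprints = {}
    for activity in activities:
        injected_at = inject(activity)
        trace = capture()
        footprints[activity] = [
            (ts - injected_at, event) for ts, event in trace if ts >= injected_at
        ]
    return footprints


class FakeNetwork:
    """Stand-in for a quiet test network; the responses are invented."""

    RESPONSES = {"open_file": ["fc_read"], "save_file": ["fc_write", "fc_write"]}

    def __init__(self):
        self.now_us = 0
        self.pending = []

    def inject(self, activity):
        self.now_us += 1_000_000
        self.pending = [
            (self.now_us + 1000 * (k + 1), event)
            for k, event in enumerate(self.RESPONSES[activity])
        ]
        return self.now_us

    def capture(self):
        trace, self.pending = self.pending, []
        return trace


net = FakeNetwork()
footprints = collect_footprints(["open_file", "save_file"], net.inject, net.capture)
print(footprints["save_file"])
```

Storing offsets rather than absolute time stamps is a design choice of this sketch; it lets footprints recorded in different runs be compared directly.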
The generated traffic patterns and the traced network traffic can then be correlated to match patterns in the generated traffic and the traced traffic, as indicated at block 408.
Next, the correlated patterns can optionally be presented visually to a user in a comparative manner in a graphical user interface, as indicated by block 410. For example, shown in
As indicated at block 412, protocol cause and effect correlation rules can then be determined. In one embodiment of the invention, the protocol cause and effect correlation rules can be determined without presenting the graphs visually to a user, as indicated by arrow 414. Such rules can be determined automatically by expert system, statistical, or other methods known in the art in conjunction with the computing devices disclosed herein or otherwise known in the art.
One example of a preferred method is called the Time Series Composite Correlation technique. Generally, in this method each network trace is digitized to a common granularity that depends on the speed of the network. For networks operating in the gigabit-per-second range, the granularity for digitization should be in the microsecond range. The digitized trace is called a streaming time series. Each streaming time series contains a triple of values for each data point: streamID, timeposition, and value. A unit time window is chosen that is suitably long, by way of non-limiting example 1 second. This ensures that a cause and an effect can be held within the same time window. Let s[i] denote the value of the stream s at timeposition i, and let s[i . . . j] denote the subsequence of stream s from timeposition i through j inclusive. Let s_i denote the stream with the streamID i. Use t to denote the latest timeposition. A strong correlation of any stream pair will be close to −1 for high negative correlations, and close to +1 for high positive correlations, as calculated using the following formula:
\[ \mathrm{corr}(s, r) = \frac{\sum_{i=1}^{w} s_i r_i - w\,\check{r}\,\check{s}}{\left(\sum_{i=1}^{w} s_i^{2} - w\,\check{s}^{2}\right)^{1/2} \left(\sum_{i=1}^{w} r_i^{2} - w\,\check{r}^{2}\right)^{1/2}} \]
where $\check{r}$ and $\check{s}$ are the average values of streams r and s, respectively, over the sliding window, s_i and r_i within the formula are the values of the two streams at the i-th timeposition of that window, and w is the number of timepositions in the window. Each correlation term t_i is derived by applying an application-dependent threshold function T to the resulting corr(s, r), yielding “true” or “false” for that term. A composite correlation, then, is of the form t_1 t_2 . . . t_n. A composite correlation pattern can be evaluated at any timeposition and evaluates to either true or false at that timeposition. By adjusting time offsets in the data streams and by running several sets of correlation calculations through multiple time windows, correlations can be discovered across streams using this algorithm. This algorithm is just one example of many possible algorithms which can be used to determine correlation.
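The calculation described above can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, the threshold function (absolute correlation of at least 0.9) is an assumed example of an application-dependent threshold, and treating the composite correlation as a logical conjunction of its terms is likewise an assumption of this sketch.

```python
import math


def digitize(events, stream_id, bin_width, n_bins):
    """Bin (time stamp, value) events into (streamID, timeposition, value) triples."""
    values = [0.0] * n_bins
    for ts, value in events:
        pos = int(ts // bin_width)
        if 0 <= pos < n_bins:
            values[pos] += value
    return [(stream_id, pos, v) for pos, v in enumerate(values)]


def corr(s, r):
    """Correlation of two equal-length windows of stream values."""
    w = len(s)
    if w == 0 or w != len(r):
        raise ValueError("windows must be non-empty and of equal length")
    mean_s = sum(s) / w
    mean_r = sum(r) / w
    numerator = sum(a * b for a, b in zip(s, r)) - w * mean_r * mean_s
    var_s = sum(a * a for a in s) - w * mean_s ** 2
    var_r = sum(b * b for b in r) - w * mean_r ** 2
    if var_s <= 0 or var_r <= 0:
        return 0.0  # constant window: undefined, treated as 0 (assumption)
    return numerator / math.sqrt(var_s * var_r)


def lagged_corr(s, r, lag):
    """Correlate s against r shifted later by `lag` timepositions (lag >= 0)."""
    return corr(s[: len(s) - lag], r[lag:])


def correlation_term(s, r, threshold=0.9):
    """Assumed threshold function T: true when |corr| reaches the threshold."""
    return abs(corr(s, r)) >= threshold


def composite(terms):
    """Assumed composite form: every term must be true."""
    return all(terms)


# The effect stream mirrors the cause stream one timeposition later.
cause = [0, 1, 0, 2, 0, 3, 0]
effect = [0, 0, 2, 0, 4, 0, 6]
print(round(lagged_corr(cause, effect, 1), 6))
```

Scanning `lagged_corr` over a range of lags is one simple way to realize the adjustment of time offsets mentioned above; the lag that yields the strongest correlation suggests the delay between cause and effect.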
This process can be repeated across various network segments at any desired degree of granularity for any number of activities to determine a database of rules for recognizing network patterns.
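One hypothetical way such a database of rules might be consulted is sketched below in Python. The rule table, the event labels, and the in-order matching criterion are assumptions made for illustration; the description above does not prescribe a particular storage format or matching criterion.

```python
def contains_in_order(trace, pattern):
    """True if the events of `pattern` occur in `trace` in order (gaps allowed)."""
    remaining = iter(trace)
    # Each membership test consumes the iterator up to the matched event,
    # so later pattern events must appear after earlier ones.
    return all(event in remaining for event in pattern)


def likely_causes(trace, rules):
    """Return the activities whose stored response pattern appears in `trace`."""
    return [
        activity
        for activity, pattern in rules.items()
        if contains_in_order(trace, pattern)
    ]


# Hypothetical rule database: LAN-side activity -> SAN-side response pattern.
rules = {
    "open_file": ["fc_lookup", "fc_read"],
    "save_file": ["fc_lookup", "fc_write"],
    "delete_file": ["fc_unmap"],
}

observed = ["fc_lookup", "fc_other", "fc_read"]
print(likely_causes(observed, rules))
```

Allowing gaps between matched events is a choice made here so that unrelated traffic interleaved in the trace does not prevent a match; a stricter embodiment could require contiguous events or bounded time offsets.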
Referring now to
The recorded traces at a given node are then correlated with known pattern matching data via run-time processes, as indicated by block 508. Such correlations can be determined automatically by expert system, statistical, streaming time series, or other methods known in the art in conjunction with the computing devices disclosed herein or otherwise known in the art. One example is the Time Series Composite Correlation technique described above. These correlations are optionally presented to a user in a visually comparative manner, as indicated by block 510. From the pattern matches and the protocol cause and effect correlation rules the source of network activity can be determined, as indicated by
Details associated with complementary time-based methods for correlating non-translative network segments are disclosed in U.S. patent application Ser. No. ______ (not yet received), entitled “Time-Based Correlation of Non-Translative Network Segments,” bearing attorney docket No. 15436.343.1, which has been filed on the same day as the present invention and is incorporated herein by reference. The pattern-based methods of this invention can be practiced in combination with or independently from the time-based methods disclosed in the foregoing patent application.
In at least some cases, some or all of the functionality disclosed herein may be implemented in connection with various combinations of computer hardware and software. For example, at least some devices use hard coded devices such as field programmable gate arrays (“FPGA”) to implement pattern generation, injection, trace capture, and data correlation functionality. Other devices employ both hardware and software to implement various functions disclosed herein.
With respect to computing environments and related components, at least some embodiments of the present invention may be implemented in connection with a special purpose or general purpose computer that is adapted for use in connection with communications systems. Embodiments within the scope of the present invention also include computer-readable media for carrying or having computer-executable instructions or electronic content structures stored thereon, and these terms are defined to extend to any such media or instructions for use with devices such as, but not limited to, link analyzers and multi-link protocol analyzers.
By way of example such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code in the form of computer-executable instructions or electronic content structures and which can be accessed by a general purpose or special purpose computer, or other computing device.
When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a computer or computing device, the computer or computing device properly views the connection as a computer-readable medium. Thus, any such connection is properly termed a computer-readable medium. Combinations of the above should also be included within the scope of computer-readable media. Computer-executable instructions comprise, for example, instructions and content which cause a general purpose computer, special purpose computer, special purpose processing device, such as link analyzers and multi-link protocol analyzers, or computing device to perform a certain function or group of functions.
Although not required, aspects of the invention have been described herein in the general context of computer-executable instructions, such as program modules, being executed by computers in network environments. Generally, program modules include routines, programs, objects, components, and content structures that perform particular tasks or implement particular abstract content types. Computer-executable instructions, associated content structures, and program modules represent examples of program code for executing aspects of the methods disclosed herein.
The present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
This application claims the benefit of Provisional Application No. 60/502,011, filed Sep. 11, 2003, and Provisional Application No. 60/502,020, filed Sep. 11, 2003, both of which are incorporated herein by reference.
Number | Date | Country
---|---|---
60502011 | Sep 2003 | US
60502020 | Sep 2003 | US