1. Field of the Present Invention
The present invention generally relates to the field of data communication systems, and more particularly to a system and method for storing and transmitting emulated network flows for performance testing of data communications network components.
2. History of Related Art
In the field of computer networks, the need arises to add various data communications network components to a network or to replace data communications network components presently on the network in question. Examples of these data communications network components include, among many others, network switches, routers, load balancers, firewalls, and web servers.
Prior to adding to and/or replacing a network's components, however, it is desirable to test and validate the functionality of the applicable network components to ensure the network components will function properly when deployed on the applicable network. Failure to test and validate the functionality of the network components properly prior to implementation may result in the applicable network being adversely impacted. Similarly, it is desirable to model a proposed network expansion prior to implementation.
While it would be preferable to test and validate the functionality of the applicable network components utilizing real-time data flows in their intended environment (i.e., the actual network on which the network component will be deployed), such testing and validation is generally not practicable for a number of reasons. Typically, data security issues, network capacity issues, issues resulting from the possibility that the network may be rendered inaccessible because the network component under test failed, and related issues make it unlikely that the applicable network component can be tested and validated in a “live” environment. Consequently, the need arises to permit the off-line testing and validating of the applicable network component with network flows that most closely emulate the actual network flows on the network in question.
Conventional network test equipment utilizes traffic generators that are preset-based (i.e., the tester creates sample flows for use in testing the applicable network component). This scheme does not always provide a method for emulating packets reflective of the flows on the network under test. Further, it may take a considerable amount of time to create the sample flows for testing purposes. Other network test equipment utilizes traffic generators that are storage-based. Storage-based generators record live flows from the network in question and save the contents of the network flows in storage. Although storage-based generators are ideal in terms of their ability to capture actual network traffic, the subsequently produced emulated flows are not real world because environment attributes are not the same due to “time” related parameters. Additionally, security concerns with the data associated with the network flows may render this scheme unusable. Further, storage and recording constraints make it impracticable for this scheme to record large amounts of data associated with the network flows.
More generally, there are many applications other than network testing for which it would be highly desirable to implement an efficient and dynamic technique for capturing, storing, and/or transmitting large amounts of network packet data. Data analysis, for example, is a broad area in which a dynamic technique for capturing and compressing large amounts of packet information would be highly beneficial. This area would include security analysis in which, for example, packet anomalies are identified for further scrutiny. In addition, data analysis applications would include pure statistical analysis to determine the composition of packet traffic on a given network. A technique for efficiently capturing and storing packet information would also be beneficial in the area of high speed network traffic. In applications where the rate of network traffic pushes the physical ability of the network to handle the traffic, the ability to compress packets has a great deal of utility.
Accordingly, it would be broadly beneficial to implement a system and method for the efficient storage and emulation of network data traffic.
Objects and advantages of the invention will become apparent upon reading the following detailed description and upon reference to the accompanying drawings in which:
While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description presented herein are not intended to limit the invention to the particular embodiment disclosed, but on the contrary, the invention is limited only by the language of the appended claims.
Generally speaking, an embodiment of the present invention contemplates the processing and encoding of network flows so that the encoded results emulate the original network flows, but can be stored in significantly less storage than would otherwise be required for storing the original network flows. Once encoded, characteristics and attributes of the stored network flows may be examined and, if desired, manipulated to facilitate the emulation of different network flows. The stored encodings of the network flows may be decoded and transmitted for purposes of testing network components. Throughout the description and the drawings, elements which are the same will be accorded the same reference numerals.
Before discussing details of a method of processing and encoding network flows, a description of a suitable hardware platform and application environment is described with respect to
Frequent reference is made to protocols throughout this specification. Generally, in computer networks, a protocol is a convention or standard that controls the connection, communication, and data transfer between two computing endpoints. Network communication typically involves multiple protocols that are “layered” in a “protocol stack.” The protocol stack includes lower layers that define the physical network medium and addressing, communication, and transport issues. Examples of lower level protocols include Transmission Control Protocol (“TCP”) and the Internet Protocol (“IP”), which are frequently layered together in a TCP/IP stack such as is utilized on the Internet. Examples of other communication or transport layer protocols include, but are not limited to, Real-time Transport Protocol (“RTP”), Sequenced Packet Exchange (“SPX”), Stream Control Transmission Protocol (“SCTP”), and User Datagram Protocol (“UDP”).
Layered over these lower level protocols in a typical protocol stack are the application layers, which define or specify more abstract concepts such as commands and data. One popular component of the Internet is the World Wide Web (“WWW” or “web”) which is a collection of resources on servers on the Internet that utilize the Hypertext Transfer Protocol (“HTTP”) application layer protocol. HTTP is suitable for controlling access to resources on the web. Like many application layer protocols, HTTP uses a client-server model. In the client-server model, an HTTP client, such as a remote user, opens a connection and sends a request message to an HTTP server, such as a web server, which then responds with a message to the client. While HTTP utilizes an ASCII text format, it will be appreciated by those skilled in the art that many protocols are not in human readable form. Examples of other application layer protocols include, but are not limited to, Apple Filing Protocol (“AFP”); Domain Name System (“DNS”); Dynamic Host Configuration Protocol (“DHCP”); File Transfer Protocol (“FTP”); Internet Message Access Protocol (“IMAP”); Network News Transfer Protocol (“NNTP”); Simple Mail Transfer Protocol (“SMTP”); Simple Network Management Protocol (“SNMP”); and Trivial File Transfer Protocol (“TFTP”). Specifications for most of these protocols are defined and maintained by the Internet Engineering Task Force (“IETF”) and are publicly available from the IETF web site (IETF.org). The listed protocols are merely exemplary of some of the most pervasive protocols. The flow encoding and transmission methods described herein are not, however, limited to widely implemented protocols.
It is well known in the art that protocols generally have certain attributes that are defined as part of the applicable protocol and that remain constant across packets associated with the particular protocol. That is, packets associated with the particular protocol will comply with a protocol-specific format that defines the applicable attributes of the protocol. Further, a protocol may include specific types of packets that generally have the same applicable attributes of the protocol. For example, the HTTP protocol may be thought of as including request packets and response packets. Request packets and response packets have certain respective attributes that are of particular relevance for a method of encoding packets. For example, request packets may be classified as including a TYPE attribute (which identifies the type of request), a HOST protocol attribute (which indicates a host URL), and a USER-AGENT attribute (which specifies information about the client that generated the request). By way of further example, HTTP response packets generally have their own defined protocol attributes including a response code attribute, a server TYPE attribute, a content TYPE attribute, and a content length attribute.
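The extraction of the three request attributes named above can be illustrated with a minimal sketch. The function name `extract_request_attributes` and the sample request are hypothetical and chosen only for illustration; the sketch assumes a well-formed ASCII HTTP request and is not the parsing logic of any particular embodiment.

```python
def extract_request_attributes(raw: bytes) -> dict:
    """Pull the TYPE, HOST, and USER-AGENT attributes out of a raw HTTP request."""
    # Only the header section (before the blank line) is of interest here.
    head = raw.split(b"\r\n\r\n", 1)[0].decode("ascii", errors="replace")
    lines = head.split("\r\n")

    # Request line: method, target, and protocol/version together form TYPE.
    method, target, version = lines[0].split(" ", 2)

    # Remaining lines are "Name: value" header fields.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return {
        "TYPE": (method, target, version),
        "HOST": headers.get("host"),
        "USER-AGENT": headers.get("user-agent"),
    }


# Hypothetical sample request, used only to exercise the sketch.
sample = (
    b"GET / HTTP/1.1\r\n"
    b"Host: www.anysite.com\r\n"
    b"User-Agent: Mozilla/5.0\r\n"
    b"\r\n"
)
attrs = extract_request_attributes(sample)
```

Each extracted value would then be looked up in the dictionary for its attribute to obtain the stored encoding.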
The allowable values for each attribute are generally pre-defined in accordance with the applicable protocol, and thus, generally there are either a limited number of values for the particular attribute or the majority of values for the particular attribute fall within a limited subset of possibilities. To illustrate, a set of predefined values associated with a TYPE attribute of a request packet transmitted under the HTTP protocol may be limited to the number of different types of requests that the HTTP protocol defines or fewer. Thus, predefined values associated with a TYPE attribute for an HTTP request may be limited to (1) GET, (2) POST, (3) PUT, or (4) OTHER. The TYPE attribute may include additional information such as whether the request specified a target URL and what protocol/version the request complies with. In this example, a TYPE attribute might indicate a packet as being an HTTP 1.1 GET request that specified a target URL. A second attribute may be a HOST protocol attribute that reflects information about the host URL specified in the request (see description of
As used herein, the term “protocol attribute” is to be broadly read as one or more characteristics representing defined fields or requirements for packets or network flows transmitted under a particular protocol and the term “attribute value” is to be broadly read as the predefined values, or a limited subset of values associated with, a particular protocol attribute. It will be appreciated by those skilled in the art that a protocol may be examined and relevant protocol attributes for the protocol determined.
Returning now to
If the protocol is not recognized or validated, however, the packet is recorded (block 115) “as is” by saving the packet to storage without encoding, encryption, or compression. The depicted embodiment of method 100 includes functionality for learning (block 135) protocols that were not validated or otherwise recognized in block 110. In the depicted implementation, packets that were not validated in block 110 are stored for subsequent protocol learning until a sufficient number of packets is available. In such implementation, a predefined or user selectable sample size “T” is chosen. Until T packets have been accumulated (block 130), method 100 merely records the packets by saving them to storage. The value of “T” is preferably chosen to ensure an adequate sample size without resulting in any significant loss of time and/or storage space. In many applications, for example, a sample size of approximately 1000 is generally thought to provide a proper balance between obtaining sufficient information and obtaining too much information. Of course, the value of T is an implementation detail and the value of T in any given application may be greater than or less than 1000. Once a sufficient number of packets have been captured and stored, method 100 includes invoking or otherwise calling the learning algorithm represented by block 135.
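The accumulate-then-learn behavior described above may be sketched as follows. The class name `UnknownPacketBuffer` and the `learn` callback are hypothetical; the sketch assumes the learning algorithm is invoked once, the first time T packets are available, which is one possible reading of the description.

```python
class UnknownPacketBuffer:
    """Stores unrecognized packets "as is" and triggers learning after T packets."""

    def __init__(self, sample_size=1000, learn=None):
        self.sample_size = sample_size  # the sample size "T"
        self.learn = learn              # stand-in for the learning algorithm (block 135)
        self.packets = []
        self.learned = None

    def record(self, packet: bytes):
        # Block 115: save the packet without encoding, encryption, or compression.
        self.packets.append(packet)
        # Block 130: once T packets have accumulated, invoke learning (assumed once).
        if self.learned is None and len(self.packets) >= self.sample_size:
            self.learned = self.learn(self.packets)
        return self.learned


# Tiny T and a trivial "learner" (len) purely to exercise the control flow.
buf = UnknownPacketBuffer(sample_size=3, learn=len)
first = buf.record(b"a")
second = buf.record(b"b")
third = buf.record(b"c")
```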
In some embodiments, protocol learning algorithm 135 is implemented as a technique for discovering bit patterns in a sufficiently large sample of packets to make a conclusion about the bits. As a simple example, if every captured packet included a value of “1” in its first bit, the first bit could be disregarded for the purpose of storing the packet and later transmitting an emulated packet. Extending this example, if the first three bits of 95% of the captured packets contained a value of either 001, 010, or 101, the first three bits could be represented or encoded using a 2-bit representation where, for example, the value 001 is assigned an encoded value of 00, the value 010 is assigned a value of 01, the value 101 is assigned a value of 10, and any other values are assigned a value of 11. This exemplary encoding is characterized by a high degree of accuracy (i.e., the first three bits of 95% of packets can be reproduced exactly) but a relatively low level of compression: three bits have been encoded with two bits, thereby saving a single bit. Generally, encoding in the described manner involves a tradeoff between accuracy and the amount of compression achievable. The amount of accuracy and compression required is an implementation detail.
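The 3-bit to 2-bit example above can be written out directly. The names `CODES`, `OTHER`, `encode_prefix`, and `decode_prefix` are hypothetical; the mapping itself is the one given in the text.

```python
# Mapping from the text: 001 -> 00, 010 -> 01, 101 -> 10, anything else -> 11.
CODES = {0b001: 0b00, 0b010: 0b01, 0b101: 0b10}
OTHER = 0b11

# Reverse mapping used when an emulated packet is later generated.
DECODE = {code: value for value, code in CODES.items()}


def encode_prefix(first_three_bits: int) -> int:
    """Encode the first three bits of a packet as a 2-bit value."""
    return CODES.get(first_three_bits, OTHER)


def decode_prefix(code: int):
    """Recover the original three bits, or None when the code is the catch-all."""
    return DECODE.get(code)
```

The catch-all code is where accuracy is traded for compression: a packet whose first three bits were, for example, 111 is stored as 11, and the original value cannot be recovered from the encoding alone.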
In block 144, protocol learning method 135 associates each of the variants identified in block 142 with a corresponding K-bit encoding, where 2^K is greater than or equal to N, and N is the number of variants as described above. Thus, for the example where the number of variants is 4, a unique 2-bit encoding may be assigned to each of the four variants. While this type of encoding is the most efficient in terms of the number of bits conserved, other encoding implementations are possible. For example, a four bit encoding might be used to encode the four values of the J-th byte with each bit in the four bit encoding representing one of the four J-th byte variants.
In block 146, the association between byte J and the corresponding K-bit encoding is recorded in a dictionary or other suitable data structure to preserve the encoding scheme. Blocks 141 through 146 are then repeated for each byte in the packet by comparing (block 147) J to a MAX variable that indicates the number of bytes in a packet and incrementing (block 148) J until all bytes in the packet have been processed.
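The per-byte loop of blocks 144 through 148 may be sketched as shown below. The sketch assumes that the "variants" of byte J are the most frequently observed values at that position in the sample, and it caps their number with a hypothetical `max_variants` parameter; the function name and the layout of the returned dictionary are likewise illustrative assumptions.

```python
from collections import Counter


def learn_byte_encodings(packets, max_variants=4):
    """Build a {byte position J: {"bits": K, "codes": {value: code}}} dictionary."""
    dictionary = {}
    max_len = max(len(p) for p in packets)  # plays the role of MAX
    for j in range(max_len):                # blocks 147/148: step J through the packet
        # Assumed variant selection: the most common values seen at position J.
        counts = Counter(p[j] for p in packets if j < len(p))
        variants = [value for value, _ in counts.most_common(max_variants)]
        n = len(variants)
        # Smallest K such that 2**K >= N (K is 0 when the byte never varies).
        k = (n - 1).bit_length()
        # Blocks 144/146: assign a K-bit code to each variant and record it.
        dictionary[j] = {
            "bits": k,
            "codes": {value: code for code, value in enumerate(variants)},
        }
    return dictionary


# Four short sample packets: byte 0 is constant, byte 1 takes four values.
packets = [b"\x01\xaa", b"\x01\xbb", b"\x01\xcc", b"\x01\xdd"]
learned = learn_byte_encodings(packets)
```

In this sample, byte 0 needs no bits at all while byte 1 is represented with a 2-bit code, consistent with the four-variant example given above.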
In this manner, protocol learning method 135 provides functionality that enables the data encoder application to develop encodings for previously un-encountered protocols. It should be appreciated that the learning method 135 described in
Returning now to the protocol verification of block 110 in
As depicted in
A dictionary 300 may be associated, for example, with an attribute of interest for an HTTP request. In this case, protocol designator 305 would be HTTP, attribute designator 310 would be the protocol attribute of interest, and the set of predefined values 315 may include entries reflecting different possible values for the HTTP request packet attribute of interest. If the dictionary 300 were a TYPE protocol attribute dictionary, for example, a first predefined value 315 might contain the value GET and a 1 or 0 in the corresponding field indicator 320 would indicate whether the corresponding packet is a GET request. In one embodiment, a 1 value in a field indicator bit 320 is an affirmative indicator with respect to the corresponding predefined value while a 0 value is a negative indicator with respect to the corresponding predefined value (e.g., the packet is either not of the value in the predefined value or the field is not applicable). In another embodiment, a 0 in the applicable field indicator is an affirmative indicator and a 1 in the applicable field indicator is a negative indicator.
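A dictionary with one field indicator bit per predefined value may be modeled as in the following sketch. The class name `AttributeDictionary`, the bit ordering (first predefined value as the most significant bit), and the use of 1 as the affirmative indicator are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AttributeDictionary:
    protocol: str   # protocol designator 305
    attribute: str  # attribute designator 310
    values: list    # predefined values 315

    def _mask(self, position: int) -> int:
        # Assumed ordering: the first predefined value is the most significant bit.
        return 1 << (len(self.values) - 1 - position)

    def encode(self, value) -> int:
        """Return field indicators 320 packed into an integer, one bit per value."""
        bits = 0
        for position, predefined in enumerate(self.values):
            if predefined == value:
                bits |= self._mask(position)
        return bits

    def decode(self, bits: int):
        """Return the predefined value whose indicator bit is set, if any."""
        for position, predefined in enumerate(self.values):
            if bits & self._mask(position):
                return predefined
        return None


type_dictionary = AttributeDictionary("HTTP", "TYPE", ["GET", "POST", "PUT", "OTHER"])
get_bits = type_dictionary.encode("GET")
```

The embodiment in which 0 is the affirmative indicator would simply invert each bit before storage.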
Dictionaries 300 define an association between binary values and corresponding attribute values of a packet. In this manner, dictionaries 300 may be used to encode a packet by creating a set of binary values, each of which has a meaning defined by a corresponding dictionary, that is representative of or symbolic of a corresponding packet. Similarly, each packet in a set of packets representing a particular network flow could be encoded using the dictionaries such that the resulting encoded symbols use significantly less storage than would otherwise be required for storing the original information or data. It will be appreciated by those skilled in the art, that while the protocol attributes may vary by applicable protocol, predefined values 315 of each applicable dictionary reflect data or information that is common to the applicable protocol.
The depicted implementation of dictionary 300 includes a one to one correspondence between the predefined values 315 and the field indicators 320. In other embodiments, the number of predefined values 315 may exceed the number of field indicators 320. For example, an embodiment (not depicted) of dictionary 300 may employ 2-bit field indicators 320 to identify one of four corresponding predefined values 315 (i.e., 00 corresponding to the first of such predefined values, 01 corresponding to the second of such predefined values, 10 corresponding to the third of such predefined values, and 11 corresponding to the fourth of such predefined values). The ratio by which the number of predefined values 315 may exceed the number of field indicators 320 can be manipulated by employing appropriate encoding schemes.
Although dictionary 300 and the other dictionaries illustrated below are depicted as they would appear or exist at a particular point in time, dictionary 300 is preferably implemented as a dynamic dictionary having a format and/or structure capable of changing with time to reflect additional knowledge about the content of captured packets. As an example, the structure of dictionary 300 may initially define N categories of packet types, with one of the N packet types representing a “miscellaneous” category. After a period of time has elapsed and a greater number of packets have been received, analysis of the packets may reveal that an undesirably large percentage of packets were categorized in the miscellaneous category. In response, dictionary 300 may be altered, in some embodiments, to add one or more additional categories based on an analysis of the miscellaneous packets. Conversely, dictionary 300 may contract over time to achieve higher ratios of compression if the data supports it. If, for example, analysis of large amounts of packet data reveals a strong correlation between the value in a first portion (e.g., byte) of a packet and the value in another portion of the packet, a single encoding may be used to represent both portions.
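One way a dictionary might expand when the miscellaneous category grows too large is sketched below. The function name `expand_dictionary`, the 20% threshold, and the policy of promoting only the single most frequent miscellaneous value are all assumptions; the description leaves the analysis itself open.

```python
from collections import Counter


def expand_dictionary(values, observed, max_misc_share=0.2, misc="OTHER"):
    """Return a value list with the most frequent miscellaneous value promoted
    to its own category when too many observations fall into the misc category."""
    known = set(values) - {misc}
    misc_counts = Counter(v for v in observed if v not in known)
    misc_total = sum(misc_counts.values())

    # Leave the dictionary alone when the misc share is acceptable.
    if not observed or misc_total / len(observed) <= max_misc_share:
        return list(values)

    promoted, _ = misc_counts.most_common(1)[0]
    # Keep the misc category last so that it remains the catch-all.
    return [v for v in values if v != misc] + [promoted, misc]


initial = ["GET", "POST", "OTHER"]
heavy_misc = ["GET"] * 5 + ["DELETE"] * 4 + ["HEAD"] * 1   # 50% miscellaneous
light_misc = ["GET"] * 9 + ["HEAD"] * 1                     # 10% miscellaneous
expanded = expand_dictionary(initial, heavy_misc)
unchanged = expand_dictionary(initial, light_misc)
```

Packets encoded before an expansion would need to be interpreted with the earlier version of the dictionary, so a practical implementation would likely version its dictionaries.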
Before returning to
Returning to
Returning to
Consider the portion of an HTTP request depicted in Example 1 below.
Assuming that HTTP is a recognized protocol, the HTTP version 1.1 protocol for this packet is validated. In the example implementation presented here, there are three attributes of interest. For the TYPE attribute, a value of {GET/no URL/HTTP 1.1} is extracted and encoded as binary 1000 0001 according to dictionary 500, where field indicator 511 is the least significant bit and field indicator 518 is the most significant bit. For the HOST attribute, a value of {www.anysite.com} is extracted and encoded as binary 1010 0000 according to dictionary 600, where field indicator 611 is the most significant bit and field indicator 618 is the least significant bit. For the USER-AGENT protocol attribute, a value of {MOZILLA/5.0} is extracted and encoded as binary 0100 according to dictionary 700, where field indicator 711 is the most significant bit and field indicator 714 is the least significant bit.
Continuing with the preceding example, an exemplary HTTP response generated following the request depicted in Example 1 above is reflected in Example 2 below.
For purposes of storing this response in a form that would enable one to simulate or otherwise recreate the packet later, the attributes of interest include the response code (in this example, “302”), the server type (in this example, “Microsoft-IIS 6.0”), the content type (in this example, “text/html”) and the content-length (in this example, “127 bytes”). Analogous to the manner in which the request is encoded as described above, one or more dictionaries may be used to encode the response.
Method 100 as depicted in
In one embodiment, the second request is encoded using the same dictionaries as the initial request. In many cases, however, the second request is likely to be different from the initial request in only one attribute (e.g., the targeted URL in the case of a redirection response). Some embodiments may take advantage of the similarity between the initial request and the second request by encoding the second request with a “change encoding” in which the attribute that differs from a previous packet is indicated and the new value of the attribute is appended to the change encoding. An exemplary CHANGE CONTROL dictionary 900 suitable for encoding change packets is depicted in
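The idea of a change encoding can be illustrated with the following minimal sketch, which records only the attributes of the second request that differ from the initial request. The function names and the attribute names in the sample are hypothetical, and the sketch does not attempt to reproduce the layout of any particular change control dictionary; it also does not handle attributes that are removed in the second request.

```python
def change_encode(previous: dict, current: dict) -> list:
    """Return (attribute, new value) pairs for attributes that differ."""
    return [
        (name, value)
        for name, value in current.items()
        if previous.get(name) != value
    ]


def change_decode(previous: dict, changes: list) -> dict:
    """Rebuild the later packet's attributes from the earlier packet plus changes."""
    rebuilt = dict(previous)
    rebuilt.update(changes)
    return rebuilt


# Hypothetical attribute sets for an initial request and a redirected request.
initial_request = {"TYPE": "GET", "HOST": "www.anysite.com", "TARGET": "/"}
second_request = {"TYPE": "GET", "HOST": "www.anysite.com", "TARGET": "/moved"}
changes = change_encode(initial_request, second_request)
```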
After the packet and flow encodings are generated and stored, a suitable application program may access the stored encodings for a variety of reasons. In one application, the stored encoding is accessed for purposes of analyzing and reporting statistics about the packets that are represented by the data. These statistics might include, as examples, the percentage of packets that are GET requests, the percentage of requests issuing from a specified host, etc. This application would have access to and an understanding of the dictionaries that would enable the program to interpret the stored encodings. The application may further include the ability to modify the stored encodings. This ability, for example, would enable a user to alter the composition of packets that are represented by the stored encodings. The ability to edit the stored encoding would preferably include an intervening graphical user interface that would present data regarding the packets to the user to enable the user to edit the stored encodings in a readable format. The user would be able to change a GET request to a POST request (for example) by replacing the text “GET” with “POST” in an appropriate field of the GUI thereby relieving the user of needing to have an understanding of the bit-by-bit or other implementation of the encodings.
In still another application, the stored encodings are retrieved and decoded for purposes of simulating packet traffic on a computer network. In this application, one or more data processing devices connected to a network and having access to the stored encodings and the dictionaries retrieve stored encodings and decode them using the dictionaries to generate protocol compliant packets from the stored encodings. Where the protocol compliant packets require content or other data that is not captured in the encoded attributes, the data processing device(s) may insert random or “dummy” data. This application is suitable for simulating packet traffic on a computer network for purposes of testing network equipment.
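Generating a protocol compliant packet from decoded attributes, with filler inserted for content that was never stored, may be sketched as follows. The function name `build_response` is hypothetical, the attribute values are those of the response example discussed earlier, and a fixed filler byte stands in for the random or "dummy" data.

```python
from http import HTTPStatus


def build_response(code: int, server: str, content_type: str, content_length: int) -> bytes:
    """Assemble an HTTP/1.1 response from decoded attribute values."""
    head = (
        f"HTTP/1.1 {code} {HTTPStatus(code).phrase}\r\n"
        f"Server: {server}\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {content_length}\r\n"
        "\r\n"
    ).encode("ascii")
    # The original body was not captured in the encoding, so dummy data of the
    # recorded length is inserted to keep the packet protocol compliant.
    body = b"x" * content_length
    return head + body


packet = build_response(302, "Microsoft-IIS 6.0", "text/html", 127)
```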
In some embodiments, retrieving stored packets, whether encoded or not, for purposes of transmitting emulated packets may include modifying selected information in the packets. More specifically, selected information in a packet may be specific to the environment in which the packet was captured and the time when the packet was captured. This type of information is referred to generally herein as environmental packet data or, more simply, environmental data. In some cases, merely reproducing these environment and time sensitive portions of a packet is inconsistent with the goal of simulating real world traffic on the network. As an example, many protocols implement the concept of “time to live” (TTL). TTL is a field in a packet, set by a protocol stack when a packet is created, that provides a mechanism for terminating packets that are “lost.” Each time a packet traverses a network hop, the corresponding router or other network device decrements the packet's TTL value. If the TTL value reaches zero, the packet is terminated, deleted, or otherwise eliminated. In this manner, a packet that would otherwise bounce back and forth in an endless loop dies eventually. In the context of packet capture and emulation, however, it is not necessarily desirable and may well be undesirable to reproduce the TTL of a captured packet as the TTL for the emulated packet. If a packet is captured when its TTL value is close to zero, replicating the same TTL value in the emulated packet might result in the immediate termination of the emulated packet. In one aspect, encoding application 51 and data encoder 10 include functionality that substantively modifies packet data, as opposed to merely compressing or encoding data, on a selective basis to reflect the reality that emulated packets are transmitted in a different time and context than the captured packets.
Other examples of packet information that may require substantive modification at transmission time include information relating to firewalls and network address translation (NAT) information. In many environments, firewalls convert the IP address of an originating device to a “generic” IP address that is used for the entire firewall protected domain. When a gateway receives an inbound packet, the NAT information enables the gateway to associate the packet with the appropriate source. In the context of capturing and later transmitting emulated packets, however, it may be desirable to eliminate the NAT effects and restore the captured packets to indicate their original IP addresses. As another example, packets may include timestamps and checksum values that will clearly be incorrect if merely reproduced in the emulated packet. The preferred embodiment of the encoding application is therefore enabled to generate contemporary timestamp and checksum values when the emulated packet is transmitted.
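Refreshing the TTL and checksum of a captured header at transmission time may be sketched as follows. The sketch assumes an IPv4 header without options (20 bytes), in which the TTL occupies byte 8 and the header checksum occupies bytes 10 and 11; the function names and the default TTL of 64 are illustrative assumptions.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of the header's 16-bit words."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def refresh_header(header: bytes, ttl: int = 64) -> bytes:
    """Return a copy of a captured IPv4 header with a fresh TTL and checksum."""
    fresh = bytearray(header)
    fresh[8] = ttl                          # replace the captured TTL
    fresh[10:12] = b"\x00\x00"              # checksum field is zero while summing
    fresh[10:12] = struct.pack("!H", ipv4_checksum(bytes(fresh)))
    return bytes(fresh)


# A captured header whose TTL has decayed to 1 (checksum field left as zero).
captured = bytes.fromhex("450000730000400001110000c0a80001c0a800c7")
emulated = refresh_header(captured, ttl=64)
```

A header carrying a correct checksum sums to zero under `ipv4_checksum`, which gives a simple self-check before the emulated packet is transmitted.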
More generally, as depicted in the flow diagram of
In some embodiments, transmitting portion 170 of encoder application 51 may include decoding, modifying, and transmitting packet data in a manner that preserves packet ordering and/or packet pacing. There may be applications in which the order and/or the pacing of packets affects network performance (e.g., latency), functionality, costs, results, or some other parameter of interest. In such cases, encoding method 100 (of
It should be appreciated that portions of the present invention may be implemented as a set of computer executable instructions (software) stored on or contained in a computer-readable medium. The computer readable medium may include a non-volatile medium such as a floppy diskette, hard disk, flash memory card, ROM, CD ROM, DVD, magnetic tape, or another suitable medium. Further, it will be appreciated by those skilled in the art that there are many alternative implementations of the invention described and claimed herein. It will be apparent to those skilled in the art having the benefit of this disclosure that the present invention contemplates the processing and encoding of network flows so that the encoded results accurately emulate the original network flows, but can be stored in significantly less memory than would otherwise be required for storing the original network flows. Once encoded, characteristics and attributes of the stored network flows may be examined and, if desired, manipulated to facilitate different network flows to be emulated. The stored network flows may be decoded and transmitted for purposes of testing network components. It is understood that the forms of the invention shown and described in the detailed description and the drawings are to be taken merely as presently preferred examples and that the invention is limited only by the language of the claims.