The invention relates to the field of data center monitoring. More specifically it relates to a system and method for monitoring the network traffic and load in a data center.
The increased exchange of information through telecommunication networks has led to an increased load in data centers. Heavy requirements are put on the data centers with regard to speed in data retrieval, availability of the data center services, security, and data volumes. These requirements are reflected in service level agreements.
Existing monitoring solutions either focus on the application layer or the platform layer. In particular, monitoring solutions at the application layer, such as Google Analytics and Crazy Egg, figure out which part(s) of the application are of most interest to end users.
Monitoring at the platform layer tracks server and network load and/or power consumption, see for example Zabbix, Nagios, Hyperic (VMware), etc.
U.S. Pat. No. 8,122,453 B2 (IBM) discloses a method and system for managing resources in a data center. This patent monitors current application behavior and resource usage, makes a prediction about future application requirements and performance, and makes changes to the application environment to meet a given performance level.
Two typical prior art monitoring solutions are illustrated in
Therefore, there is still room for improvement in data center monitoring systems that identify whether service level agreements are met, that can identify possible bottlenecks in a data center, and whose results can point towards possible architectural improvements of the data center.
It is an object of embodiments of the present invention to provide good systems and methods for monitoring the load and efficiency of a data center.
It is an advantage of embodiments of the present invention that application-level monitoring information and platform-level monitoring information are coupled in an efficient way. This effectively leads to a holistic monitoring solution for a broad range of people, ranging from technicians, system administrators, business developers, marketeers, up to upper-management.
It is an advantage of embodiments according to the present invention that the packets are reduced in size, since only a very limited part of the packet information is retained. Packets containing strings, for example a host name or a URL, are processed by the packet analyzer and stored as a request type represented by an integer.
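As a minimal sketch of this size reduction, the record below stores a classified request as an integer request type plus a time stamp. The 10-byte layout (unsigned 16-bit type identifier, 64-bit floating-point time stamp) and the function names are hypothetical; the source does not specify a storage format.

```python
import struct

# Hypothetical fixed-size record: unsigned 16-bit request type id + 64-bit float
# time stamp. "<" selects little-endian without padding, so a record is 2 + 8 = 10 bytes.
RECORD = struct.Struct("<Hd")


def encode_record(request_type: int, timestamp: float) -> bytes:
    """Pack a classified request into a fixed-size binary record."""
    return RECORD.pack(request_type, timestamp)


def decode_record(data: bytes) -> tuple[int, float]:
    """Unpack a record back into (request_type, timestamp)."""
    return RECORD.unpack(data)
```

For a host name of 16 characters and a URL of 22 characters, the strings alone occupy 38 bytes, whereas each record occupies 10 bytes regardless of the URL length.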
It is an advantage of embodiments of the present invention that the whole network stream is captured and analyzed, thus avoiding sampling, which may result in an inaccurate analysis. In order to have a good or optimum correlation between network traffic and resource usage, it is important that the full network stream (e.g. incoming and outgoing traffic) is analyzed, as sampling may overlook some events, and such events may be critical with regard to network traffic and resource usage.
It is an advantage of an embodiment of the present invention that it enables intra-application monitoring, meaning that the resource consumption can be shown for each request type defined in the embodiment according to the present invention. Such request types may correspond with different components in the application, thus allowing a more detailed analysis of the application resource use. By contrast, some other tools typically cannot calculate the resource consumption for different request types and only show application-wide resource monitoring. In other words, it is an advantage of embodiments of the current invention that fine-grained performance and business level information inside an application (intra-application level) can be monitored. Through deep packet inspection it is possible to identify application level and intra-application level information. Therefore different types of requests and the corresponding responses can be identified. Moreover, it is possible to obtain accurate information on the load that the individual requests/responses pose on the data center, providing useful information for optimizing the application or the way it is processed in the data center. It is an advantage of embodiments of the present invention that by using request/response pairs per type, monitoring can be performed efficiently, thus allowing all captured network packets to be analyzed rather than only a sampled portion thereof.
It is an advantage of embodiments of the current invention that no complex agent is required on each server. Typically, the complex agent, which processes the data and performs the analysis of the application level resource use, is the agent that requires the largest effort from e.g. system administrators in terms of installation, updates and runtime overhead. According to embodiments of the present invention, no such complex agent needs to be installed on each server; instead, a simple agent can be used, which is less demanding, e.g. for installing/updating. The simple agent can be a built-in SNMP agent that comes with the operating system. In this way, the installation/updating effort and the runtime overhead can be reduced. Furthermore, system administrators may be reluctant to install additional software on production servers.
It is an advantage of embodiments of the present invention that it is not required to send monitoring information outside the data center, which improves scalability and data protection, reduces energy consumption and/or reduces security risks.
It is an advantage of embodiments of the present invention that system administrators and technicians can use the monitoring information to understand how request latency (and user satisfaction) relates to resource needs in the data center. Business developers will find the monitoring solution valuable to understand how application features relate to cost and revenue.
It is an advantage of embodiments of the current invention that the solutions are scalable towards an increasing size of the data center, both with regard to the number of servers in the data center and with regard to the amount of network traffic in the data center. In embodiments according to the present invention it was possible to monitor 260 servers using 1 monitoring server at 50% of its capacity. Existing monitoring systems that use complex agents have an overhead between 1 and 10% on each monitored server. Embodiments of the present invention can thus result in a far more efficient system.
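As a rough comparison of these figures, assuming that the monitoring server has about the same capacity as a monitored server (an assumption not stated in the source), the total monitoring load can be expressed in server-equivalents:

```latex
\begin{align*}
\text{complex agents:} \quad & 260 \times 1\,\% = 2.6
  \quad\text{to}\quad 260 \times 10\,\% = 26 \ \text{server-equivalents},\\
\text{monitoring server:} \quad & 1 \times 50\,\% = 0.5 \ \text{server-equivalents},\\
\text{ratio:} \quad & 2.6 / 0.5 = 5.2 \quad\text{to}\quad 26 / 0.5 = 52.
\end{align*}
```

Under this assumption, the agent-based approach would consume roughly 5 to 52 times more capacity for the same 260 servers.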
The above objective is accomplished by a method and device according to the present invention.
The present invention relates to a system for monitoring network traffic and resource usage in a data center, the system comprising a node comprising an input terminal, e.g. an input port, configured for capturing network packets, a processor configured for running a packet analyzer programmed for analyzing the network packets whereby packets are classified in request/response pairs per type based on deep packet inspection, and a memory configured for storing the request/response pairs,
The input terminal may capture substantially all network packets of the network traffic and said substantially all network packets may be processed by the packet analyzer.
The network packets first may be filtered based on packet information and the packet analyzer only processes the filtered packets.
The filter may pass only HTTP packets, filtering on the TCP port data, and the packet analyzer may parse URL and host information from the filtered packets.
Response times of several requests of the same type may be combined in a latency histogram.
The packet analyzer may process the packets in order of arrival.
The packet analyzer may process the packets by classifying them according to user defined request types.
Data in the packets may be aggregated by the system.
The packet analyzer may store only 10-100 requests per second per request type in full, for retrieving the complete URL and host information.
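One way to keep only a small number of complete requests per type is a per-second budget, sketched below under the assumption that packets are processed in order of arrival. The class name and the default limit of 10 are illustrative and not taken from the source.

```python
from collections import defaultdict


class FullCaptureLimiter:
    """Allow at most `limit` complete requests per second for each request type.

    Hypothetical helper: requests beyond the budget are still counted and timed,
    but only their integer request type is kept, not the full URL and host.
    """

    def __init__(self, limit: int = 10):
        self.limit = limit
        self.window = None                  # current one-second window
        self.counts = defaultdict(int)      # full captures per type in this window

    def should_store_full(self, request_type: int, timestamp: float) -> bool:
        second = int(timestamp)
        if second != self.window:           # a new second starts: reset all budgets
            self.window = second
            self.counts.clear()
        if self.counts[request_type] < self.limit:
            self.counts[request_type] += 1
            return True
        return False
```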
The system furthermore may comprise a monitoring unit programmed for displaying the data center load versus request/response pair rate.
The system may be implemented as a computing device.
The system may be implemented as an application.
The correlation module may be programmed for determining the relationship between the number of requests and the resource utilization.
The correlation module may be programmed for determining information indicative of how many resources each type of request/response pair uses.
The request/response pairs may relate to features of the application or defined criteria about an application and wherein the correlation module provides correlation information between the network traffic and the resource usage at an intra-application message level.
The present invention also relates to a method for monitoring network traffic and resource usage in a data center, the method comprising the following steps, capturing network packets entering or leaving the data center by an input terminal of a node,
The captured network packets may be filtered based on packet information, whereby the packet analyzer only processes the filtered packets.
In the filtering step, the filter may pass only HTTP packets, filtering on the TCP port data, and the packet analyzer may parse URL and host information from the filtered packets.
The method may include combining response times of several requests in a latency histogram.
In at least some of the steps the packets may be processed in order of arrival.
The present invention also relates to a data carrier comprising a set of instructions for, when executed on a computer, monitoring network traffic and resource usage in a data center according to a method as described above. The data carrier may be any of a CD-ROM, a DVD, a flexible disk or floppy disk, a tape, a memory chip, a processor or a computer.
Particular and preferred aspects of the invention are set out in the accompanying independent and dependent claims. Features from the dependent claims may be combined with features of the independent claims and with features of other dependent claims as appropriate and not merely as explicitly set out in the claims.
These and other aspects of the invention will be apparent from and elucidated with reference to the embodiment(s) described hereinafter.
The drawings are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn to scale for illustrative purposes.
Any reference signs in the claims shall not be construed as limiting the scope.
In the different drawings, the same reference signs refer to the same or analogous elements.
The present invention will be described with respect to particular embodiments and with reference to certain drawings but the invention is not limited thereto but only by the claims. The dimensions and the relative dimensions do not correspond to actual reductions to practice of the invention.
Furthermore, the terms first, second and the like in the description and in the claims, are used for distinguishing between similar elements and not necessarily for describing a sequence, either temporally, spatially, in ranking or in any other manner. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other sequences than described or illustrated herein.
Moreover, the terms top, under and the like in the description and the claims are used for descriptive purposes and not necessarily for describing relative positions. It is to be understood that the terms so used are interchangeable under appropriate circumstances and that the embodiments of the invention described herein are capable of operation in other orientations than described or illustrated herein.
It is to be noticed that the term “comprising”, used in the claims, should not be interpreted as being restricted to the modules/terminals listed thereafter; it does not exclude other elements or steps. It is thus to be interpreted as specifying the presence of the stated features, integers, steps or components as referred to, but does not preclude the presence or addition of one or more other features, integers, steps or components, or groups thereof. Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.
Similarly it should be appreciated that in the description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Where in embodiments of the present invention reference is made to an application, reference is made to a functionality (or a set of functionalities) exposed to a user that can be accessed from a network. Where in embodiments of the present invention reference is made to a request, reference is made to a network packet received from a user that asks the application to perform an action. Where in embodiments of the present invention reference is made to a response, reference is made to a network packet sent to the user containing the result of the action in the application. Where in embodiments of the present invention reference is made to a request type, reference is made to a specific functionality or action of the application, defined by the fields in a request.
Where in embodiments of the present invention reference is made to deep packet inspection, reference is made to the act of gathering information about a network packet by inspecting the full packet and reconstructing protocols thereof. It relates to the analysis of transportation packets whereby the data of the packet is inspected. Such data may comprise application level data, thus identifying aspects of the application that is used. For example, within an HTTP application the packets can be subdivided into requests and responses, whereby the requests and responses can be further subdivided in types, a response of a certain type corresponding with a request of the same type. The requests and responses typically may be correlated to features of the application or defined criteria about an application.
Where in embodiments of the present invention reference is made to network traffic, reference can be made to an amount and/or a type of traffic on a particular network. Monitoring network traffic may include monitoring bandwidth management of a network.
Where in embodiments of the present invention reference is made to resource utilization, reference is made to the usage of resources e.g. CPU, disk space, memory, network, etc. on a server.
Where in embodiments of the present invention reference is made to a simple agent, reference is made to an agent that periodically reads the kernel statistics about the resource usage of the machine.
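A simple agent of this kind can be sketched for Linux, assuming the aggregate `cpu` line of `/proc/stat` as the kernel statistic; the source also mentions a built-in SNMP agent, which this sketch does not cover. The hypothetical functions below parse two snapshots taken one sampling period apart and return the CPU utilization over that period.

```python
def parse_cpu_line(proc_stat_text: str) -> tuple[int, int]:
    """Return (idle, total) cumulative jiffies from the aggregate 'cpu' line."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user, nice, system, idle, iowait, irq, softirq, steal; guest time is
            # already included in user/nice, so the later fields are not summed.
            values = [int(v) for v in fields[1:9]]
            idle = values[3] + values[4]          # idle + iowait
            return idle, sum(values)
    raise ValueError("no aggregate 'cpu' line found")


def cpu_utilization(previous: str, current: str) -> float:
    """CPU utilization (0..1) between two /proc/stat snapshots."""
    idle0, total0 = parse_cpu_line(previous)
    idle1, total1 = parse_cpu_line(current)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 1.0 - (idle1 - idle0) / delta_total
```

An agent could read `/proc/stat` periodically, e.g. every 10 seconds, and forward the result of `cpu_utilization` with a time stamp to the receiving module.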
Where in embodiments of the present invention reference is made to a complex agent, reference is made to an agent that periodically reads the kernel statistics about the resource usage of the machine, but also uses hooks or instrumentation to gather information about the services that run on the server.
Where in some embodiments of the present invention reference is made to a node, reference is made to a piece of data equipment, e.g. an active hardware or software device, attached to a network and capable of sending, receiving or forwarding information over a data channel. An example of a hardware device may be a host computer, such as for example a router, workstation or server. Such a hardware device may run some software or firmware. Alternatively, reference also can be made to a software device in a network, e.g. when considering a virtual machine. The node and/or the processor and/or the packet analyser configured for analyzing packets and/or for receiving resource usage information from at least one server and/or configured for correlating the request/response pairs information with the resource usage may be positioned separately from the data servers in the data center that provide the functional processing of the data. The node, processor, packet analyser or other modules may be running on a monitor, separate from the data servers.
In a first aspect, the present invention relates to a system for monitoring network traffic and resource usage in a data center, e.g. for a specific application. It thereby is an advantage of embodiments of the present invention that for applications, it can be determined whether latency problems are caused by network traffic issues and/or resource usage issues, thus allowing to optimize the application or the way it is processed in the network and/or the data center.
According to embodiments of the present invention, the system comprises a node comprising an input terminal for receiving or capturing network packets. The node furthermore comprises a processor configured for running a packet analyzer programmed for analyzing the network packets whereby packets are classified in request/response pairs per type based on deep packet inspection. Such request/response pairs may be combined with time stamps. They may be stored in a memory. The system also comprises a collector module configured for collecting the request/response pairs and for receiving resource usage information from at least one node. The at least one node may therefore be equipped with a simple agent adapted for providing resource usage information to the receiver module configured for receiving usage information, e.g. a central processor. The system furthermore comprises a correlation module programmed for correlating the request/response pairs information with the resource usage. The system according to embodiments of the present invention may be hardware or software implemented.
By way of illustration, embodiments of the present invention not being limited thereto, an exemplary system according to particular embodiments will now be described with reference to
According to embodiments of the current invention the system 100 comprises a node 101. The node comprises an input terminal 102 configured for capturing network packets. Client requests enter the data center at the load balancer. In embodiments according to the present invention all network traffic is duplicated to the node 101 comprising the input terminal 102. In embodiments according to the present invention, the node 101 typically may comprise a network interface that receives the data sent and received by the load balancer. The load balancer thereby typically may be a separate node, whose network interface is mirrored to the network interface on node 101.
The same node 101 also comprises a processor 103 configured for running a packet analyzer programmed for analyzing the network packets whereby packets are classified in request/response pairs per type based on deep packet inspection and whereby the request/response pairs together with time stamps are stored. An example of such a packet analyzer is described later with reference to
According to embodiments of the current invention the system 100 also comprises a receiving module 111, e.g. an element comprising processing power, configured for receiving resource usage information from at least one server. Such a server may be considered part of the system or alternatively, the system may be adapted for receiving information therefrom. The receiving module 111 configured for receiving resource usage information may be part of a central processor. It may be referred to as an external monitoring system. Such a processor may be the same as the processor 103 configured for collecting the request/response pairs through the packet analyzer, although the latter is not strictly necessary. The at least one server 106 from which the information may be received typically may be running a simple agent for transferring resource usage towards the receiver module configured for receiving resource usage information. The simple agent may be considered part of the system 100, or the system 100 may be adapted for receiving information from the simple agent.
The system also comprises a correlation module 112, e.g. running at the node 101, programmed for correlating the request/response pairs information with the resource usage. It typically may run together with the packet analyzer and typically may run independently of the resource monitoring system 111.
In order to support high request rates, advantageously only very fast packet inspection may be used. This makes the traffic analysis scale to high request rates. During analysis, the data thereby advantageously is immediately classified and stored in an efficient way. A data storage 113 may be provided.
In case the network packet is a response, it is verified whether the packet is the first packet of the response (e.g. whether it starts with HTTP). If so, the time stamp of the response is updated. Next, the entry is removed from the map and transmitted to the receiver module configured for receiving resource usage information or to another processing element. That element aggregates these results every predetermined period, e.g. every x seconds, whereby x is in the range between 1 s and 300 s, preferably between 1 s and 60 s. The result is a throughput and a latency per request type. In case the packet is not the first packet of the response, the packet is ignored. Gathering only limited information allows for a great reduction in the storage capacity needed for storing such information. In one particular example, embodiments of the present invention not being limited thereto, only the URL and hostname are stored in the first step. This typically amounts to less than 200 bytes. Assuming an HTTP page of about 5 kbytes, this means a 25-fold reduction of data in this step. Using aggregation, eventually only a fixed amount of data (e.g. latency and bandwidth, corresponding with about 40 bytes) is stored, independent of the amount of traffic, be it e.g. 1 kb/sec or 1 Gb/sec. A memory 104 providing this storage capacity for storing the request/response pairs may be comprised by the node 101 mentioned before.
In embodiments according to the present invention, to quantify user perceived performance, response times of all requests are measured. Response time data of several requests can be combined into an efficient data structure, called a latency histogram, and such information may be output to the user. The latency histogram may be stored in the memory 104 in the node 101 or in the central processor, if the latter does not form part of the node 101.
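The request/response handling described above can be sketched as follows. Assumptions not taken from the source: connections are keyed by a hypothetical (client IP, client port, server IP, server port) tuple, each connection has at most one outstanding request (no HTTP pipelining), and an aggregation window is closed only when a later response arrives. The result per request type is a throughput in requests per second and a mean latency in seconds.

```python
from collections import defaultdict


class RequestResponseTracker:
    """Pair requests with responses and aggregate per request type (sketch)."""

    def __init__(self, interval: float = 10.0):
        self.interval = interval                # aggregation period x, in seconds
        self.pending = {}                       # connection -> (type, request ts)
        self.window_start = None
        self.latencies = defaultdict(list)      # type -> latencies in this window

    def on_request(self, conn, request_type: int, ts: float) -> None:
        self.pending[conn] = (request_type, ts)

    def on_response(self, conn, payload: bytes, ts: float):
        """Handle a response packet; return a finished window's aggregate or None."""
        if not payload.startswith(b"HTTP"):     # not the first response packet: ignore
            return None
        entry = self.pending.pop(conn, None)    # remove the entry from the map
        if entry is None:                       # no matching request was seen
            return None
        request_type, request_ts = entry
        finished = self._roll_window(ts)
        self.latencies[request_type].append(ts - request_ts)
        return finished

    def _roll_window(self, ts: float):
        if self.window_start is None:
            self.window_start = ts
            return None
        if ts - self.window_start < self.interval:
            return None
        # type -> (throughput in requests/s, mean latency in s)
        result = {t: (len(v) / self.interval, sum(v) / len(v))
                  for t, v in self.latencies.items()}
        self.latencies = defaultdict(list)
        self.window_start = ts
        return result
```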
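A latency histogram keeps a fixed amount of memory however many requests are recorded. The sketch below assumes exponentially growing buckets starting at 1 ms; the bucket layout and the class name are hypothetical, as the source does not describe the internal structure of the histogram.

```python
import bisect


class LatencyHistogram:
    """Latency histogram with exponentially growing buckets (hypothetical layout).

    Bucket 0 holds latencies up to bounds[0], bucket i holds latencies in
    (bounds[i-1], bounds[i]], and the last counter is an overflow bucket.
    """

    def __init__(self, smallest: float = 0.001, buckets: int = 16):
        self.bounds = [smallest * 2 ** i for i in range(buckets)]   # seconds
        self.counts = [0] * (buckets + 1)

    def record(self, latency: float) -> None:
        """Add one response time, in seconds."""
        self.counts[bisect.bisect_left(self.bounds, latency)] += 1

    def merge(self, other: "LatencyHistogram") -> None:
        """Add the counts of another histogram with the same bucket layout."""
        if other.bounds != self.bounds:
            raise ValueError("incompatible bucket layouts")
        self.counts = [a + b for a, b in zip(self.counts, other.counts)]

    def percentile(self, p: float) -> float:
        """Upper bound of the bucket holding the p-th percentile (0 < p <= 100)."""
        total = sum(self.counts)
        if total == 0:
            raise ValueError("empty histogram")
        target = p / 100 * total
        running = 0
        for i, count in enumerate(self.counts):
            running += count
            if running >= target:
                return self.bounds[i] if i < len(self.bounds) else float("inf")
        return float("inf")
```

Because histograms with the same layout can be merged bucket by bucket, histograms of short intervals can be combined into longer ones without keeping individual response times; the price is that percentiles are only accurate to the bucket width.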
The packet analyzer processes the packets entering or leaving the data center. It is an advantage of embodiments according to the present invention that a high throughput of network packets can be analyzed while keeping the required information.
In order to achieve this, embodiments of the current invention typically may limit the analysis of the network packets by the packet analyzer. One example thereof may be analyzing the network packets taking into account one or more, preferably all of the following principles:
Moreover, in order to be able to analyse network packets at a high throughput, the storage needs, which need to be provided for by the memory 104 in the node 101, are reduced to a minimum by the following steps:
According to embodiments of the present invention the network traffic and the resource usage for an application are both monitored. The input terminal 102 thereby captures all network packets (monitoring the network traffic). The input terminal 102 is illustrated in
Embodiments according to the present invention help in understanding what is going on in the application by correlating the network information with the resource usage of all hardware components.
Said embodiments heavily rely on the network capturing by the node 101 comprising the input terminal 102 and the packet analyzer: this includes categorizing each request in a predefined request type and measuring the latency for each request-response pair. In embodiments according to the present invention the whole network stream is captured and analyzed as opposed to other inventions where only a limited set is ‘sampled’. It is an advantage of embodiments of the current invention that the embodiments are scalable to high throughput levels. Indeed embodiments according to the present invention focus on large applications. Therefore embodiments according to the present invention identify, by means of the packet analyzer, the request type as soon as possible using the least amount of data and packet reconstruction.
Embodiments according to the present invention enable the correlation between the network traffic and the resource usage since a full analysis of the incoming and outgoing traffic to the application is done by the packet analyzer. This requires a non-sampled, fast and accurate network analysis at the application level. The current invention differs from other network capturing tools in that embodiments of the current invention apply deep packet inspection, do not reconstruct TCP streams, and immediately categorize the stream into a predefined request type. The correlation module relates the network traffic to the resource usage.
The correlation module, which may be running on a processor, correlates the resource usage with the fine-grained request information to determine how many resources each type of request uses. Both sets of data are aligned in time and linear regression is used to determine the contribution of each request type to the resource usage. The system therefore also may comprise an output module configured for outputting the obtained results.
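The regression step can be sketched as an ordinary least-squares fit of the resource usage against the per-type request rates, with an intercept for background usage that is not attributable to requests. The pure-Python normal-equation solver below is an illustrative assumption; the source only states that linear regression is used, and a production system would more likely rely on a numerical library such as `numpy.linalg.lstsq`.

```python
def fit_resource_per_request_type(rates, usage):
    """Least-squares fit of usage[t] ~ c0 + sum_k c[k] * rates[t][k].

    rates: one row per time-aligned interval, holding the request rate of each
    request type; usage: resource usage in the same intervals (e.g. CPU %).
    Returns [c0, c1, ..., cK]: c0 is the background usage and c[k] the usage
    attributable to one request per second of type k.
    """
    X = [[1.0] + [float(r) for r in row] for row in rates]   # intercept column
    n = len(X[0])
    # Normal equations A c = b with A = X^T X and b = X^T y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    b = [sum(x[i] * y for x, y in zip(X, usage)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("request rates are collinear; types cannot be separated")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (b[i] - sum(A[i][j] * coef[j] for j in range(i + 1, n))) / A[i][i]
    return coef
```

With two request types, a result close to `[5.0, 0.02, 0.1]` would be read as 5% background CPU usage, 0.02% per request per second of the first type and 0.1% per request per second of the second type. The fit only separates request types whose rates vary independently over time.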
The fine-grained request information giving the number of requests for each request type per second is information coming from the packet analyzer.
The resource usage is information coming from the at least one server 106 running a simple agent for transferring resource usage towards a receiving module for receiving resource usage, e.g. a central processor.
Both the fine grained request information as well as the resource utilization are time stamped allowing the correlation module to correlate them. By way of illustration the graph in
In certain embodiments according to the present invention, the system comprises an interface for visualizing the measurement results. Moreover, by automatically aggregating data, the number of data points that are necessary to visualize a certain time period is kept within limits. The data can be aggregated in intervals. For example, 10 second data can be aggregated over 1 minute, 5 minutes, 15 minutes, 60 minutes, 3 hours, 6 hours, 12 hours or 24 hours. This way, if a user is looking at data for a long period, a longer interval can be used (for instance 15 minutes instead of 10 seconds), thus limiting the number of data points shown. Finally, servers are grouped into logical groups for easier visualization of the data. The system may comprise an output module for outputting the obtained results.
In a second aspect, the present invention relates to a method for monitoring network traffic and resource usage in a data center. The method comprises the steps of capturing network packets entering or leaving the data center by an input terminal of a node, analyzing the network packets by a packet analyzer running on a processor, whereby the network packets are classified in request/response pairs per type based on deep packet inspection, obtaining resource usage information of at least one server at a receiving module for receiving resource usage information, e.g. a central processor, and correlating the request/response pairs information with the resource usage, e.g. in a processor. By way of illustration, embodiments of the present invention not being limited thereto, an exemplary method is shown in
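The choice of aggregation interval can be sketched as follows, using the intervals listed above and a hypothetical limit of 500 data points per chart. Averaging is an assumption that suits rates and utilization values; latency histograms would rather be merged.

```python
# Aggregation levels from the description, in seconds (10 s base data up to 24 h).
LEVELS = [10, 60, 300, 900, 3600, 3 * 3600, 6 * 3600, 12 * 3600, 24 * 3600]


def pick_interval(span_seconds: float, max_points: int = 500) -> int:
    """Smallest aggregation level that shows `span_seconds` in at most max_points points."""
    for level in LEVELS:
        if span_seconds / level <= max_points:
            return level
    return LEVELS[-1]


def downsample(samples, interval: int):
    """Average (timestamp, value) samples into buckets of `interval` seconds."""
    buckets = {}
    for ts, value in samples:
        start = int(ts // interval) * interval          # bucket start time
        total, count = buckets.get(start, (0.0, 0))
        buckets[start] = (total + value, count + 1)
    return [(start, total / count) for start, (total, count) in sorted(buckets.items())]
```

For example, one hour of data stays at the 10 second base resolution (360 points), while one week is shown at 1 hour resolution (168 points).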
In a first step 810 the network packets entering or leaving the data center are captured by the input terminal 102 of node 101. In embodiments according to the present invention all network packets entering or leaving the data center can be captured by the input terminal 102, instead of sampling the network traffic.
In certain embodiments according to the present invention, a next step 820 is included. This step 820 filters the incoming packets using a filter based on the content of the packets. In an exemplary embodiment of the current invention the filtering is done based on the header of the packet, more specifically based on the transport layer header. For example, filtering on the TCP port allows the HTTP messages to be selected.
In a next step 830 the data of the captured, and possibly filtered, packets is analyzed through deep packet inspection. Thereby the packets can be assigned a user defined type and a time stamp. In embodiments according to the present invention assigning a type to a network packet is based on application layer data. The types are user defined. By taking the difference between the time stamps of a request and a response of the same type, the response delay can be calculated.
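Filtering on the transport layer header and extracting the host and URL can be sketched as below. Assumptions: the input is a TCP segment whose IP header has already been removed, HTTP is recognized by the hypothetical port set {80, 8080}, and only the first packet of a request is parsed, so no TCP stream is reconstructed.

```python
import struct

HTTP_PORTS = {80, 8080}   # hypothetical filter configuration


def filter_http(tcp_segment: bytes):
    """Return (src_port, dst_port, payload) for segments on an HTTP port, else None."""
    if len(tcp_segment) < 20:                        # shorter than a minimal TCP header
        return None
    src_port, dst_port, _seq, _ack, offset_flags = struct.unpack("!HHIIH", tcp_segment[:14])
    header_len = (offset_flags >> 12) * 4            # data offset is in 32-bit words
    if header_len < 20:
        return None
    if src_port not in HTTP_PORTS and dst_port not in HTTP_PORTS:
        return None
    return src_port, dst_port, tcp_segment[header_len:]


def parse_request(payload: bytes):
    """Extract (method, host, url) from the first packet of an HTTP/1.x request."""
    head = payload.split(b"\r\n\r\n", 1)[0].split(b"\r\n")
    parts = head[0].split(b" ")
    if len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
        return None                                  # not a request line
    method, url, _version = parts
    host = b""
    for line in head[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"host":
            host = value.strip()
            break
    return (method.decode("ascii", "replace"),
            host.decode("ascii", "replace"),
            url.decode("ascii", "replace"))
```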
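User defined request types can, for example, be expressed as rules on the host and URL, as sketched below. The rule format (regular expressions, first match wins) and the example type identifiers are hypothetical; the source only states that the types are user defined and based on application layer data.

```python
import re

# Hypothetical user-defined request types: (type id, host regex, URL regex).
REQUEST_TYPES = [
    (1, r"^shop\.example\.com$", r"^/cart/checkout"),
    (2, r"^shop\.example\.com$", r"^/search"),
    (3, r".*", r"^/static/"),
]
_COMPILED = [(tid, re.compile(h), re.compile(u)) for tid, h, u in REQUEST_TYPES]
UNKNOWN_TYPE = 0


def classify(host: str, url: str) -> int:
    """Return the id of the first user-defined type matching host and URL."""
    for type_id, host_re, url_re in _COMPILED:
        if host_re.search(host) and url_re.search(url):
            return type_id
    return UNKNOWN_TYPE
```

Keeping the rules ordered from specific to generic lets a catch-all rule such as type 3 sit at the end without hiding more specific types.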
In a next step 840 the data (request/response types and timestamps) can be combined in a latency histogram.
On the resource monitoring side, a first step 850 comprises the collection and transmission of the server system statistics by a simple agent towards the element for receiving resource usage information or a processor. All this information is collected by a central processor for further processing in step 850.
In step 870, according to embodiments of the current invention, the fine grained request information and the resource utilization are correlated. This allows the network traffic to be correlated with the resource usage, even down to an intra-application message level.
The above described system embodiments for monitoring network traffic and resource usage in a data center may correspond with an implementation of the method embodiments for monitoring network traffic and resource usage in a data center as a computer implemented invention in a processor. One configuration of such a processor may for example include at least one programmable computing component coupled to a memory subsystem that includes at least one form of memory, e.g., RAM, ROM, and so forth. It is to be noted that the computing component or computing components may be a general purpose, or a special purpose computing component, and may be for inclusion in a device, e.g., a chip that has other components that perform other functions. Thus, one or more aspects of the present invention can be implemented in digital electronic circuitry, or in computer hardware, firmware, software, or in combinations of them. For example, each of the method steps may be a computer implemented step. Thus, while a processor as such is prior art, a system that includes the instructions to implement aspects of the methods for monitoring network traffic and resource usage in a data center for a specific application is not prior art.
The present invention thus also includes a computer program product which provides the functionality of any of the methods according to the present invention when executed on a computing device.
In another aspect, the present invention relates to a data carrier for carrying a computer program product for monitoring network traffic and resource usage in a data center. Such a data carrier may comprise a computer program product tangibly embodied thereon and may carry machine-readable code for execution by a programmable processor. The present invention thus relates to a carrier medium carrying a computer program product that, when executed on computing means, provides instructions for executing any of the methods as described above. The term “carrier medium” refers to any medium that participates in providing instructions to a processor for execution. Such a medium may take many forms, including but not limited to non-volatile media and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as a storage device which is part of mass storage. Common forms of computer readable media include a CD-ROM, a DVD, a flexible disk or floppy disk, a tape, a memory chip or cartridge or any other medium from which a computer can read. Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution. The computer program product can also be transmitted via a carrier wave in a network, such as a LAN, a WAN or the Internet. Transmission media can take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications. Transmission media include coaxial cables, copper wire and fibre optics, including the wires that comprise a bus within a computer. Method embodiments of the present invention also may be implemented as an application that can be run. Such an application may be presented via a user interface, e.g. a graphical user interface, and may provide the user with output indicative of the network traffic and resource usage with respect to a particular other application being run.
The output may comprise latency information.
The present invention furthermore relates to a datacenter embedded in a network, the datacenter comprising a system for monitoring network traffic and resource usage of the datacenter, e.g. with respect to a particular application or parts thereof run on the datacenter and in the network. The system thereby corresponds with a system as described in the first aspect of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
13185462.2 | Sep 2013 | EP | regional |