Modern data centers may contain tens or hundreds of thousands of computers, which can also be referred to as nodes. Each node may contain a port, such as a serial port, through which log data may be sent. Log data is typically information related to the node that may be analyzed to determine node performance or to debug errors that may have occurred on the node. In many data centers, the log data from each node may be collected on a small number of logging servers. Thus, log data from many nodes may be retrieved without having to access each node individually.
Although providing central log servers to aggregate log data from many nodes provides an efficient way of gathering log data without having to access each node individually, such aggregation is not without problems. For example, log data is typically sent out of a serial port on a node. In order for a log server to gather log data from each node, a serial cable must be routed between each node and the log server. Given the ever-increasing density of nodes in a standard rack, the burden of such cabling becomes overwhelming. For example, there are current data center cartridge architectures that allow for 45 cartridges per enclosure, with four nodes per cartridge, and 10 enclosures per rack. This density translates to 1800 nodes per rack, which in turn would necessitate 1800 serial cables. As it would be unreasonable to have 1800 serial ports on a log server, additional equipment, such as serial expanders, would be needed.
To partially overcome this problem, the virtual serial port was created. Using a virtual serial port, log data that would normally have been sent over the serial port is sent over a network connection. For example, each node may establish a connection with a log server over a network. For example, the network may be an Ethernet network that connects all of the nodes and log servers within a data center. Log data that would normally be output over a serial port may be placed into a packet and sent over the connection established with the log server. Because the data is traveling over a network, the data connection may be encrypted to enhance security. As should be understood, use of a network topology eliminates the need to have a specific cable between each node and the log server.
Although use of a virtual serial port resolves some of the issues related to gathering log data at a log server, the virtual serial port itself creates additional problems. For example, because a connection must be established between each node and the log server, each node must be individually configured with the network address of the log server. In addition, a connection must be established and maintained between each node and the log server. Although this may not be a large issue at the node, the same cannot be said about the log server. Given the example density above, a single rack may need 1800 connections to be established with the log server. Further exacerbating the problem may be the use of encryption on each connection. If encryption is used, the overhead for encrypting and decrypting the log information sent over the connection may be excessive.
The techniques described herein overcome these problems by aggregating log data prior to sending to a log server. Each node may send log data over a virtual serial connection to an aggregation node. The aggregation node may be local to the enclosure and/or rack, in a trusted domain, such that encryption is not needed between the node and the aggregation node. The aggregation node may establish a secure connection, such as a Secure Shell (SSH) connection with the log server. The log data received from each node by the aggregation node may be sent to the log server. Thus, there is no longer a need for each node to establish a secure connection to the log server.
In order to overcome the problem of configuring each node with the aggregation node, a self-discovery mechanism may be used. In one example implementation, each node may listen for a broadcast message from an aggregation node. Once the broadcast message is received, the node may retrieve the address of the aggregation node from the message. In an alternate example implementation, each node may broadcast a request message asking for the network address of an aggregation node. An aggregation node may respond, and the node may store the address of the aggregation node. In either case, the address of the aggregation node need not be preconfigured into each node.
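The request-response variant of this discovery mechanism can be sketched as below. This is an illustrative sketch only: the message strings, the discovery port, and the offered address are all hypothetical, and loopback stands in for the network broadcast address so the example is self-contained.

```python
import socket
import threading

DISCOVERY_PORT = 9999  # hypothetical well-known discovery port

# Aggregation-node side: bind before starting the responder thread so the
# node's query cannot race the bind. Loopback stands in for broadcast here.
srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", DISCOVERY_PORT))
srv.settimeout(0.2)
stop = threading.Event()

def respond_to_queries():
    """Aggregation node: answer each discovery query with its own address."""
    while not stop.is_set():
        try:
            msg, peer = srv.recvfrom(1024)
        except socket.timeout:
            continue
        if msg == b"DISCOVER_LOG_AGGREGATOR":
            srv.sendto(b"OFFER 127.0.0.1:2222", peer)

def discover_aggregator() -> str:
    """Node side: broadcast the query, then store the address from the reply."""
    cli = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    cli.settimeout(2.0)
    cli.sendto(b"DISCOVER_LOG_AGGREGATOR", ("127.0.0.1", DISCOVERY_PORT))
    reply, _ = cli.recvfrom(1024)
    cli.close()
    return reply.decode().split(" ", 1)[1]

t = threading.Thread(target=respond_to_queries)
t.start()
aggregator_addr = discover_aggregator()
stop.set()
t.join()
srv.close()
```

The alternate variant simply reverses the roles: the aggregation node periodically broadcasts its address unsolicited, and each node records the first address it hears.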
Because all log data is being sent to a single log server, it may be desirable to be able to identify the data sent from each node. Log data from a node is typically a line of text, which may be referred to as a log line. In some example implementations, an identifier is appended to each log line, such that the particular node that generated the log line can be identified. The identifier may be a unique attribute, such as an IP address or a node name. The particular form of the identifier is relatively unimportant, so long as it is understood to uniquely identify one node in the data center. In some example implementations, the node identifier may be appended to each log line by the node sending the log line, while in other example implementations, the node identifier may be appended by the aggregation node. These techniques are described in further detail below and in conjunction with the appended figures.
Log aggregation node 120-1 may be a node that aggregates log lines from the nodes 110-1 . . . n. In some example implementations, log aggregation node 120-1 may be a node that performs tasks that are disjoint from the workloads performed by the nodes 110-1 . . . n. In other example implementations, log aggregation node 120-1 may perform the same tasks as nodes 110-1 . . . n, but performs the log aggregation tasks in addition. For example, a rack may contain multiple nodes. In some example implementations, one node may be selected to perform the log aggregation function, in addition to processing normal workloads. In other example implementations, the log aggregation node may be responsible for log aggregation, but is not responsible for processing general workloads.
It should be noted that there may be a plurality of log aggregation nodes. As shown in
In operation, each node 110-1 . . . n may first identify its associated log aggregation node. In one example implementation, each node may broadcast a message to all elements on the network, requesting identification of the log aggregation node. The log aggregation node that is to be associated with the node sending the request may then respond, indicating that it is the log aggregation node to be used by the requesting node. In other example implementations, each log aggregation node may broadcast a message to all other elements indicating that it has log aggregation capabilities. Nodes that receive the broadcast message may then choose to use the broadcasting node as their log aggregation node.
Regardless of implementation, each node determines the address of the log aggregation node that is to handle the log line aggregation function for the node. The node may then store the address of the log aggregation node and establish a connection with it. Typically, the log aggregation node and the nodes may all be within the same trust domain, such that a simple, insecure connection may be established. However, the techniques described herein are equally applicable if a secure connection is established between the node and the log aggregation node.
When a node wishes to send a log line to the log server, the node sends the log line to the log aggregation node over the established connection to the log aggregation node. In some example implementations, the node appends a node identifier to the log line. For example, the node identifier may be an address of the node generating the log line. As another example, the node identifier may be a node name. In other example implementations, the node identifier is appended by the log aggregation node. The purpose of the node identifier is to determine the node that generated the log line, as will be explained below.
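The tagging step can be sketched in a few lines. The `[node-id]` prefix format is an assumption made for this sketch; the source only requires that the identifier uniquely names the originating node (for example, an IP address or a node name).

```python
def tag_log_line(log_line: str, node_id: str) -> str:
    """Attach the node identifier so the log server can attribute the line.

    A bracketed prefix is used here purely for illustration; any placement
    works as long as the identifier can later be recovered from the line.
    """
    return f"[{node_id}] {log_line}"

tagged = tag_log_line("disk error on /dev/sda", "node-42")
```

Depending on the implementation, this function would run either on the node itself (before sending) or on the log aggregation node (after receipt).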
The log aggregation node may establish a secure connection with the log server. The log aggregation node and the log server may not be in the same trust domain, and as such it may be prudent to use a secure connection. Once the log line has been received by the log aggregation node, and the node identifier has been appended (either by the node or by the log aggregation node), the log aggregation node may forward the log line to the log server. In some example implementations, the log aggregation node may forward log lines upon receipt, while in other example implementations the log aggregation node may buffer log lines and send them to the log server once the buffer is full. Regardless of implementation, what should be understood is that each node need not create a connection, much less a secure connection, with the log server. As such, the processing load on the log server is reduced.
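The buffer-then-forward variant described above can be sketched as follows. The transport is represented by a plain callable standing in for a send over the SSH connection, and the buffer size and `[node-id]` prefix format are assumptions of this sketch.

```python
from typing import Callable, List

class LogAggregator:
    """Collect tagged log lines from nodes; flush to the log server when full."""

    def __init__(self, forward: Callable[[List[str]], None], buffer_size: int = 4):
        self.forward = forward        # stands in for a send over the secure (SSH) channel
        self.buffer_size = buffer_size
        self.buffer: List[str] = []

    def receive(self, log_line: str, node_id: str) -> None:
        """Receive a line from a node, append its identifier, and buffer it."""
        self.buffer.append(f"[{node_id}] {log_line}")
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self) -> None:
        """Forward all buffered lines to the log server and empty the buffer."""
        if self.buffer:
            self.forward(self.buffer)
            self.buffer = []

sent: List[List[str]] = []
agg = LogAggregator(sent.append, buffer_size=2)
agg.receive("boot ok", "node-1")
agg.receive("fan warning", "node-2")   # buffer full, so both lines are flushed
```

The forward-on-receipt variant is the degenerate case of a buffer size of one.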
Upon receipt of a log line forwarded from a log aggregation node, the log server may simply append the received log line to a log file (not shown). In some example implementations, the log server may maintain a separate file for each log aggregation node, while in other example implementations, the log server may maintain a single file for log lines from all log aggregation nodes. When a system user wishes to analyze log lines from a single node, the appropriate log file may be retrieved from the log server. The file may then be filtered based on the node identifier of interest, the node identifier having been appended to the log lines as described above. As such, the log lines from an individual node may then be retrieved and analyzed.
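The filtering step can be sketched as below, assuming a hypothetical `[node-id]` prefix was attached to each line when it was tagged; any identifier format the deployment settled on would work the same way.

```python
from typing import List

def lines_for_node(log_lines: List[str], node_id: str) -> List[str]:
    """Return only the lines generated by node_id, with the tag stripped."""
    prefix = f"[{node_id}] "
    return [line[len(prefix):] for line in log_lines if line.startswith(prefix)]

# An aggregated log file interleaves lines from many nodes.
log_file = [
    "[node-1] boot ok",
    "[node-2] fan warning",
    "[node-1] disk error",
]
node1_lines = lines_for_node(log_file, "node-1")
```

Because the identifier uniquely names one node in the data center, a single pass over the aggregated file recovers that node's log stream.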
Thus the enclosure may support a plurality of nodes 210-1 . . . 8, 211-1 . . . n. Each of these nodes may generate log lines, as described above with respect to
The chassis manager may contain a processor 221 and a non-transitory processor readable medium 222 containing a set of instructions thereon, which when executed by the processor cause the processor to implement the techniques described herein. For example, the medium may include log line receive/append instructions 223, log line secure forward instructions 224, and log node broadcast/respond instructions 225.
In operation, just as above, the chassis managers may notify the nodes that they have log aggregation capabilities. For example, the log node broadcast/respond instructions 225 may be used to allow the chassis manager and the nodes to identify each other. As explained above, this may be through a broadcast mechanism wherein the chassis manager broadcasts its log aggregation capabilities, or it may be in a request-response mechanism, wherein the chassis manager responds to a request for log aggregation node identification. Regardless of implementation, each node may be able to identify the chassis manager to which log lines are to be sent. Again, as above, each node may establish a connection with the identified chassis manager.
As shown in
Chassis manager 220-3 may receive log lines forwarded from chassis managers 220-1,2. Chassis manager 220-3 may then use log line secure forward instructions 224 to establish a secure connection to log server 240. Chassis manager 220-3 may then forward the log lines received from chassis managers 220-1,2 to the log server. It should be noted that chassis manager 220-3 does not receive any log lines directly from any of nodes 210-1 . . . 8, or 211-1 . . . n. Rather, chassis manager 220-3 receives log lines indirectly through other chassis managers.
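The two-tier forwarding just described can be sketched as follows: edge chassis managers receive and tag lines from nodes, while a single top-level chassis manager talks to the log server. The class names, the `[node-id]` prefix, and the callable standing in for the secure channel are all illustrative assumptions.

```python
from typing import Callable, List

class TopChassisManager:
    """Role of chassis manager 220-3: relays to the log server only."""

    def __init__(self, send_to_log_server: Callable[[str], None]):
        self.send_to_log_server = send_to_log_server  # secure-channel stand-in

    def relay(self, tagged_lines: List[str]) -> None:
        # Receives lines only from other chassis managers, never from nodes.
        for line in tagged_lines:
            self.send_to_log_server(line)

class EdgeChassisManager:
    """Role of chassis managers 220-1,2: face the nodes, tag, forward upstream."""

    def __init__(self, upstream: TopChassisManager):
        self.upstream = upstream

    def receive(self, log_line: str, node_id: str) -> None:
        self.upstream.relay([f"[{node_id}] {log_line}"])

server_log: List[str] = []
top = TopChassisManager(server_log.append)
edge1 = EdgeChassisManager(top)
edge2 = EdgeChassisManager(top)
edge1.receive("boot ok", "node-210-1")
edge2.receive("fan warning", "node-211-3")
```

Only the top-level manager bears the cost of the secure connection, which is the point of the hierarchy.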
In block 330, a logged line may be sent to the log aggregation node over the established connection. Logged lines may be received from any number of different nodes over any number of established connections. The log aggregation node may then forward the logged line to a log server. As explained above, the log line may have a node identifier appended to it and the connection to the log server may be a secure connection, such as a connection provided by SSH.
In another example implementation, the process starts in block 415. In block 415, a node may send a broadcast query on a connection fabric for the log aggregation node. In other words, the node may request the log aggregation node to identify itself. In block 420, a response from the log aggregation node may be received. In block 425, the address of the log aggregation node may be stored.
In either example implementation, the process moves to block 430, in which a connection to the log aggregation node may be established. As explained above, in some example implementations, the connection need not be a secure connection, as the nodes and the log aggregation node may both be within a trusted domain. However, the techniques described herein are applicable even when the connection between the node and the log aggregation node is a secure connection.
In block 435, it may be determined if the node identifier is to be appended by the sending node (e.g. local node) or by the log aggregation node. If the node identifier is to be appended by the sending node, the process moves to block 440. In block 440, the node may append a node identification tag to each logged line. The node identification tag may be used to identify the node that sent the logged line. If the node identifier is to be appended by the log aggregation node, the process moves to block 445. In block 445, the log aggregation node may append a node identification tag to each logged line. The node identification tag may identify the node that sent the logged line.
Regardless of which node appends the node identification tag, the process moves to block 450. In block 450, the logged line may be sent to the log aggregation node over the established connection. The log aggregation node may forward the logged line to a log server over a secure communications channel.
In block 520, the first chassis manager may append to each log line a node identifier, wherein the node identifier identifies the specific node that generated the log line. As explained above, the node identifier may be used when analyzing log files on a log server to determine from which node a log line was sent. In block 530, the log lines with the appended node identifiers may be forwarded to a third chassis manager. As explained above, in some example implementations, some chassis managers may be responsible for communicating with nodes, such as the first chassis manager described herein. Other chassis managers, such as the third chassis manager, may communicate with the chassis managers responsible for communicating with the nodes, but do not communicate with the nodes themselves.
In block 630, the first chassis manager may append a node identifier to each log line, wherein the node identifier identifies the specific node that generated the log line. In block 640, the second chassis manager may similarly append the node identifier to each log line. Again, the node identifier may identify which node generated the log line.
In block 650, the log lines may be forwarded from the first and second chassis managers to a third chassis manager. The third chassis manager may not receive log lines directly from any node in the set of nodes. In other words, the third chassis manager receives log lines forwarded from other chassis managers, not from nodes themselves. In block 660, the third chassis manager may forward the log lines to a log server over a secure communications channel.
In block 720, the instructions may cause the processor to establish a secure connection to a log server. As explained previously, the log server may not be in a trusted domain, and as such the connection to the log server may be a secure connection. However, because the connection is from the log aggregation node, instead of each node individually, a reduced number of secure connections may be needed. Thus the overhead of establishing and maintaining a secure connection is reduced. In block 730, the instructions may cause the processor to forward the log lines from the plurality of nodes to the log server over the secure connection.
In either implementation, in block 830, the instructions may cause the processor to receive log lines from a plurality of nodes over insecure connections. As has been explained above, the nodes and the aggregation node may be in a trusted domain, such that use of insecure communications channels is acceptable. In block 840, the instructions may cause the processor to append a node identifier to each log line. The node identifier may identify the node that generated the log line.
In block 850, a secure connection to a log server may be established. As explained above, the log server and the log aggregation node may not be in the same trusted domain. As such, to ensure secure communications, a secure connection may be established between the log aggregation node and the log server. In block 860, the instructions may cause the processor to forward the log lines from the plurality of nodes to the log server over the secure connection.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2013/062352 | 9/27/2013 | WO | 00