The present invention relates to a data logging system, and in particular to a data logging system for managing test probes in a network testing system.
Conventional standalone network probes perform monitoring functions on Internet traffic, and tap into the communication line via a splitter or as an in-line device. The functions typically involve performing some operation on data, inside packets or frames, passing through a particular point in the network. Typical functions include, for example, filtering data, capturing data, forwarding captured data, and summarizing data. Unfortunately, existing probes working as standalone devices, configurable to gather data of different types in different arrangements, are expensive and usually require a footprint the size of a laptop. Probes that are part of an interface converter or part of a switch or router line card have small footprints, but they are limited because they are very specific in nature. Thus, to use a probe in a confined space, it is necessary to accept a small-footprint probe with a limited set of functions.
With reference to the accompanying drawings, the SFProbe 2 is an inline device, which does not require a separate network connection to deliver captured packets. Instead, the SFProbes 2 take advantage of inter-packet gaps and unused bandwidth in a network when messages or test results have to be sent, as disclosed in U.S. Pat. No. 7,948,974 issued May 24, 2011 to Ilnicki et al., and U.S. Pat. No. 8,009,557 issued Aug. 30, 2011 to Curran-Gray et al., which are incorporated herein by reference. When an idle period is detected, a results packet is inserted into the network for routing back to the system and subsequently to the destination application or tools. Accordingly, no network packets are dropped while passing through the SFProbes 2.
The PacketPortal solution examines packets at full-duplex line-rate speeds, empowering the IPD to identify packets of interest that are then copied from the network, accurately time-stamped, encapsulated into a results packet, and inserted back into the network for routing to the targeted application—all without causing loss or disruption to the original flows, as disclosed for example in U.S. Pat. No. 7,894,356, issued Feb. 22, 2011 to Mottishaw et al., which is incorporated herein by reference.
A System Manager 4 provides user management and system access through an easy-to-use, web-based graphical user interface (GUI) 5, which users can access through any compliant browser. The intuitive user interface of the System Manager 4 enables quick, easy access to the features, functionality, and management of the entire system.
A Packet Routing Engine (PRE) 6 provides scalable management and control of the SFProbes 2 across the network. Currently, each PRE 6 can manage and control up to 500 SFProbes 2; however, future PREs 6 will be able to support thousands of SFProbes 2. Each PRE 6 maintains network connections, state, time synchronization, encryption, and discovery, and routes captured result packets for the SFProbes 2 in its domain. Decoupling the functions of the PRE 6 from those of the central System Manager 4 lets a PacketPortal system scale to sizes never before conceived of for packet-access solutions. PREs 6 may be synchronized with a global time source, such as a global positioning system (GPS), network time protocol (NTP), or an IEEE 1588 master clock, as disclosed for example in U.S. Pat. No. 7,573,914 issued Aug. 11, 2009, and U.S. Pat. No. 7,689,854 issued Mar. 30, 2010, both in the name of Ilnicki et al., which are incorporated herein by reference.
To simplify data and packet acquisition, every SFProbe 2 incorporates a protocol header parser (PHP) that automatically identifies most major protocols over virtually any network encapsulation. The PHP works in conjunction with four programmable filter banks, which may be activated in every SFProbe 2. Each filter bank may hold up to eight bidirectional independent filter patterns that define the network traffic to be captured and forwarded. Users can set up simple or complex filters using the GUI 5 from the System Manager 4, as disclosed for example in U.S. Pat. No. 7,760,663 issued Jul. 20, 2010 to Ilnicki et al., which is incorporated herein by reference.
A Packet Delivery Gateway (PDG) 8 enables one or more applications, e.g. an analysis application 9a or an analysis probe 9b, to connect to the PacketPortal system and receive time-aligned packets, as if they were locally connected to a monitor port or tap at the remote location. The PDG 8 uses captured timestamps and sequence numbers from the SFProbes 2 to play aggregated streams out a monitor port. The streams maintain proper sequencing and inter-packet timing representing what the packets experienced while passing through the remote network port. PDGs 8 can feed packets to any device or application that would normally connect to a tap, SPAN port, aggregator, mirror port, or equivalent technology. The PDG 8 enables applications to reside in central locations instead of remote locations, where it may not be economically practical to deploy them. Accordingly, the PDG 8 provides the ability to utilize legacy, and even future, probes and test systems with the PacketPortal system.
A virtual network interface card (VNIC) 10 is a software component that emulates a physical network interface card (NIC) driver and enables any Ethernet-based software application to receive feeds from a PacketPortal system via a NIC interface. The VNIC 10 receives PacketPortal feeds, removes the transport headers and metadata to reveal the network traffic, and retransmits the original packets to the PC's network stack. The traffic is replayed using the original capture timestamps and sequence numbers to accurately represent the traffic as it was captured at the remote element. The replay may be configured to output on a specific transmission control protocol (TCP) or user datagram protocol (UDP) port from the PRE 6 to the VNIC 10. The VNIC 10 can also read captured network data files in the packet capture (PCAP) format and replay them similarly to how live traffic is processed through the PacketPortal system.
The SFProbes 2 can be located in various locations connected to a core IP network 11, such as nodes 12, switches 13, routers 14, data receivers and transmitters 15, and any other access equipment 16, e.g. DSLAMs, CMTSs, OLTs, etc.
High-complexity software applications utilizing many threads of execution are difficult to analyze and debug. As an application is scaled to process more and more work, this analysis and debugging become much more difficult. What is needed is logging that contains information providing a picture of the complex interactions of all execution threads over time. Application performance tuning also requires that the logging be fast enough to have very little impact on overall execution performance, to allow analysis of the application running at speed. Furthermore, logging may contain information that the customer, or someone trying to reverse-engineer the application, should not see. The present invention logs a rich set of information in a binary record format that is obfuscated and so compact that it is very fast to write.
An object of the present invention is to overcome the shortcomings of the prior art by providing a high speed logging system that stores compacted log records in a first memory buffer in random access memory (RAM).
Accordingly, the present invention relates to a method of storing log entries of events from a plurality of network elements in a network, comprising the steps of:
a) receiving log entries of events at a control processor from a plurality of different elements positioned in the network, the log entries grouped into threads based on a common purpose;
b) converting each log entry into a compact log record in a logging module, each compact log record comprising in binary form:
record length of the compact log record;
time stamp of data collection at the network element;
thread ID of the thread that logged each event;
category of each event;
message ID for identifying static text for a human readable log message; and
network element ID for identifying to which network element the event relates; and
c) storing the compact log records in a first memory buffer in random access memory (RAM) forming a first log file.
The invention will be described in greater detail with reference to the accompanying drawings which represent preferred embodiments thereof, wherein:
The present invention is ideally suited for managing test probes in a communications network, such as the PacketPortal SFProbes disclosed in U.S. Pat. No. 7,336,673 issued Feb. 26, 2008 to Ilnicki et al., entitled Creating a Low Bandwidth Channel Within a High Bandwidth Packet Stream; U.S. Pat. No. 7,868,780 issued Jan. 11, 2011 to Engel et al., entitled System and Method for Test Probe Management; and United States Patent Application Publication No. 2009/0109973 published Apr. 30, 2009 in the name of Ilnicki, entitled Programmable Passive Probe, which are incorporated herein by reference. However, other applications that do not manage probes could categorize log events by some other grouping that makes sense to the application. For example, an application that manages or monitors cell phone communications may want to organize them by phone number.
The software logging module 21, in accordance with the present invention, is typically stored on non-transitory computer readable media, such as a fixed magnetic disk, floppy disk drive, optical disk drive, magneto-optical disk drive, magnetic tape, or non-volatile memory, and provides logging methods so that a programmer can instrument the application code with informational and error log statements.
In the embodiment illustrated in the drawings, similar test-data signals collected for a similar purpose from each of the different SFProbes 22 are grouped into threads, as initiated by the system manager 24. Typically, the module 21 is stored in conjunction with a packet routing engine (PRE) 26 or an analysis application 30, e.g. a virtual network interface card. For each logging line added by the programmer, the logging module 21 creates a compact log record. Ideally, each compact log record comprises: essential enrichment data that is present for every logged message, e.g. a Time-stamp, a process or Thread ID, a message Category or type, and a Probe (or element) ID; variable data that varies depending on the Message ID being logged; and compacted data in the form of one or more code numbers, e.g. a Message ID, representing, for example, full-length text strings from a predetermined list of text strings. The Message ID is one of a plurality of predetermined code numbers, each corresponding to predetermined data or a text string, whereby each compact log record comprises fewer bits than the corresponding full-length log record, thereby requiring less memory (RAM or ROM) to store.
Then, as the code executes, each time a log statement is executed, one compact log record is added to a first one of a plurality of memory buffers (RAM) 32a to 32d.
The binary compact log file is a series of compact log event records. Ideally, each record is comprised of the following elements: the record length of the compact log record; the time stamp of data collection at the network element; the thread ID of the thread that logged the event; the category (type) of the event; the message ID identifying static text for a human readable log message; the network element (e.g. probe) ID identifying to which network element the event relates; and any variable data associated with the message ID.
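By way of illustration, a minimal C++ sketch of one possible record layout follows. The field names and widths shown are assumptions for illustration only; the description above specifies which fields are present, not their exact sizes.

#include <cstdint>

// Illustrative layout only: field widths are assumptions, not prescribed above.
#pragma pack(push, 1)
struct CompactLogRecord {
    uint16_t recordLength; // total size of this record in bytes, including variable data
    uint64_t timeStamp;    // time stamp of data collection at the network element
    uint32_t threadId;     // ID of the thread that logged the event
    uint16_t category;     // category (type) of the event
    uint16_t messageId;    // identifies static text for the human readable log message
    uint64_t elementId;    // identifies the network element (e.g. probe) the event relates to
    // followed by (recordLength - sizeof(CompactLogRecord)) bytes of variable data
};
#pragma pack(pop)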
Other records can be logged into the binary log file, and some of the records can be omitted, depending on the nature of the application.
When the compact log records are to be viewed, a log file viewer program converts the compact log records in the log file into displayable strings and displays them on a suitable GUI. To do this, the log file viewer takes the binary log file and an auxiliary mapping file that contains the mapping of Message ID to text.
Each binary log record can vary in size, i.e. record length, depending on how much variable data is included. The log viewer must process all of these binary records, and the Record Length enables the log viewer to determine how long each record is and, therefore, where the next record begins.
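A minimal sketch of how a viewer might use the Record Length field to walk the records is given below; the processRecord function is a hypothetical placeholder for the viewer's decoding step, and the Record Length is assumed to be the first two bytes of every record.

#include <cstddef>
#include <cstdint>
#include <cstring>

void processRecord(const uint8_t* rec, uint16_t len); // hypothetical decoding step

// Walk a binary log buffer record by record using the Record Length field.
void walkRecords(const uint8_t* buf, size_t len) {
    const uint8_t* p = buf;
    const uint8_t* end = buf + len;
    while (p + sizeof(uint16_t) <= end) {
        uint16_t recordLength;
        std::memcpy(&recordLength, p, sizeof recordLength);
        if (recordLength < sizeof(uint16_t) || p + recordLength > end)
            break; // truncated or corrupt record
        processRecord(p, recordLength);
        p += recordLength; // the Record Length locates the start of the next record
    }
}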
When the first memory buffer 32a gets full, a second, different memory buffer 32b is allocated by the logging module 21 and used for future log records. In the illustrated system, the buffer size and the number of buffers are configurable. Depending on the size of the buffers 32a to 32d and how many log events are generated, the buffers 32a to 32d can fill in milliseconds or in minutes. The more heavily loaded the system is, i.e. the more work it is doing, the more events are generated. For example, one event may be that a specific Message Type is sent to a probe 22; another event may be that an Acknowledgement is received from a probe 22. The full first memory buffer 32a is queued to a background thread that writes it to a disk file on a hard drive 33 and then puts the memory buffer 32a back into the free pool so that it can be used again at a future time. Accordingly, if only two buffers 32a and 32b are provided, the logging module 21 will alternate between them; however, if additional buffers, e.g. 32c and 32d, are provided, the additional buffers can be used.
What makes the system, and in particular the logging module 21, high-speed, which is very important in debugging high-speed applications, is that the compact log records are stored in RAM and flushed later in a background thread to the slower hard drive 33. Logging directly to the hard drive 33 would slow the application down too much. The “real” code is high priority, running at full speed servicing requests in order to manage so many probes 22 all at once. Events are written to the log buffers 32a to 32d at this high speed, with no perceptible slow-down of the “real” code. When a buffer, e.g. 32a, is full, the log code moves its write pointers to a fresh buffer, e.g. 32b, and hands the full buffer 32a to a low-priority thread that writes the buffer 32a to a hard drive file. This low-priority thread only gets to write when there are spare cycles not needed by the high-speed threads, i.e. it is sometimes referred to as ‘running in the background’. Accordingly, the full buffers are written to the hard drive 33 whenever the low-priority thread is allocated processing cycles by the operating system scheduler.
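A simplified C++ sketch of this double-buffering scheme follows; the class and member names are hypothetical, and a production version would run the writer on a thread pinned at low priority by the operating system rather than, as here, simply draining on demand.

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <mutex>
#include <queue>
#include <vector>

// Simplified double-buffered logger: high-priority threads append records to
// an in-RAM buffer; full buffers are queued for a low-priority writer thread.
class BufferedLogger {
public:
    BufferedLogger(size_t bufSize, size_t numBufs) : bufSize_(bufSize) {
        for (size_t i = 0; i < numBufs; ++i) // numBufs >= 2 assumed
            freePool_.push(std::vector<uint8_t>(bufSize));
        active_ = takeFree();
    }

    // Hot path: no disk I/O here, only a memcpy into RAM.
    // Records are assumed to be smaller than one buffer.
    void append(const void* rec, size_t n) {
        std::lock_guard<std::mutex> lk(m_);
        if (used_ + n > active_.size()) {        // buffer full: rotate
            active_.resize(used_);               // trim to the bytes actually written
            fullQueue_.push(std::move(active_)); // hand off to the writer thread
            active_ = takeFree();
            used_ = 0;
        }
        std::memcpy(active_.data() + used_, rec, n);
        used_ += n;
    }

    // Called from the low-priority background thread: flush one full buffer
    // to disk and return it to the free pool for reuse.
    bool drainOne(std::FILE* f) {
        std::vector<uint8_t> buf;
        {
            std::lock_guard<std::mutex> lk(m_);
            if (fullQueue_.empty()) return false;
            buf = std::move(fullQueue_.front());
            fullQueue_.pop();
        }
        std::fwrite(buf.data(), 1, buf.size(), f);
        buf.resize(bufSize_); // restore full capacity before recycling
        std::lock_guard<std::mutex> lk(m_);
        freePool_.push(std::move(buf));
        return true;
    }

private:
    std::vector<uint8_t> takeFree() {
        // For brevity, assume a free buffer is available; a real implementation
        // would block or grow the pool here.
        std::vector<uint8_t> b = std::move(freePool_.front());
        freePool_.pop();
        return b;
    }

    std::mutex m_;
    std::queue<std::vector<uint8_t>> freePool_, fullQueue_;
    std::vector<uint8_t> active_;
    size_t used_ = 0;
    size_t bufSize_;
};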
An example of a trace statement in the code is:
APP_TRACE(INFORMATION,MESSAGE1_OpenSocketsSizes, m_RecvBuffSize, m_TransBuffSize);
This is written as a binary log record to one of the buffers 32a to 32d in the following format:
<Timestamp> <Thread id> <Type> <Message id> 124928 124928
Timestamp is a numeric representation of the time when the record is written. Thread id is an identifier that distinguishes one thread of execution from another. Type is a type of message, like ‘error’, ‘packet transmit’, etc.; these could change depending on what software application the present invention is being used with. Message id is a number that maps back to some text string that will ultimately be displayed by the visualizing GUI (log viewer); if the event always displays some text string, only the number is logged, not the text string. The visualizing GUI has the mapping of ‘message id’ to actual text.
The compact log event record takes somewhere on the order of 24 bytes of data. Messages with no variable data take even less. Since the record is written to one of the buffers 32a to 32d in raw binary format, no runtime computation is needed for formatting to human readable form.
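The APP_TRACE implementation itself is not reproduced here; the following is a hypothetical sketch of what such a call might serialize for a message with two 32-bit values, consistent with the raw binary format above. The field order and widths are assumptions, and the element/probe ID is omitted for brevity, as in the two-integer example.

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <thread>

// Hypothetical serialization of one trace event with two 32-bit values.
// Returns the record length; the bytes in 'rec' would then be appended to
// the active RAM buffer.
size_t encodeTwoIntRecord(uint8_t rec[64], uint16_t category,
                          uint16_t messageId, uint32_t v1, uint32_t v2) {
    size_t off = sizeof(uint16_t); // reserve space for the Record Length field

    uint64_t ts = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();
    uint32_t tid = static_cast<uint32_t>(
        std::hash<std::thread::id>{}(std::this_thread::get_id()));

    std::memcpy(rec + off, &ts, sizeof ts);               off += sizeof ts;
    std::memcpy(rec + off, &tid, sizeof tid);             off += sizeof tid;
    std::memcpy(rec + off, &category, sizeof category);   off += sizeof category;
    std::memcpy(rec + off, &messageId, sizeof messageId); off += sizeof messageId;
    std::memcpy(rec + off, &v1, sizeof v1);               off += sizeof v1;
    std::memcpy(rec + off, &v2, sizeof v2);               off += sizeof v2;

    uint16_t recordLength = static_cast<uint16_t>(off);
    std::memcpy(rec, &recordLength, sizeof recordLength); // backfill the length
    return off; // raw binary only: no string formatting occurred at runtime
}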
The log viewer program is stored in non-transitory memory provided in any suitable computer device, e.g. a remote laptop or the system manager 24, and runs separately from the main application under control of a remote processor or the system manager control processor. The log viewer does not require the PacketPortal system or any of the PacketPortal components whatsoever; however, the log viewer can be used extensively to tune and debug the PRE component 26 of PacketPortal. The log viewer is a separate application that can run wherever a binary log file can be opened with it, e.g. binary log files can be retrieved from a PRE server and stored on a laptop, which uses the log viewer to view the files in full form. The log viewer program takes two files as input: first, the compact binary log file; and second, a file that contains a mapping of message ids to text message strings. Following is an example of this mapping:
485,“[CCThreadManager::OpenSocketsVersion] Recv buffer size=%d, Transmit buffer size=%d”
From these two input files, the log viewer constructs all of the human readable formatted strings.
Resulting line as viewed in log viewer:
2011-06-01 11:27:40:0023 0234874 [Info] [CCThreadManager::OpenSocketsVersion] Recv buffer size=124928, Transmit buffer size=124928
The line above includes the same information that was logged into one of the buffers 32a to 32d at runtime and later written to the disk file on the hard drive 33: timestamp, thread id, type, message, and two pieces of variable data (the buffer sizes). Note that the formatted, expanded line is on the order of 140 bytes of information.
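A minimal sketch of this viewer-side expansion is given below, assuming the mapping file has already been loaded into a map; the snprintf-based formatting and the function name are illustrative assumptions, not the disclosed implementation.

#include <cstdio>
#include <map>
#include <string>

// Expand one decoded record into a human readable line using the
// message-id-to-text map loaded from the auxiliary mapping file. The
// variable data here is two integers, as in the OpenSocketsSizes example.
std::string expandRecord(const std::map<int, std::string>& messageText,
                         const std::string& timeStamp, unsigned threadId,
                         const std::string& type, int messageId,
                         int v1, int v2) {
    auto it = messageText.find(messageId);
    if (it == messageText.end())
        return "<unknown message id>";
    char body[256];
    std::snprintf(body, sizeof body, it->second.c_str(), v1, v2); // fill the %d fields
    char line[512];
    std::snprintf(line, sizeof line, "%s %07u [%s] %s",
                  timeStamp.c_str(), threadId, type.c_str(), body);
    return line;
}

For the example above, calling expandRecord with message ID 485 and the two logged values 124928 and 124928 would reconstruct the displayed line.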
Historically, applications have constructed a full, human readable string and output it directly into a log file at runtime, which produces a huge volume of data and consumes processing time in formatting and output.
In troubleshooting any software problem, a key factor is knowing what the code is doing at the time a problem occurs. Going back and looking through an execution log is one way to see this. Logs can also be examined when no problem is occurring, just to see if any errors are being logged. Software developers can test for bad data or other unexpected conditions and log errors if any are detected.
In the PacketPortal program, the PRE 26 uses this logging almost exclusively. The PRE 26 has on the order of 50 threads of execution, and these log files have been invaluable in finding defects, performance bottlenecks, and other issues. The fact that the logs are active by default provides a good way to look at what was happening when a problem was encountered, which has even been possible when the PRE 26 is under stressful loads, e.g. managing 1000 PacketPortal probes 22.
The visualization in this example includes color-coding of execution threads. The threads of execution are graphed over time by thread id. Graphic patterns emerge as they are plotted over time, and visual changes in the graphic pattern will indicate a problem that should be looked into.
In the graph, the steady state of an execution thread will have a certain look to it. Every software system of threads logging events would likely have a different-looking pattern in the graph. By comparing execution threads, sections that look much different from other threads will stand out and can indicate a problem area, i.e. interruptions in the usual patterns, new patterns, etc. will visually pop out to the trained eye. This is extremely useful, because without it, changes in behavior would not be easily detectable and would require paging through tens of thousands of lines of text.
Also, the inclusion of a line dedicated to errors allows log entries tagged as errors to be located quickly. For example, the top line is where points are plotted for events marked with EventType=Error. This is very useful since most log events are informational. Events that are errors are logged as such. As soon as a log file is opened with the log viewer, events will be seen on the dedicated error line immediately. Clicking on a dot on the graph will scroll to the corresponding line in the text log, so that the specific error and what was happening around it can be seen.
The logged data can also be enhanced by utilizing the individual PacketPortal ProbeIDs. Since the software can manage 1000 probes 22 simultaneously, filtering the log viewer to show messages logged for a single probe 22 or a subset of probes 22, e.g. by geographic location or by type of host device, e.g. switch, can be very helpful in analyzing what is happening at specific locations in the network or with specific devices therein. In a more general use, other applications may want some other key descriptor that could be associated with each log event record and used for sorting or filtering.
The log viewer has been enhanced to use the ProbeID for filtering, which makes interpreting the volumes of logged data much easier. In the log viewer, the log text is searchable, e.g. to follow events with the same Probe ID, the same Thread ID, or the same Event Type. The elapsed time between events can also be determined and displayed.
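A sketch of such ProbeID filtering over decoded events is shown below; the DecodedEvent structure is a hypothetical viewer-side type, and the same pattern applies to filtering by Thread ID or Event Type.

#include <cstdint>
#include <vector>

// Hypothetical viewer-side representation of one decoded log event.
struct DecodedEvent {
    uint64_t timeStamp;
    uint32_t threadId;
    uint16_t eventType;
    uint64_t probeId;   // the ProbeID associated with the event
};

// Keep only the events logged for a single probe of interest.
std::vector<DecodedEvent> filterByProbe(const std::vector<DecodedEvent>& all,
                                        uint64_t probeId) {
    std::vector<DecodedEvent> out;
    for (const DecodedEvent& e : all)
        if (e.probeId == probeId)
            out.push_back(e);
    return out;
}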
The PRE 26 has a number of processing threads that each have a different task to perform. For example, there is a single thread whose sole purpose is to read incoming packets from a network socket. Another thread's job is to decrypt packets, and so on. The threads do work by communicating with all of the probes 22 currently being managed by the system manager software, which is stored in non-volatile memory and executed on the system manager controller 24.
Accordingly, the present invention provides a log of the execution of many threads across many probes and many thousands of events that can be examined. Moreover, all of the events are logged while the application is running in its normal, full-speed mode. Often, text logging slows down real-time applications so significantly that the behavior is changed, which often makes the logs useless, since some issues only reproduce when the application is running at full speed under heavy load.
The visualization of the data in the graph by the log viewer helps the user quickly home in on an important region of the log to look at. Looking at a few dozen important lines quickly is key when they are embedded in thousands of lines of often repetitive information.
Following is an example of entries from a mapping file of message IDs to their corresponding human readable character strings. In other words, a list of mappings of message IDs to text strings exists in a file that the log viewer uses. Two examples of this are the mappings for message IDs 10 and 27:
Message ID: 10 maps to string: “[StatusMonitor::LogInfo] Depth=%d, Current Size=%d, Max Size=%d, Monitored Entity=%s”
Message ID: 27 maps to string: “* * * ERROR * * * [CCThreadManager::DoEncryptThread] failed to encrypt”
A record logged with messageID=10 would need three 32-bit integers and one string in the variable part of the event record, since it has three %d specifiers and one %s specifier in the string.
A record logged with messageID=27 would have no variable data.
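A sketch of how a viewer might derive the variable-data layout by counting the %d and %s specifiers in the mapped string is shown below; this is an assumption about the viewer's internals, consistent with the two examples above.

#include <cstddef>
#include <string>

// Count how many 32-bit integers and strings the variable part of a record
// must contain, based on the %d and %s specifiers in the mapped text.
// For message ID 10 above this yields three integers and one string; for
// message ID 27 it yields none of either.
void countSpecifiers(const std::string& fmt, int& numInts, int& numStrings) {
    numInts = 0;
    numStrings = 0;
    for (std::size_t i = 0; i + 1 < fmt.size(); ++i) {
        if (fmt[i] != '%')
            continue;
        if (fmt[i + 1] == 'd')
            ++numInts;
        else if (fmt[i + 1] == 's')
            ++numStrings;
        else if (fmt[i + 1] == '%')
            ++i; // literal percent sign, skip it
    }
}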
The Timestamp and ProbeID are formatted as well. The Event Type is a string that the log viewer knows from the binary type in the record.
When reconstructed in the log viewer, the two entries with the Message IDs above, in the format <time stamp> <Event Type> <Probe ID> <Message . . . >, would look something like:
00000.032 ms General Info (00:00:00:00:00:00) [StatusMonitor::LogInfo] Depth=0, Current Size=1, Max Size=1000, Monitored Entity=Receive Queue
00000.033 ms Encryption (01:02:03:04:05:06) * * * ERROR * * * [CCThreadManager::DoEncryptThread] failed to encrypt
The fact that almost none of these strings are logged in the actual binary log file is part of what makes this log file so compact and the logging so fast. Each character would take 1 byte if it were all logged. Since thousands or tens of thousands of records can be logged per second, this is important.
The present invention claims priority from U.S. Patent Application No. 61/585,491 filed Jan. 11, 2012, which is incorporated herein by reference.
Related U.S. Application Data:
Provisional application: No. 61/585,491, filed Jan. 2012 (US).
Parent application: No. 13/737,206, filed Jan. 2013 (US).
Child application: No. 15/395,889 (US).