The present disclosure relates generally to systems and methods for non-volatile packet storage memory units. More particularly, the present disclosure relates to systems and methods for high-speed data packet PCAP capture and storage with error detection-correction.
Data packet capture has become an essential tool for the securing and debugging of networks and network protocols. A computing device may capture packets on a network by configuring its network interface to receive some or all packets traversing the segment of the network to which the network interface is connected. The computing device may store captured packets, and/or display a representation of their contents in real time. As just some examples, intrusion detection systems (IDSs), intrusion prevention systems (IPSs), and packet analyzers rely on accurate data packet capture.
Conventional data packet capture tools, such as Tcpdump and Wireshark, operate on general purpose computing devices (e.g., personal computers operating WINDOWS® or LINUX® operating systems). These tools provide mechanisms for capturing packets for storage or real-time display.
While processor speed, memory size, and network data rates have each grown significantly over the last 20 years, network data rate improvements have outpaced those of processor speed and memory size. As a result, it is challenging to provide reliable, low-loss data packet capture in a high-speed network. For example, capturing all data packets on one of today's Ethernet links operating at a speed of 10 gigabits per second, 40 gigabits per second, or 100 gigabits per second is virtually impossible using a software-based implementation on generic computing devices. Captured packets may be dropped in the network interface as they await processing by the kernel (operating system) of a computing device, dropped in the kernel as they await processing by a packet capture application operating on the computing device, or dropped because the packet arrival rate exceeds the rate at which captured packets can be written to a file system (e.g., disk drive) of the computing device.
The embodiments herein involve a packet capture architecture that processes blocks of packets rather than individual packets. These blocks are processed in a pipelined fashion with ample buffering as they are transferred between a customized PHY, MAC, PCAP, PCI, HBA and long-term (non-volatile) packet storage. As a result of this specifically designed architecture, sustained capture rates of 100 gigabits per second may be achieved.
Accordingly, a first example embodiment may involve a plurality of non-volatile packet storage memory units and a non-volatile file system memory unit containing a file system. The first example embodiment may also involve a network interface unit based on field-programmable gate array technology, where the PHY, MAC, PCAP, PCI, HBA unit is configured to arrange sequentially-received packets into blocks, where each block contains a plurality of packets, and where the network interface unit is further configured to extract P4 Table matches and generate Marker(n) for each block without modifying the PCAP file.
These as well as other embodiments, aspects, advantages, and alternatives will become apparent to those of ordinary skill in the art by reading the following detailed description, with reference where appropriate to the accompanying drawings. Further, this summary and other descriptions and figures provided herein are intended to illustrate embodiments by way of example only and, as such, numerous variations are possible. For instance, structural elements and process steps can be rearranged, combined, distributed, eliminated, or otherwise changed, while remaining within the scope of the embodiments as claimed.
References will be made to embodiments of the invention, examples of which may be illustrated in the accompanying figures. These figures are intended to be illustrative, not limiting. Although the invention is generally described in the context of these embodiments, it should be understood that it is not intended to limit the scope of the invention to these particular embodiments.
In the following description, for purposes of explanation, specific details are set forth in order to provide an understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these details. Furthermore, one skilled in the art will recognize that embodiments of the present invention, described below, may be implemented in a variety of ways, such as a process, an apparatus, a system, a device, or a method on a tangible computer-readable medium.
Components, or modules, shown in diagrams are illustrative of exemplary embodiments of the invention and are meant to avoid obscuring the invention. It shall also be understood that throughout this discussion components may be described as separate functional units, which may comprise sub-units, but those skilled in the art will recognize that various components, or portions thereof, may be divided into separate components or may be integrated together, including integrated within a single system or component. It should be noted that functions or operations discussed herein may be implemented as components. Components may be implemented in software, hardware, or a combination thereof.
Furthermore, connections between components or systems within the figures are not intended to be limited to direct connections. Rather, data between these components may be modified, re-formatted, or otherwise changed by intermediary components. Also, additional or fewer connections may be used. It shall also be noted that the terms “coupled,” “connected,” or “communicatively coupled” shall be understood to include direct connections, indirect connections through one or more intermediary devices, and wireless connections.
Example methods, devices, and systems are described herein. It should be understood that the words “example” and “exemplary” are used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as being an “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or features unless stated as such. Thus, other embodiments can be utilized and other changes can be made without departing from the scope of the subject matter presented herein.
Accordingly, the example embodiments described herein are not meant to be limiting. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations. For example, the separation of features into “hardware” and “software” components may occur in a number of ways.
Further, unless context suggests otherwise, the features illustrated in each of the figures may be used in combination with one another. Thus, the figures should be generally viewed as component aspects of one or more overall embodiments, with the understanding that not all illustrated features are necessary for each embodiment.
Additionally, any enumeration of elements, blocks, or steps in this specification or the claims is for purposes of clarity. Thus, such enumeration should not be interpreted to require or imply that these elements, blocks, or steps adhere to a particular arrangement or are carried out in a particular order.
The use of certain terms in various places in the specification is for illustration and should not be construed as limiting. A service, function, or resource is not limited to a single service, function, or resource; usage of these terms may refer to a grouping of related services, functions, or resources, which may be distributed or aggregated.
This section introduces a popular PCAP file format for storing captured packets. As noted above, packet capture on conventional computing devices is limited due to these devices not being optimized for processing a PCAP file that experiences corruption. This section reviews these devices for purposes of comparison, focusing on their bottlenecks.
The embodiments of
PHY(101): PHYsical media device—input=analog signal, output=byte
MAC(102): Media Access Controller—input=bytes, output=packets(pkt)
Each transceiver module PHY(101) may also be coupled to a MAC(102) that performs Ethernet Medium Access Control (MAC) functions. (Forward Error Correction (FEC) and Physical Coding Sublayer (PCS) functions are not shown.) Regardless, the flow of packets (and processing thereof) is generally from left to right.
Consistent with IEEE 802.3, the beginning and end of an incoming Ethernet packet may be identified by detecting the Ethernet preamble and epilogue delimiter bits. The start sequence may be represented in hexadecimal as 0xFB 0x55 0x55 0x55 0x55 0x55 0x55 0xD5 (least-significant bit first ordering is used). The bit received immediately after this sequence may be the first bit of the Ethernet packet. A nanosecond timestamp of when the first byte of each packet was received may also be recorded from a high-accuracy clock source. This timestamp may be adjusted for propagation delay by a fixed offset.
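The delimiter detection described above can be modeled in software as a byte-pattern search. The following is a minimal sketch only: in the disclosed architecture this step would run in hardware, the function names `find_frame_starts` and `adjust_timestamp` are hypothetical, and subtracting the fixed offset is an assumed direction for the propagation-delay adjustment.

```python
# Start sequence quoted in the text: 0xFB, six 0x55 bytes, then 0xD5.
START_SEQUENCE = bytes([0xFB, 0x55, 0x55, 0x55, 0x55, 0x55, 0x55, 0xD5])


def find_frame_starts(stream: bytes) -> list[int]:
    """Return the index of the first packet byte after each start sequence."""
    starts = []
    pos = stream.find(START_SEQUENCE)
    while pos != -1:
        starts.append(pos + len(START_SEQUENCE))
        pos = stream.find(START_SEQUENCE, pos + 1)
    return starts


def adjust_timestamp(raw_ns: int, propagation_offset_ns: int) -> int:
    """Apply a fixed propagation-delay offset (assumed here to be subtracted)."""
    return raw_ns - propagation_offset_ns
```

A byte stream containing two start sequences yields two packet start positions, each pointing at the byte that follows the 0xD5 delimiter.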
PCAP(103): industry standard file format for packet capture—input=packets, output=PCAP file
A header is placed at the beginning of each packet. The contents of this header are the same as those of the PCAP per-packet header: a timestamp field that may contain the time (in nanoseconds) at which the packet was captured, a packet capture size field that may indicate the number of bytes of the packet that were actually captured, and a packet wire size field that may indicate the actual size of the packet prior to capture. Other fields are possible, and more or fewer fields may be present. The packet capture size may be less than the packet wire size when P4 Tables(107) is configured to reduce the size of captured packets.
PCI(104): Industry standard to communicate to storage Host Bus Adapters—input=PCAP files, output=Logical Block Address (LBA). PCI(104) may be implemented as a module.
The PCI(104) encapsulates PCAP files into LBA read/write commands to be interpreted by the module HBA(105). Additional formatting may be added to ensure that Marker(108) can readily identify the location of a particular PCAP packet within a particular LBA. Marker(108) may be implemented as a module.
HBA(105): Host Bus Adapter—input=LBA and DATA, output=storage drive reads and writes
The module HBA(105) may conform to the ANSI T10/1562D specification for storage systems.
Capture write buffer temporarily stores blocks transferred from PCI. These blocks are then distributed across n units of non-volatile storage (106) (SSD0-SSDn). In order to do so, each block is queued for writing to one of these units. These entries are populated to spread consecutive blocks over the available units. While only 1 unit of non-volatile storage (106) (SSDs) is shown in
This arrangement provides for high-speed capture and storage of data packets. Particularly, sustained rates of 100 gigabits per second can be supported. The end-to-end storage system described herein does so by operating on blocks rather than individual packets, carefully aligning blocks as well as packets within blocks for ease of processing, and prioritizing block writing operations over other operations. The process is similar to RAID0, where writing blocks sequentially across an array of SSDs (or other storage units) increases sequential write performance over writing sequentially to the same SSD.
Notably, when writing to a particular SSD, each block is written to a sequentially increasing location. This limits SSD stalls due to internal garbage collection and wear-leveling logic.
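The block distribution described above can be illustrated with a short sketch. It assumes a simple round-robin policy and a fixed block size, neither of which is specified in the text, and the class name `StripedWriter` is hypothetical. It shows consecutive blocks being spread over the available units while each unit only ever receives writes at increasing offsets.

```python
from dataclasses import dataclass, field


@dataclass
class StripedWriter:
    """Assigns consecutive blocks to storage units in round-robin order."""
    num_units: int
    block_size: int
    next_offsets: list[int] = field(default_factory=list)
    blocks_written: int = 0

    def __post_init__(self) -> None:
        # Every unit starts writing at offset 0.
        self.next_offsets = [0] * self.num_units

    def place(self) -> tuple[int, int]:
        """Return (unit index, byte offset on that unit) for the next block."""
        unit = self.blocks_written % self.num_units
        offset = self.next_offsets[unit]
        # The per-unit offset only ever increases, so writes stay sequential.
        self.next_offsets[unit] += self.block_size
        self.blocks_written += 1
        return unit, offset
```

With three units and 4096-byte blocks, the first five blocks land on units 0, 1, 2, 0, 1, and the second block on any unit is written immediately after the first.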
The Marker described in the next section is beneficial for both tcpdump.org and P4.org implementations. The preferred implementation is P4.org which is a super-set of the tcpdump.org capabilities.
P4 Tables(107): input=P4 Tables and pkt, output=match
P4 is an open source community packet classification standard. Its basic function is broken into two parts: Table(match)=>Action. Only the Table(match) function is implemented for the packet classifier. P4 Tables classify each incoming packet based on pre-defined rules. The classification match includes binary (TRUE/FALSE) designations. The rules may include bit-wise logical “and” and “compare” operations on the first 250 bytes of the packet, for example. A total of 16 rules may be supported, and these rules may be software programmable. A packet may match multiple rules. Each rule match is preserved in the Marker module for post-capture processing by external means.
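As an illustration of the Table(match) step, the sketch below applies a bit-wise “and” followed by a “compare” at a byte offset and collects the TRUE/FALSE results of up to 16 rules. It is a software model under stated assumptions, not P4 code: the `Rule` layout (offset, mask, value) and the packing of results into a 16-bit integer are hypothetical choices made for the example.

```python
from dataclasses import dataclass

MAX_RULES = 16       # rule count given in the text
INSPECT_BYTES = 250  # only the first 250 bytes of a packet are examined


@dataclass(frozen=True)
class Rule:
    offset: int   # byte offset into the packet
    mask: bytes   # bit-wise "and" mask
    value: bytes  # expected value after masking

    def matches(self, packet: bytes) -> bool:
        window = packet[:INSPECT_BYTES][self.offset:self.offset + len(self.mask)]
        if len(window) < len(self.mask) or len(self.mask) != len(self.value):
            return False
        return all((b & m) == (v & m)
                   for b, m, v in zip(window, self.mask, self.value))


def classify(packet: bytes, rules: list[Rule]) -> int:
    """Return a bitmask in which bit i is set when rule i matches."""
    if len(rules) > MAX_RULES:
        raise ValueError("at most 16 rules are supported")
    result = 0
    for i, rule in enumerate(rules):
        if rule.matches(packet):
            result |= 1 << i
    return result
```

Because each rule contributes an independent bit, a packet that matches several rules keeps all of its matches, which is what the Marker needs for post-capture processing.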
Marker(108): Marker generator—input=match, LBA, Offset into LBA, output=marker file to storage
Given
P4 Tables(107) may provide a user interface through which one or more packet filter expressions may be entered. The user interface may include a graphical user interface, a command line, or a file.
The packet filter expressions may specify the packets that are to be delivered by P4 Tables. For example, the packet filter expression “host 10.0.0.2 and tcp” may capture all TCP packets to and from the computing device with the IP address 10.0.0.2. As additional examples, the packet filter expression “port 67 or port 68” may capture all Dynamic Host Configuration Protocol (DHCP) traffic, while the packet filter expression “not broadcast and not multicast” may capture only unicast traffic.
Packet filter expressions may include, as shown above, logical conjunctions such as “and”, “or”, and “not.” With these conjunctions, complex packet filters can be defined. Nonetheless, the packet filter expressions shown above are for purpose of example, and different packet filtering syntaxes may be used. For instance, some filters may include a bitstring and an offset, and may match any packet that includes the bitstring at the offset number of bytes into the packet.
Packet capture application may store the received packets in the industry standard PCAP (packet capture) format which consists of a PCAP header, followed by per-packet header: (i) Each instance of per-packet header may precede its associated packet. (ii) Captured packet length may specify the number of bytes of packet data actually captured and saved in file. (iii) Original packet length may specify the number of bytes in the packet as the packet appeared on the network on which it was captured.
The position of the next packet is calculated using the Captured packet length. A corruption of the Captured packet length creates catastrophic error propagation of all subsequent packets. Notably, Markers can be utilized for Error detection and correction and allow re-construction of corrupted PCAP files. In the next sections, PCAP and Marker Formats are disclosed.
FPGA-based network interface
Network management interface may be added onto one or more network interfaces used for connectivity and data transfer. For instance, while FPGA-based network interface
Packet capture application may store the received packets in one of several possible formats. The industry standard format is the PCAP (packet capture) format. There may be one instance of PCAP header followed by N+1 captured packets, each preceded by a per-packet header.
1) As noted above, PCAP header consists of:
Magic number serving to indicate the byte-ordering of the computing device that performed the capture. For instance, magic number may be defined to always have the hexadecimal value of 0xa1b2c3d4 in the native byte ordering of the capturing device. If magic number has a value of 0xd4c3b2a1, then this device may have to swap the byte-ordering of the fields that follow magic number.
Major version and minor version define the version of the PCAP format used. In most instances, major version is 2 and minor version is 4, which indicates that the version number is 2.4.
Time zone offset may specify the difference, in seconds, between the local time zone of the capturing device and Coordinated Universal Time (UTC). In some cases, the capturing device will set this field to 0 regardless of its local time zone.
Timestamp accuracy may specify the accuracy of any time stamps in file. In practice, this field is often set to 0.
Capture length may specify the maximum packet size, in bytes, that can be captured. In some embodiments, this value is set to 65536, but can be set to be smaller if the user is not interested in large-payload packets, for instance. If a packet larger than what is specified in this field is captured, it may be truncated to conform to the maximum packet size.
Datalink protocol may specify the type of datalink interface on which the capture took place. For instance, this field may have a value of 1 for Ethernet, 105 for Wifi, and so on.
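The header fields listed above can be read with a few lines of code. The sketch below assumes the classic 24-byte PCAP header layout (a 32-bit magic number, two 16-bit version fields, and four further 32-bit fields) and uses the magic number to select the byte order. The function name and the dictionary keys are illustrative only.

```python
import struct


def parse_global_header(data: bytes) -> dict:
    """Decode the 24-byte PCAP header, using the magic number to pick byte order."""
    if len(data) < 24:
        raise ValueError("PCAP header is 24 bytes long")
    magic_as_le = struct.unpack("<I", data[:4])[0]
    if magic_as_le == 0xA1B2C3D4:
        endian = "<"   # file was written little-endian
    elif magic_as_le == 0xD4C3B2A1:
        endian = ">"   # fields must be byte-swapped: file is big-endian
    else:
        raise ValueError("unrecognized magic number")
    (_, major, minor, tz_offset, accuracy,
     capture_length, datalink) = struct.unpack(endian + "IHHiIII", data[:24])
    return {
        "endian": endian,
        "version": (major, minor),
        "tz_offset": tz_offset,
        "accuracy": accuracy,
        "capture_length": capture_length,
        "datalink": datalink,
    }
```

For a typical Ethernet capture this returns version (2, 4) and a datalink value of 1.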
2) As noted above, for PCAP per-packet header:
There may be one instance of per-packet header for each packet represented in file. Each instance of per-packet header may precede its associated packet. Timestamp seconds and timestamp microseconds may represent the time at which the associated packet was captured. As noted above, this may be the local time of the capturing device or UTC time.
Captured packet length may specify the number of bytes of packet data actually captured and saved in file.
Original packet length may specify the number of bytes in the packet as the packet appeared on the network on which it was captured.
Notably, a corruption of the Captured packet length creates catastrophic error propagation of all subsequent packets.
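The reason a single corrupted length field affects every later packet is that a reader finds each per-packet header only by adding the previous Captured packet length to the previous position. The following sketch makes that dependency explicit. It assumes little-endian 16-byte per-packet headers, and the function name `walk_records` is hypothetical.

```python
import struct

# Per-packet header: timestamp seconds, timestamp fraction,
# captured packet length, original packet length (little-endian assumed).
RECORD_HEADER = struct.Struct("<IIII")


def walk_records(body: bytes) -> list[tuple[int, int, int]]:
    """Return (header offset, captured length, original length) per record."""
    records = []
    pos = 0
    while pos + RECORD_HEADER.size <= len(body):
        _, _, cap_len, orig_len = RECORD_HEADER.unpack_from(body, pos)
        records.append((pos, cap_len, orig_len))
        # The next header position depends entirely on cap_len.
        pos += RECORD_HEADER.size + cap_len
    return records
```

If the first record's captured length is changed from 4 to 5, the second header is read one byte late, and every field decoded from that point on is meaningless.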
In general, Captured packet length is expected to be less than or equal to Original packet length. For example, if Capture length is 1000 bytes and a packet is 500 bytes, then Captured packet length and Original packet length may both be 500. However, if the packet is 1500 bytes, then Captured packet length may be 1000 while Original packet length may be 1500. While the traditional system described in the context of
Marker(108) may involve receiving packets from a network interface. Marker creates a Block starting with (i) a marker sequence number n, (ii) a timestamp of the creation time of Marker(n), and (iii) the LBA and the Offset into the LBA of the first byte of the first packet within the associated Marker(n). The Marker(n) may contain a plurality of per-packet P4 Table matches that were captured by the network interface. The network interface unit may include one or more Ethernet interfaces, each with a line speed of at least 10 gigabits per second.
In some embodiments, the size of each Marker(n) is fixed and identical. Each of the Markers may contain an integer number of P4 Table Match(Item)s. The Markers may then be sent to a non-volatile Marker storage unit (109) separate from the Packet storage unit. Notably, Marker storage is less frequent than Packet storage, and Markers may be sent through the Management interface.
Notably, the relationships noted above support P4 Table verification at the packet level, and P4.org has a separate specification just for P4 Table verification.
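One possible fixed-size Marker layout is sketched below. The text specifies which items a Marker carries (sequence number, timestamp, LBA, Offset, and per-packet matches) but not their widths, so every field width, the capacity of 64 match items per Marker, and the explicit item count are assumptions made only for illustration.

```python
import struct
from dataclasses import dataclass

MATCHES_PER_MARKER = 64  # hypothetical capacity; not given in the text
# Assumed widths: sequence, timestamp (ns), LBA as 64-bit;
# offset into the LBA and item count as 32-bit.
HEADER = struct.Struct("<QQQII")
ITEMS = struct.Struct(f"<{MATCHES_PER_MARKER}H")  # one 16-bit match word per packet
MARKER_SIZE = HEADER.size + ITEMS.size


@dataclass
class Marker:
    sequence: int
    timestamp_ns: int
    lba: int
    offset: int
    matches: list[int]

    def pack(self) -> bytes:
        """Serialize to exactly MARKER_SIZE bytes, zero-padding unused items."""
        if len(self.matches) > MATCHES_PER_MARKER:
            raise ValueError("too many match items for one Marker")
        padded = self.matches + [0] * (MATCHES_PER_MARKER - len(self.matches))
        head = HEADER.pack(self.sequence, self.timestamp_ns, self.lba,
                           self.offset, len(self.matches))
        return head + ITEMS.pack(*padded)

    @classmethod
    def unpack(cls, raw: bytes) -> "Marker":
        seq, ts, lba, off, count = HEADER.unpack_from(raw, 0)
        items = ITEMS.unpack_from(raw, HEADER.size)
        return cls(seq, ts, lba, off, list(items[:count]))
```

A fixed size means Marker(n) can be located in Marker storage by multiplying n by `MARKER_SIZE`, without scanning.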
In the next sections, error detection and correction is proposed and implemented.
Detection of PCAP corruption may be achieved when the last packet pointed to in Marker(n) does not end at the first byte of the first packet of Marker(n+1), i.e., at LBA(n+1)+Offset(n+1).
Notably, Detection may span many Markers, for example Marker(n) to Marker(n+x).
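The detection rule can be sketched as a walk over the per-packet headers between two Marker positions. The example assumes 512-byte logical blocks, little-endian per-packet headers, and that the number of packets covered by Marker(n) is known from its match items. The function names are hypothetical.

```python
import struct

LBA_SIZE = 512  # assumed logical block size in bytes
RECORD_HEADER = struct.Struct("<IIII")  # ts_sec, ts_frac, captured len, original len


def byte_position(lba: int, offset: int) -> int:
    """Convert an (LBA, Offset) pair into an absolute byte position."""
    return lba * LBA_SIZE + offset


def segment_is_intact(capture: bytes, start: int, packet_count: int,
                      next_start: int) -> bool:
    """True if packet_count records from start end exactly at next_start."""
    limit = min(next_start, len(capture))
    pos = start
    for _ in range(packet_count):
        if pos + RECORD_HEADER.size > limit:
            return False  # a header would cross into the next Marker's data
        _, _, cap_len, _ = RECORD_HEADER.unpack_from(capture, pos)
        pos += RECORD_HEADER.size + cap_len
    return pos == next_start
```

Here `start` and `next_start` would be obtained by calling `byte_position` on the LBA and Offset stored in Marker(n) and Marker(n+1). A mismatch flags the data covered by Marker(n) as corrupted.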
A. Correction may be achieved by discarding the entire contents of data pointed to by Marker(n) and beginning a new PCAP at first packet pointed to by Marker(n+1) i.e. LBA(n+1)+Offset(n+1).
B. Other more aggressive Correction policies are possible and may be implemented using Markers such as Forward Correction and Reverse Correction: 1) forward PCAP error correction process checks each packet starting at Marker(n) location attempting to recover packets forward up to the corrupted packet, 2) reverse PCAP error correction process checks each packet starting at Marker(n+1) location attempting to recover packets backwards down to the corrupted packet in Marker(n).
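Policy A and the forward correction process can be sketched as follows. The plausibility checks used to decide where recovery stops (captured length not exceeding the capture length limit or the original length, and the record not running past the next Marker position) are assumptions; the text does not specify how a corrupted packet is recognized. Little-endian per-packet headers are also assumed, and the function names are hypothetical.

```python
import struct

RECORD_HEADER = struct.Struct("<IIII")  # ts_sec, ts_frac, captured len, original len


def discard_and_restart(capture: bytes, pcap_header: bytes,
                        next_start: int) -> bytes:
    """Policy A: drop Marker(n) data; begin a new PCAP at Marker(n+1)."""
    return pcap_header + capture[next_start:]


def forward_recover(capture: bytes, start: int, next_start: int,
                    capture_length: int) -> list[tuple[int, int]]:
    """Return (offset, size) of records recovered before the first bad header."""
    spans = []
    pos = start
    while pos + RECORD_HEADER.size <= next_start:
        _, _, cap_len, orig_len = RECORD_HEADER.unpack_from(capture, pos)
        end = pos + RECORD_HEADER.size + cap_len
        # Assumed plausibility checks; stop at the first header that fails.
        if cap_len > capture_length or cap_len > orig_len or end > next_start:
            break
        spans.append((pos, end - pos))
        pos = end
    return spans
```

A corrupted length that still passes these checks would not be caught by this sketch, so a real implementation would likely need stronger validation. Reverse correction is not shown, since walking backwards requires a way to locate record boundaries from the end of the segment.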
Notably, finding Marker(n) is aided by the timestamp embedded in the Markers.
The present disclosure is not to be limited in terms of the particular embodiments described in this application, which are intended as illustrations of various aspects. Many modifications and variations can be made without departing from its scope, as will be apparent to those skilled in the art. Functionally equivalent methods and apparatuses within the scope of the disclosure, in addition to those described herein, will be apparent to those skilled in the art from the foregoing descriptions. Such modifications and variations are intended to fall within the scope of the appended claims.
The above detailed description describes various features and operations of the disclosed systems, devices, and methods with reference to the accompanying figures. The example embodiments described herein and in the figures are not meant to be limiting. Other embodiments can be utilized, and other changes can be made, without departing from the scope of the subject matter presented herein. It will be readily understood that the aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations.
With respect to any or all of the message flow diagrams, scenarios, and flow charts in the figures and as discussed herein, each step, block, and/or communication can represent a processing of information and/or a transmission of information in accordance with example embodiments. Alternative embodiments are included within the scope of these example embodiments. In these alternative embodiments, for example, operations described as steps, blocks, transmissions, communications, requests, responses, and/or messages can be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved. Further, more or fewer blocks and/or operations can be used with any of the ladder diagrams, scenarios, and flow charts discussed herein, and these ladder diagrams, scenarios, and flow charts can be combined with one another, in part or in whole.
A step or block that represents a processing of information can correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively, or additionally, a step or block that represents a processing of information can correspond to a module, a segment, or a portion of program code (including related data). The program code can include one or more instructions executable by a processor for implementing specific logical operations or actions in the method or technique. The program code and/or related data can be stored on any type of computer readable medium such as a storage device including RAM, a disk drive, or another storage medium.
The computer readable medium can also include non-transitory computer readable media such as computer readable media that store data for short periods of time like register memory and processor cache. The computer readable media can further include non-transitory computer readable media that store program code and/or data for longer periods of time. Thus, the computer readable media may include secondary or persistent long term storage, like ROM, optical or magnetic disks, compact-disc read only memory (CD-ROM), for example. The computer readable media can also be any other volatile or non-volatile storage systems. A computer readable medium can be considered a computer readable storage medium, for example, or a tangible storage device.
Moreover, a step or block that represents one or more information transmissions can correspond to information transmissions between software and/or hardware modules in the same physical device. However, other information transmissions can be between software modules and/or hardware modules in different physical devices.
The particular arrangements shown in the figures should not be viewed as limiting. It should be understood that other embodiments can include more or fewer of each element shown in a given figure. Further, some of the illustrated elements can be combined or omitted. Yet further, an example embodiment can include elements that are not illustrated in the figures.
While various aspects and embodiments have been disclosed herein, other aspects and embodiments will be apparent to those skilled in the art. The various aspects and embodiments disclosed herein are for purpose of illustration and are not intended to be limiting, with the true scope being indicated by the following claims.
This patent application is related to and claims priority benefit under 35 U.S.C. § 119(e) to co-pending and commonly-owned U.S. Provisional Patent Application No. 63/195,584, entitled “HIGH SPEED DATA PACKET PCAP CAPTURE AND STORAGE WITH ERROR DETECTION-CORRECTION,” naming as inventor Stephan Harting, and filed Jun. 1, 2021, which patent document is incorporated by reference herein in its entirety and for all purposes.
Number | Date | Country
---|---|---
20230403219 A1 | Dec 2023 | US

Number | Date | Country
---|---|---
63195584 | Jun 2021 | US