Hardware accelerated application-based pattern matching for real time classification and recording of network traffic

Information

  • Patent Grant
  • Patent Number
    8,666,985
  • Date Filed
    Thursday, March 15, 2012
  • Date Issued
    Tuesday, March 4, 2014
Abstract
An indexing database utilizes a non-transitory storage medium. A pattern matching processing unit generates preclassification data for the network data packets utilizing pattern matching analysis. At least one processing unit implements a storage process that receives the network data packets, stores the network data packets in at least one of the slots, and transfers the network data packets to a packet capture repository when slots in a shared memory are full. A preclassification process requests from the pattern matching processing unit the preclassification data. An indexing process determines, based upon the preclassification data, whether to invoke or omit additional analysis of the network data packets, and performs at least one of aggregation, classification, or annotation of the network data packets in the shared memory to maintain one or more indices in the indexing database.
Description
BACKGROUND

This disclosure relates generally to network forensics, and in particular to utilizing a parallel pattern matching processing unit to perform pre-classification of packets in a packet stream for real time classification and recording of network traffic.


The field of network forensics involves, among other things, various methods of discovering and analyzing the contents of packetized data transmitted over a network. Identifying particular forms of data (e.g., a Moving Picture Experts Group (MPEG) file, a voice over Internet protocol (VoIP) session, etc.), as well as the content of a particular form of data (e.g., the actual audio file encoded pursuant to the MPEG standard, the audio related to the VoIP session, etc.) transmitted over a network can be a time consuming and computationally intensive task. Such identification may be particularly demanding given the rate and volume of packets that may be transmitted over a network.


If packets are recorded for subsequent examination or searching (as is practiced in network metric, security, and forensic applications), then identifying a particular form of data and extracting the contents of the data may involve first searching an entire database of packets, possibly tens, hundreds, or more terabytes of data, to identify any data possibly conforming to the search request. Such a search may not be conducive to practical, real time discovery and analysis of data types and contents of interest.


Packets may be analyzed and indexed in a database as they are being recorded. By forming indices based on packet characteristics, metadata, and locations where the packets are recorded, identifying and reporting on a particular instance of data and extracting the contents of the data may be performed by searching the indices instead of the entire database of packets. This approach may reduce time and computation required to search. However, this approach may also increase the time and computation required to record the packets.


Recording packets of a network with a high rate and volume of such packets can be a time consuming and computationally intensive task. Time and computational resources needed for real time analyzing and indexing while recording packets may not be feasible. Even if such real time analyzing and indexing is feasible, the amount of analysis that is possible may be limited.


SUMMARY

A system includes a shared memory with slots to transiently store network data packets. A packet capture repository utilizes a non-transitory storage medium. An indexing database utilizes a non-transitory storage medium. A pattern matching processing unit generates preclassification data for the network data packets utilizing pattern matching analysis. At least one processing unit implements a storage process that receives the network data packets, stores the network data packets in at least one of the slots, and transfers the network data packets to the packet capture repository when the slots in the shared memory are full. A preclassification process requests from the pattern matching processing unit the preclassification data. An indexing process determines, based upon the preclassification data, whether to invoke or omit additional analysis of the network data packets, and performs at least one of aggregation, classification, or annotation of the network data packets in the shared memory to maintain one or more indices in the indexing database.





BRIEF DESCRIPTION OF THE DRAWINGS

Example embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:



FIG. 1 is a diagram illustrating real time classification and recording of packetized network traffic.



FIG. 2 is a diagram illustrating hardware accelerated application-based pattern matching for real time classification and recording of packetized network traffic in accordance with an embodiment of the invention.



FIG. 3 is a system for hardware accelerated application-based pattern matching for real time classification and recording of network traffic in accordance with an embodiment of the invention.



FIG. 4 is a flow chart illustrating a disclosed method for hardware accelerated application-based pattern matching for real time classification and recording of network traffic.



FIG. 5 is a diagram illustrating storage of packet data in a packet capture repository.



FIG. 6 is a diagram illustrating an indexing database that includes indices to packets contained within a packet capture repository.





Other features of the present embodiments will be apparent from the accompanying drawings and from the disclosure that follows.


DETAILED DESCRIPTION

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the various embodiments. It may be evident, however, to one skilled in the art that the various embodiments may be practiced without these specific details.



FIG. 1 is a diagram illustrating the functional flow 100 of real time capturing, aggregating, classifying, annotating, and storing packetized data transmitted over a network 101. As illustrated, one or more kernel-level processes 106 executing on one or more processing units 102 may receive a stream of packets transmitted on the network 101. A kernel is the main component of most computer operating systems. It is a bridge between applications and the actual data processing done at the hardware level. The kernel's responsibilities include managing the system's resources (the communication between hardware and software components). Usually as a basic component of an operating system, a kernel can provide the lowest-level abstraction layer for the resources (e.g., processors and I/O devices) that application software must control to perform its function. It typically makes these facilities available to application processes through inter-process communication mechanisms and system calls.


As the stream of packets is received, the kernel-level process may utilize copyless direct memory access (DMA) techniques to store the packets of the stream in slots 108 of a memory 103. DMA allows certain hardware subsystems within the computer to access system memory independently of the central processing unit (CPU). Without DMA, when the CPU is using programmed input/output, it is typically fully occupied for the entire duration of the read or write operation, and is thus unavailable to perform other work. With DMA, the CPU initiates the transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller when the operation is done. This feature is useful any time the CPU cannot keep up with the rate of data transfer, or where the CPU needs to perform useful work while waiting for a relatively slow I/O data transfer. Computers that have DMA channels can transfer data to and from devices with much less CPU overhead than computers without a DMA channel. Similarly, a processing element inside a multi-core processor can transfer data to and from its local memory without occupying its processor time, allowing computation and data transfer to proceed in parallel. DMA can also be used for “memory to memory” copying or moving of data within memory. DMA can offload expensive memory operations from the CPU to a dedicated DMA engine.


The memory 103 is shared with one or more user-level processes 107 executed by the processing unit 102. When the slots in the memory are full, the kernel-level process may transfer the packets in the slots in memory to slots in a packet repository in the storage 104, also utilizing copyless DMA techniques.


While the kernel-level process is storing the packets of the stream in slots of the memory, an indexing process 109 of the user-level processes may access the packets in the slots of the memory and aggregate, classify, and annotate the packets based on various characteristics to maintain indices in an indexing database 105. The indices may reference locations in the slots in the packet repository where packets are stored.


However, the indexing process 109 executed by the processing unit 102 may be limited by the amount of analysis that the indexing process has resources to perform on the packets for purposes of aggregating, classifying, and annotating in real time. Regardless of the processing resources available to the indexing process, and though the rate and volume of the stream of packets may vary, the processing unit may continue to receive additional packets of the stream. As such, the indexing process may have to limit the amount of analysis performed on packets that have been received in order to perform analysis on the next received packets in the stream.


By way of contrast to the functional flow 100 of real time capturing, aggregating, classifying, annotating, and storing packetized data transmitted over a network 101 illustrated in FIG. 1, FIG. 2 is a diagram illustrating the functional flow 200 of hardware accelerated application-based pattern matching for real time classification and recording of packetized data transmitted over a network 201. As illustrated, one or more kernel-level processes 206 executing on one or more processing units 202 may receive a stream of packets transmitted on the network 201. As the stream of packets is received, the kernel-level process may utilize copyless direct memory access (DMA) techniques to store the packets of the stream in slots 208 of a memory 203 that is shared with one or more user-level processes 207 executed by the processing unit 202. When the slots in the memory are full, the kernel-level process may transfer the packets in the slots in memory to slots in a packet repository in the storage 204, also utilizing copyless DMA techniques.


While the kernel-level process 206 is storing the packets of the stream in slots 208 of the memory 203, an indexing process 209 of the user-level processes may interact with a preclassification process 210 of the user-level processes to access the packets in the slots of the memory and aggregate, classify, and annotate the packets based on various characteristics to maintain indices in an indexing database 205. The indices may reference locations in the slots in the packet repository where packets are stored.


The preclassification process 210 may include requisite levels of preprocessing on the data for normalization (e.g., perform decompression and encoding transformations to arrive at a consistent representation of the data for pattern matching). The preclassification process 210 may communicate with a pattern matching processing unit 211 to request the pattern matching processing unit perform preclassification on the packets in the slots 208 of the memory 203 by performing pattern matching analysis. In one embodiment, Aho-Corasick string matching is used. Aho-Corasick string matching is a dictionary-matching algorithm that locates elements of a finite set of strings (the “dictionary”) within an input text. It matches all patterns simultaneously. The complexity of the algorithm is linear in the length of the patterns plus the length of the searched text plus the number of output matches.
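The Aho-Corasick automaton can be sketched in a few dozen lines. The following is an illustrative implementation of the algorithm itself (not code from the patent), showing the trie construction, the breadth-first failure links, and the single linear scan that reports all dictionary matches:

```python
from collections import deque

class AhoCorasick:
    """Aho-Corasick automaton: matches every dictionary pattern
    against an input text in a single linear pass."""

    def __init__(self, patterns):
        # goto[state] maps a character to the next state; output[state]
        # holds the patterns that end at that state.
        self.goto = [{}]
        self.output = [set()]
        for pat in patterns:
            state = 0
            for ch in pat:
                if ch not in self.goto[state]:
                    self.goto[state][ch] = len(self.goto)
                    self.goto.append({})
                    self.output.append(set())
                state = self.goto[state][ch]
            self.output[state].add(pat)
        # Build failure links breadth-first from the root.
        self.fail = [0] * len(self.goto)
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                # Inherit matches reachable through the failure link.
                self.output[t] |= self.output[self.fail[t]]

    def search(self, text):
        """Yield (end_index, pattern) for every match found in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pat in self.output[state]:
                yield i, pat
```

For example, matching the classic dictionary {"he", "she", "his", "hers"} against "ushers" reports "she", "he", and "hers" in one pass over the text.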


In performing the pattern matching analysis, the pattern matching processing unit may determine one or more characteristics of the packets in the slots of the memory, such as identifying the application to which the packets relate, the protocol utilized to transmit the packets, file types of payload data content, source and/or destination addresses associated with the packets, packet lengths, and so on. The pattern matching processing unit may determine the characteristics by comparing bit patterns of the packets with a library of bit patterns associated with the characteristics.
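As a sketch of this signature-driven preclassification, the following uses a small, invented library of byte patterns (the signatures here are illustrative, not an authoritative set) and reports every characteristic whose pattern appears in a packet payload:

```python
# Hypothetical signature library: byte pattern -> packet characteristic.
# The patterns below are illustrative examples only.
SIGNATURE_LIBRARY = {
    b"GET ":                 ("protocol", "HTTP request"),
    b"\xff\xd8\xff":         ("file_type", "JPEG payload"),
    b"\x16\x03":             ("protocol", "TLS record"),
    b"BitTorrent protocol":  ("application", "BitTorrent"),
}

def preclassify(packet_payload: bytes) -> list:
    """Return the characteristics whose bit patterns occur in the payload."""
    found = []
    for pattern, characteristic in SIGNATURE_LIBRARY.items():
        if pattern in packet_payload:
            found.append(characteristic)
    return found
```

A production library would be compiled into a single automaton (as in the Aho-Corasick approach above) rather than scanned pattern-by-pattern; the loop here is kept naive for clarity.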


The pattern matching processing unit 211 may be one or more types of processing units capable of performing pattern matching analysis on multiple packets in parallel. For example, the pattern matching processing unit may be a graphical processing unit that includes thousands of separate cores which may each perform pattern matching analysis on a different packet. As such, the pattern matching processing unit may simultaneously (or substantially simultaneously) perform pattern matching analysis on the packets of one or more slots 208 in the memory 203.
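The per-core fan-out can be approximated on a CPU with a thread pool: each worker scans one packet independently, mirroring the one-core-per-packet arrangement described above. The patterns, worker count, and function names below are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-in for the per-core matching kernel.
PATTERNS = [b"GET ", b"POST ", b"\x16\x03"]  # hypothetical dictionary

def match_packet(payload: bytes) -> list:
    """Scan one packet against the dictionary (one worker per packet)."""
    return [p for p in PATTERNS if p in payload]

def preclassify_slot(packets: list, workers: int = 4) -> list:
    """Pattern-match every packet in a slot concurrently and return
    the per-packet results in their original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(match_packet, packets))
```

`pool.map` preserves input order, so result *i* always corresponds to packet *i* in the slot, which matters when results are fed back to the indexing process.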


The pattern matching processing unit 211 may return the results of the pattern matching analysis to the preclassification process 210. The preclassification process may then interact with the indexing process 209 to aggregate, classify, and annotate the packets based on the results of the pattern matching analysis, as well as any additional analysis of the packets performed by the indexing process, to maintain the indices in the indexing database 205. As a result of offloading pattern matching analysis to a pattern matching processing unit which can analyze multiple packets in parallel, the amount of real time analysis that can be performed on received packets may be greater than in the functional flow 100 illustrated in FIG. 1. This may further reduce the time and computation required to search for stored packets beyond what is possible with the functional flow 100 illustrated in FIG. 1.



FIG. 3 illustrates a system 300 for hardware accelerated application-based pattern matching for real time classification and recording of network traffic. The system may utilize various techniques and/or apparatuses described in published PCT application PCT/US2005/045566 entitled “Method and Apparatus for Network Packet Capture Distributed Storage System”, published U.S. application Ser. No. 12/126,656 entitled “Method and Apparatus to Index Network Traffic Meta-Data” (US 2009/0290492), published U.S. application Ser. No. 12/126,551 entitled “Method and Apparatus of Network Artifact Identification and Extraction” (US 2009/0290580), published U.S. application Ser. No. 12/471,433 entitled “Presentation of An Extracted Artifact Based on an Indexing Technique” (US 2009/0292681), and published U.S. application Ser. No. 12/126,619 titled “On Demand Network Activity Reporting Through a Dynamic File System and Method” (US 2009/0292736), all of which are herein incorporated by reference in their entirety.


The system 300 includes a capture appliance 302 that is communicably coupled to a network 301, storage 303, and an indexing database 304. Data packetized according to a variety of different protocols (such as hypertext transfer protocol, file transfer protocol, Internet Protocol version 4, Internet Protocol version 6, transmission control protocol, user datagram protocol, server message block, simple mail transfer protocol, and so on) may be transmitted over the network. The capture appliance may be operable to capture, aggregate, annotate, store, and index network packet data in real time from one or more portions of the network and retrieve such data utilizing the storage and the indexing database. Thus, the storage may be operable as a packet capture repository and the indexing database may be operable as an index into the packet capture repository. The storage may include any kind of storage media (such as one or more magnetic storage media, optical storage media, volatile memory, non-volatile memory, flash memory, and so on) configured as a Redundant Array of Independent Discs (RAID) implementation, a storage area network, and so on.


The capture appliance 302 may include at least one processing unit 305, one or more computer readable media 306 (such as random access memories, hard disks, flash memory, cache memory, non-volatile memory, optical storage media, and so on), one or more pattern matching processing units 308, and one or more output components 307 that are operable to communicate with the storage 303 and/or the indexing database 304. The capture appliance may also include one or more user interface components 309 which may be operable to interact with one or more input/output devices 310. The capture appliance may be a dedicated device in some implementations. In other implementations, the capture appliance may be software operating in a virtualized environment implemented by the processing unit executing one or more instructions stored in the computer readable media.


The processing unit 305 may execute instructions stored in the computer readable medium 306 to implement one or more packet storage processes that receive one or more streams of packets transmitted on the network 301 and store the packets in slots of a shared memory (which may be the computer readable medium) utilizing DMA techniques, such as copyless DMA techniques. The slots may be of a fixed size, such as 64 megabytes. When the slots in the shared memory are full, the packet storage process may transfer the packets in the slots in shared memory to slots in the packet repository in the storage 303, also utilizing DMA techniques, such as copyless DMA techniques.
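The slot-based storage process might be sketched as follows, with the slot size scaled down from 64 megabytes for illustration and a plain list standing in for the packet capture repository. The class and its interface are invented, not the patent's implementation:

```python
class SlotBuffer:
    """Sketch of the storage process: fixed-size slots held in memory
    are transferred to the packet capture repository once all slots
    are full. Sizes are scaled down for illustration."""

    def __init__(self, repository, slot_size=1024, num_slots=4):
        self.repository = repository   # stand-in for storage 303
        self.slot_size = slot_size     # capacity of one slot, in bytes
        self.num_slots = num_slots
        self.slots = [bytearray()]     # open slots; the last is current

    def store(self, packet: bytes):
        # Assumes each packet fits within a single slot.
        if len(self.slots[-1]) + len(packet) > self.slot_size:
            if len(self.slots) == self.num_slots:
                self.flush()                    # every slot full: transfer
            else:
                self.slots.append(bytearray())  # open the next slot
        self.slots[-1].extend(packet)

    def flush(self):
        # Transfer slots to the repository in first-in, first-out order.
        self.repository.extend(bytes(s) for s in self.slots)
        self.slots = [bytearray()]
```

In the patented design the transfer step would use copyless DMA rather than the in-process copy shown here; only the slot bookkeeping is being illustrated.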


The processing unit 305 may also execute instructions stored in the computer readable medium 306 to implement one or more indexing processes and one or more preclassification processes. As the packet storage process stores the packets in slots of the shared memory, the indexing process may interact with the preclassification process to access the packets in the slots of the shared memory and aggregate, classify, and/or annotate the packets based on various characteristics in order to maintain indices in the indexing database 304. The indices may reference locations in the slots in the packet repository where packets are stored. The preclassification process may communicate with the pattern matching processing unit 308 to request the pattern matching processing unit perform preclassification on the packets in the slots of the shared memory. The pattern matching processing unit may return the results of the preclassification to the preclassification process and the preclassification process may then interact with the indexing process to aggregate, classify, and/or annotate the packets based on the results. Additionally, the indexing process may perform additional analysis of the packets to maintain the indices in the indexing database.


The pattern matching processing unit 308 may be one or more processing units capable of performing preclassification on multiple packets in parallel. For example, the pattern matching processing unit may be a graphical processing unit or other kind of parallel processing unit that includes multiple separate cores (such as hundreds, thousands, and so on) which may each perform preclassification on a different packet. Thus, the pattern matching processing unit may simultaneously or nearly simultaneously perform preclassification on the packets of one or more slots in the shared memory.


The pattern matching processing unit 308 may perform preclassification on the packets by utilizing pattern matching analysis (such as Aho-Corasick string matching and so on) to compare bit patterns of the packets with a library of bit patterns that are associated with various characteristics of packets. Such packet characteristics may include the application to which the packets relate, the protocol utilized to transmit the packets, file types of payload data content, source and/or destination addresses associated with the packets, packet lengths, and so on. For example, the pattern matching processing unit may utilize pattern matching analysis to determine the software application to which the packets relate, such as a world wide web (WWW) application, an instant message application, Facebook™, a computer virus, a Flash™ video application, a peer-to-peer file sharing application, and so on. In such an example, entries in the library of bit patterns may be created by analyzing packets that are known to relate to particular software applications and identifying bit patterns common to the particular software applications. Based on the software application identified for one or more packets by the pattern matching analysis, the indexing process may perform additional analysis of the packets, skip additional analysis of the packets, and so on.


In this example, when packets are identified as relating to a peer-to-peer file sharing application (such as Kazaa™), the indexing process may perform additional analysis to determine the names and types of files being shared. Further in this example, when packets are identified as relating to a Flash™ video application, the indexing process may skip additional analysis. The indexing process may be so configured under the assumption that packets relating to peer-to-peer file sharing applications merit additional analysis because such applications may be utilized to exchange content in violation of copyright whereas Flash™ video applications are less likely to be utilized for such purposes. In this way, the indexing process may utilize the preclassification to guide how available resources of the indexing process are spent. Hence, the indexing process may be able to dedicate resources to further analyzing packets relating to software applications that are more concerning (such as viruses) without wasting resources on further analyzing packets relating to software applications that are less concerning (such as word processing applications).
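This triage policy can be expressed as a small decision function. The application labels and the policy tables below are hypothetical stand-ins for the peer-to-peer/Flash distinction described above:

```python
# Hypothetical triage policy: which preclassified application labels
# merit deep analysis and which can be indexed without further work.
ANALYZE_FURTHER = {"peer-to-peer", "virus"}
SKIP_ANALYSIS = {"flash-video", "word-processing"}

def triage(app_label: str) -> str:
    """Decide, from preclassification alone, how to spend
    the indexing process's limited real time resources."""
    if app_label in ANALYZE_FURTHER:
        return "deep-analysis"
    if app_label in SKIP_ANALYSIS:
        return "index-only"
    return "default-analysis"
```

The point of the sketch is that the decision is made from the cheap preclassification label alone, before any expensive per-packet analysis is committed.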


Other embodiments of the invention identify and index on protocol-specific attributes. For example, after identifying a particular flow as an HTTP protocol flow, there may be further identifying and indexing on such attributes as User-Agent, HTTP Referer, Cookie, Host, x-forwarded-for, etc. Further, identifying and indexing may be based on elements of common web-based applications, such as identifying a particular HTTP session as being an instance of a LinkedIn session, and then identifying and indexing on the username within that LinkedIn (or similar social networking site) web session. Similar processing may be performed with YouTube®, including providing a description of a video being posted.
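Extracting those indexable HTTP attributes from a captured request might look like the following sketch, which assumes a well-formed, unfragmented request held in a single payload (the function name and interface are invented):

```python
def extract_http_attributes(payload: bytes) -> dict:
    """Pull the indexable request headers named above (User-Agent,
    Referer, Cookie, Host, x-forwarded-for) out of a raw HTTP request."""
    wanted = {"user-agent", "referer", "cookie", "host", "x-forwarded-for"}
    attributes = {}
    head = payload.split(b"\r\n\r\n", 1)[0]       # headers end at blank line
    for line in head.split(b"\r\n")[1:]:          # skip the request line
        name, _, value = line.partition(b":")
        key = name.strip().lower().decode("ascii", "replace")
        if key in wanted:
            attributes[key] = value.strip().decode("ascii", "replace")
    return attributes
```

A real capture pipeline would first reassemble the TCP stream and handle header folding and continuations; this sketch shows only the attribute selection that feeds the index.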


Although the present example discusses utilizing preclassification to determine the software application to which the packets relate, it should be understood that preclassification may determine other characteristics of packets that may be utilized to identify packets to perform additional analysis upon, skip performing additional analysis upon, and so on. For example, preclassification may determine one or more of protocols utilized to transmit the packets, file types of payload data content, source and/or destination addresses associated with the packets, packet lengths, and so on.


By way of another example, the pattern matching processing unit 308 may utilize pattern matching analysis to determine whether or not the packets relate to an already identified traffic flow (i.e., the entire network conversation to which a packet relates). Based on the recognition that one or more packets relate to an already identified traffic flow, the indexing process may skip additional analysis of the packets (as the flow has already been analyzed) and the packets may be indexed based on the analysis of the flow that has already been performed. Hence, the indexing process is able to leverage the preclassification to dedicate resources to further analyzing packets relating to flows that have not yet been analyzed without wasting resources on further analyzing packets that relate to already analyzed flows.
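Flow-level deduplication of this kind can be sketched by keying each packet to a canonical five-tuple, so that both directions of a conversation map to the same flow, and invoking the expensive analysis only for flows not yet seen. The dictionary-based cache and field names are assumptions for illustration:

```python
analyzed_flows = {}   # flow key -> classification already computed

def flow_key(pkt: dict) -> tuple:
    """Canonical 5-tuple: both directions of a conversation
    produce the same key."""
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (pkt["proto"],) + (a + b if a <= b else b + a)

def classify_packet(pkt: dict, analyze) -> str:
    """Reuse the classification of an already-identified flow; call
    the expensive analyzer only for flows not yet seen."""
    key = flow_key(pkt)
    if key not in analyzed_flows:
        analyzed_flows[key] = analyze(pkt)
    return analyzed_flows[key]
```

With this arrangement, the second and later packets of a long conversation are indexed at dictionary-lookup cost rather than full-analysis cost.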


In one or more implementations, the processing unit 305 of the capture appliance 302 may also be operable to execute instructions stored in the computer readable medium 306 to query the indexing database 304 in response to input received from the input/output device(s) 310 via the user interface component 309. As part of such a query, the processing unit may be operable to retrieve one or more packets stored in one or more slots of the packet capture repository referenced by one or more indices of the indexing database. The processing unit may then provide and/or display the retrieved packets and/or other information regarding the query, the retrieved packets, and so on to the input/output device(s) via the user interface component.


In FIG. 3, the network 301 is illustrated as a single network. However, in various implementations the network may be composed of multiple local area networks, metropolitan area networks, wide area networks (such as the Internet), virtual private networks, and so on of various kinds (wired, wireless, Ethernet, gigabit Ethernet, twisted pair, fiber optic, coaxial, cellular, and so on) which are connected via various kinds of switches, hubs, gateways, and so forth.



FIG. 4 illustrates a method 400 for hardware accelerated application-based pattern matching for real time classification and recording of network traffic. The method may be performed by the processing unit 305 of FIG. 3. The method begins at block 401 and then the flow proceeds to block 402. At block 402, the processing unit receives a packet stream of network traffic from the network 301. The flow then proceeds to block 403, where a packet storage process executed by the processing unit stores the received packets of the stream in slots in a shared memory. The flow then splits and proceeds (either simultaneously or independently) to both block 404 and block 406.


At block 404, after the packet storage process stores the received packets in slots of the shared memory, the packet storage process determines whether the slots of the shared memory are full. If so, the flow proceeds to block 405. If not, the flow returns to block 402 where the processing unit 305 continues to receive the packet stream of network traffic from the network 301. At block 405, after the packet storage process determines the slots of the shared memory are full, the packet storage process transfers packets stored in the slots of the shared memory to slots of the packet capture repository in the storage 303. Transfer of packets stored in the slots of the shared memory may be performed in first-in, first-out order when the slots of the shared memory are full.


At block 406, after the packet storage process stores the received packets in slots of the shared memory, a preclassification process executed by the processing unit 305 begins performing preclassification of the packets stored in the slots of the shared memory. The flow proceeds to block 407 where the preclassification process requests the pattern matching processing unit perform pattern matching of the packets stored in the slots of the shared memory. The flow next proceeds to block 408 where the preclassification process receives the pattern matching results from the pattern matching processing unit before the flow proceeds to block 409.


At block 409, an indexing process executed by the processing unit 305 determines whether or not the pattern matching results specify to skip classification of one or more of the packets stored in the slots of the shared memory. If so, the flow proceeds to block 410 where the indexing process skips classification of the one or more packets specified to skip before the flow proceeds to block 411. Otherwise, the flow proceeds to block 411.


At block 411, the indexing process determines whether any of the packets stored in the slots of the shared memory remain to be classified. If so, the flow proceeds to block 412 for classification. Otherwise, the flow returns to block 402 where the processing unit 305 continues to receive the packet stream of network traffic from the network 301.


At block 412, after the indexing process determines that packets stored in the slots of the shared memory remain to be classified, the indexing process groups packet data in the indexing database 304 according to classification of the packets. The classification may include deep packet inspection, header evaluation, and so on. The flow then proceeds to block 413 where the indexing process indexes the indexing database to point to locations of packet data in the packet capture repository in the storage 303. The flow then returns to block 402 where the processing unit 305 continues to receive the packet stream of network traffic from the network 301.
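Blocks 412 and 413 together amount to grouping packet metadata by classification and recording pointers into the repository. A minimal sketch, with an invented (slot, offset) location scheme standing in for repository locations:

```python
from collections import defaultdict

class IndexingDatabase:
    """Sketch of blocks 412-413: group packet metadata by
    classification and record pointers to where each packet
    landed in the packet capture repository."""

    def __init__(self):
        self.indices = defaultdict(list)  # class -> repository locations

    def index_packet(self, classification: str, slot: int, offset: int):
        # Block 413: the index points at a repository location,
        # not at the packet data itself.
        self.indices[classification].append((slot, offset))

    def lookup(self, classification: str) -> list:
        return self.indices[classification]
```

Because the index stores only locations, a later query can be answered by scanning the index alone and then fetching just the referenced slots, rather than searching the full repository.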



FIG. 5 illustrates storing of packet data in a packet capture repository 502. The illustrated storing of packet data may be performed by the capture appliance 302 of FIG. 3. Referring again to FIG. 5, packetized data may be identified in a flow of packets 501 crossing a network and the identified packet data may be stored in the packet capture repository. In some implementations, all packets flowing through a particular point in a network, such as at the location of a network tap, may be stored in the packet capture repository. Practically speaking, some packets may be lost or dropped due to various issues including delivery failure or practical limits of computing technology, but the system attempts to capture every packet.


The packets 501 may include a data unit (e.g., packets of data of an email, an instant message communication, an audio file, a compressed file, etc.) that may be carried by a flow of the packets in the network. The packet capture repository may contain a collection of packets whose contents might fall into a variety of classes such as software applications to which the packet data relates. By way of example, FIG. 5 illustrates that the packet capture repository contains collections of packets whose contents are related to a World Wide Web (WWW) application 503 (such as a web browser) and an Instant Messaging (IM) application 504 (such as the Facebook™ instant messenger client).



FIG. 6 is a diagram illustrating an indexing database 601 that includes indices to packets contained within a packet capture repository. The illustrated indexing database 601 may be the indexing database 304 of FIG. 3. Referring again to FIG. 6, the indexing database 601 may be a collection of meta-data that is stored in an organized manner so that the data packets may be accessed efficiently through a query.


The information (e.g., packet data, meta-data, etc.) may be extracted from the indexing database 601 through a suitable database query. The database query may be performed through any number of interfaces, including a graphical user interface, a web services request, a programmatic request, a structured query language (SQL) query, and so on, any of which may be used to extract related information of packet data or any meta-data stored in the indexing database. If queried packet data/information is matched with the data stored in the indexing database, then packets matching the query may be retrieved from an associated packet repository for reconstruction.


The matched packet data may be reconstructed by referring to a memory location corresponding to designated packet data. The indexing database may point to members of a collection of data packets according to “class,” where a class may include any data such as software applications to which the packets relate, attributes of a packet header, the presence of a multimedia file flowing across the network, a session of a particular user of the network at a particular point in time, and so on. The pointers may point to the memory locations of packets stored in the packet capture repository for the purpose of efficient retrieval of relevant packets. The indexing database may point to packets according to their having been classified as containing applications, files, and other data shared through the network, in the native packetized format in which they were transmitted. Also, the sessions of each individual user in the network may be grouped and stored in the indexing database.


For example, the indexing database 601 may include indexed WWW data 602, indexed TCP session data 603, indexed data for a particular user's TCP session 604, indexed IM data 605, and so on. Each index 602, 603, 604, and 605 may be a unit of the indexing database. In addition, the indexing database may include pointers pointing to a packet capture repository location of particular information corresponding to an index.


For example, a first pointer 606 may point to a first packet capture repository location 610 within the packet capture repository to represent the contents stored in a particular location of the indexed WWW data 602. A second pointer 607 may point to a second packet capture repository location 611 within the packet capture repository to represent the contents stored in a particular location of the indexed TCP session data 603. A third pointer 609 may point to a third packet capture repository location 612 within the packet capture repository to represent the contents stored in a particular location of the indexed IM data 605. A fourth pointer 608 may point to a fourth packet capture repository location 613 within the packet capture repository to represent the contents stored in a particular location of the indexed data for a particular user's TCP session 604, and so on.
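Dereferencing such pointers for reconstruction may be sketched as follows, with the repository modeled as a flat byte buffer and each index pointer as an (offset, length) pair. The names and data are hypothetical illustrations of the pointer scheme, not the patented layout.

```python
# Stand-in for the packet capture repository: three 10-byte "packets"
# stored back to back in a flat byte buffer.
repository = b"".join([b"WWWpacket1", b"TCPsession", b"IMpayload!"])

# Stand-in for the indexing database pointers: each index entry maps
# to an (offset, length) location in the repository (cf. pointers
# 606-609 pointing to locations 610-613 in FIG. 6).
index = {
    "indexed_www_data": (0, 10),
    "indexed_tcp_data": (10, 10),
    "indexed_im_data":  (20, 10),
}

def reconstruct(name):
    """Retrieve the packet bytes a given index entry points to."""
    offset, length = index[name]
    return repository[offset:offset + length]
```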


In the present disclosure, the methods disclosed may be implemented as sets of instructions or software readable by a device. Further, it is understood that the specific order or hierarchy of steps in the methods disclosed is an example of a sample approach. In other embodiments, the specific order or hierarchy of steps in a method can be rearranged while remaining within the disclosed subject matter. The accompanying method claims present elements of the various steps in a sample order, and are not necessarily meant to be limited to the specific order or hierarchy presented.


The described disclosure may be provided as a computer program product, or software, that may include a non-transitory machine-readable medium having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure. A non-transitory machine-readable medium includes any mechanism for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The non-transitory machine-readable medium may take the form of, but is not limited to: a magnetic storage medium (e.g., floppy diskette); optical storage medium (e.g., CD-ROM); magneto-optical storage medium; read only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; and so on.


It is believed that the present disclosure and many of its attendant advantages will be understood by the foregoing description, and it will be apparent that various changes may be made in the form, construction and arrangement of the components without departing from the disclosed subject matter or without sacrificing all of its material advantages. The form described is merely explanatory, and it is the intention of the following claims to encompass and include such changes.


While the present disclosure has been described with reference to various implementations, it will be understood that these implementations are illustrative and that the scope of the disclosure is not limited to them. Many variations, modifications, additions, and improvements are possible. More generally, implementations in accordance with the present disclosure have been described in the context of particular embodiments. Functionality may be separated or combined in blocks differently in various embodiments of the disclosure or described with different terminology. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure as defined in the claims that follow.

Claims
  • 1. A system, comprising: a shared memory that includes a plurality of slots to transiently store network data packets;a packet capture repository utilizing a non-transitory storage medium;an indexing database utilizing a non-transitory storage medium;a pattern matching processing unit to generate preclassification data for the network data packets utilizing pattern matching analysis, wherein the pattern matching processing unit includes a graphical processing unit with multiple cores to analyze multiple network data packets in parallel; andat least one processing unit that implements: a storage process that receives the network data packets, stores the network data packets in at least one of the slots, and transfers the network data packets to the packet capture repository when the slots in the shared memory are full;a preclassification process that requests from the pattern matching processing unit the preclassification data; andan indexing process to: determine, based upon the preclassification data, whether to invoke or omit additional analysis of the network data packets, such that the indexing process resources are dedicated to further analyzing network data packets of greater concern, andperform at least one of aggregation, classification, or annotation of the network data packets in the shared memory to maintain one or more indices in the indexing database.
  • 2. The system of claim 1 wherein the preclassification process normalizes data within the network data packets.
  • 3. The system of claim 1 wherein the pattern matching analysis is Aho-Corasick string matching.
  • 4. The system of claim 1 wherein the pattern matching analysis is selected from identifying the application to which the network data packets relate, identifying the protocol utilized to transmit the network data packets, identifying file types of payload data in the network data packets, identifying source or destination addresses associated with the network data packets and identifying the lengths of network data packets.
  • 5. The system of claim 1 wherein copyless direct memory access data transfers are used between the plurality of slots.
  • 6. The system of claim 1 wherein the additional analysis determines the names and types of files being shared.
  • 7. The system of claim 1 wherein the additional analysis includes identifying and indexing protocol-specific attributes.
  • 8. The system of claim 1 wherein the additional analysis includes identifying and indexing web-based applications.
  • 9. The system of claim 8 wherein identifying and indexing web-based applications includes identifying and indexing a particular HTTP session.
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 61/453,456, filed Mar. 16, 2011, entitled, “Hardware Accelerated Application-Based Pattern Matching for Real Time Classification and Recording of Network Traffic”, the contents of which are incorporated herein by reference.

US Referenced Citations (243)
Number Name Date Kind
5274643 Fisk Dec 1993 A
5440719 Hanes et al. Aug 1995 A
5526283 Hershey et al. Jun 1996 A
5602830 Fichou et al. Feb 1997 A
5758178 Lesartre May 1998 A
6041053 Douceur et al. Mar 2000 A
6101543 Alden et al. Aug 2000 A
6145108 Ketseoglou Nov 2000 A
6185568 Douceur et al. Feb 2001 B1
6336117 Massarani Jan 2002 B1
6370622 Chiou et al. Apr 2002 B1
6400681 Bertin et al. Jun 2002 B1
6453345 Trcka et al. Sep 2002 B2
6516380 Kenchammana-Hoskote et al. Feb 2003 B2
6522629 Anderson, Sr. Feb 2003 B1
6591299 Riddle et al. Jul 2003 B2
6628617 Karol et al. Sep 2003 B1
6628652 Chrin et al. Sep 2003 B1
6675218 Mahler et al. Jan 2004 B1
6693909 Mo et al. Feb 2004 B1
6708292 Mangasarian Mar 2004 B1
6754202 Sun et al. Jun 2004 B1
6782444 Vishlitzky et al. Aug 2004 B1
6789125 Aviani et al. Sep 2004 B1
6907468 Moberg et al. Jun 2005 B1
6907520 Parady Jun 2005 B2
6928471 Pabari et al. Aug 2005 B2
6956820 Zhu et al. Oct 2005 B2
6958998 Shorey Oct 2005 B2
6993037 Boden et al. Jan 2006 B2
6999454 Crump Feb 2006 B1
7002926 Eneboe et al. Feb 2006 B1
7024609 Wolfgang et al. Apr 2006 B2
7028335 Borella et al. Apr 2006 B1
7032242 Grabelsky et al. Apr 2006 B1
7039018 Singh et al. May 2006 B2
7047297 Huntington et al. May 2006 B2
7058015 Wetherall et al. Jun 2006 B1
7061874 Merugu et al. Jun 2006 B2
7065482 Shorey et al. Jun 2006 B2
7072296 Turner et al. Jul 2006 B2
7075927 Mo et al. Jul 2006 B2
7116643 Huang et al. Oct 2006 B2
7126944 Rangarajan et al. Oct 2006 B2
7126954 Fang et al. Oct 2006 B2
7142507 Kurimoto et al. Nov 2006 B1
7145906 Fenner Dec 2006 B2
7151751 Tagami et al. Dec 2006 B2
7154896 Kim et al. Dec 2006 B1
7162649 Ide et al. Jan 2007 B1
7168078 Bar et al. Jan 2007 B2
7200122 Goringe et al. Apr 2007 B2
7203173 Bonney et al. Apr 2007 B2
7218632 Bechtolsheim et al. May 2007 B1
7237264 Graham et al. Jun 2007 B1
7240166 Ganfield Jul 2007 B2
7254562 Hsu et al. Aug 2007 B2
7269171 Poon et al. Sep 2007 B2
7274691 Rogers et al. Sep 2007 B2
7277399 Hughes, Jr. Oct 2007 B1
7283478 Barsheshet et al. Oct 2007 B2
7292591 Parker et al. Nov 2007 B2
7330888 Storry et al. Feb 2008 B2
7340776 Zobel et al. Mar 2008 B2
7359930 Jackson et al. Apr 2008 B2
7376731 Khan et al. May 2008 B2
7376969 Njemanze et al. May 2008 B1
7379426 Sekiguchi May 2008 B2
7385924 Riddle Jun 2008 B1
7386473 Blumenau Jun 2008 B2
7391769 Rajkumar et al. Jun 2008 B2
7406516 Davis et al. Jul 2008 B2
7408938 Chou et al. Aug 2008 B1
7418006 Damphier et al. Aug 2008 B2
7420992 Fang et al. Sep 2008 B1
7423979 Martin Sep 2008 B2
7433326 Desai et al. Oct 2008 B2
7440464 Kovacs Oct 2008 B2
7441267 Elliott Oct 2008 B1
7444679 Tarquini et al. Oct 2008 B2
7450560 Grabelsky et al. Nov 2008 B1
7450937 Claudatos et al. Nov 2008 B1
7453804 Feroz et al. Nov 2008 B1
7457277 Sharma et al. Nov 2008 B1
7457296 Kounavis et al. Nov 2008 B2
7457870 Lownsbrough et al. Nov 2008 B1
7466694 Xu et al. Dec 2008 B2
7467202 Savchuk Dec 2008 B2
7480238 Funk et al. Jan 2009 B2
7480255 Bettink Jan 2009 B2
7483424 Jain et al. Jan 2009 B2
7489635 Evans et al. Feb 2009 B2
7493654 Bantz et al. Feb 2009 B2
7496036 Olshefski Feb 2009 B2
7496097 Rao et al. Feb 2009 B2
7499590 Seeber Mar 2009 B2
7508764 Back et al. Mar 2009 B2
7512078 Swain et al. Mar 2009 B2
7512081 Ayyagari et al. Mar 2009 B2
7522521 Bettink et al. Apr 2009 B2
7522594 Piche et al. Apr 2009 B2
7522599 Aggarwal et al. Apr 2009 B1
7522604 Hussain et al. Apr 2009 B2
7522605 Spencer et al. Apr 2009 B2
7522613 Rotsten et al. Apr 2009 B2
7525910 Wen Apr 2009 B2
7525963 Su et al. Apr 2009 B2
7526795 Rollins Apr 2009 B2
7529196 Basu et al. May 2009 B2
7529242 Lyle May 2009 B1
7529276 Ramakrishnan May 2009 B1
7529932 Haustein et al. May 2009 B1
7529939 Bruwer May 2009 B2
7532623 Rosenzweig et al. May 2009 B2
7532624 Ikegami et al. May 2009 B2
7532633 Rijsman May 2009 B2
7532726 Fukuoka et al. May 2009 B2
7533256 Walter et al. May 2009 B2
7533267 Yoshimura May 2009 B2
7548562 Ward et al. Jun 2009 B2
7561569 Thiede Jul 2009 B2
7617314 Bansod et al. Nov 2009 B1
7684347 Merkey et al. Mar 2010 B2
7694022 Garms et al. Apr 2010 B2
7730011 Deninger et al. Jun 2010 B1
7792818 Fain et al. Sep 2010 B2
7853564 Mierau et al. Dec 2010 B2
7855974 Merkey et al. Dec 2010 B2
7881291 Grah Feb 2011 B2
7904726 Elgezabal Mar 2011 B2
8068431 Varadarajan et al. Nov 2011 B2
20010039579 Trcka et al. Nov 2001 A1
20020085507 Ku et al. Jul 2002 A1
20020089937 Venkatachary et al. Jul 2002 A1
20020091915 Parady Jul 2002 A1
20020138654 Liu et al. Sep 2002 A1
20020163913 Oh Nov 2002 A1
20020173857 Pabari et al. Nov 2002 A1
20020191549 McKinley et al. Dec 2002 A1
20030009718 Wolfgang et al. Jan 2003 A1
20030014517 Lindsay et al. Jan 2003 A1
20030028662 Rowley et al. Feb 2003 A1
20030088788 Yang May 2003 A1
20030135525 Huntington Jul 2003 A1
20030135612 Huntington et al. Jul 2003 A1
20030188106 Cohen Oct 2003 A1
20030214913 Kan et al. Nov 2003 A1
20030221003 Storry et al. Nov 2003 A1
20030231632 Haeberlen Dec 2003 A1
20030233455 Leber et al. Dec 2003 A1
20040010473 Hsu et al. Jan 2004 A1
20040078292 Blumenau Apr 2004 A1
20040100952 Boucher et al. May 2004 A1
20040103211 Jackson et al. May 2004 A1
20040218631 Ganfield Nov 2004 A1
20040260682 Herley et al. Dec 2004 A1
20050015547 Yokohata et al. Jan 2005 A1
20050050028 Rose et al. Mar 2005 A1
20050055399 Savchuk Mar 2005 A1
20050063320 Klotz et al. Mar 2005 A1
20050083844 Zhu et al. Apr 2005 A1
20050108573 Bennett et al. May 2005 A1
20050117513 Park et al. Jun 2005 A1
20050132046 de la Iglesia et al. Jun 2005 A1
20050132079 Iglesia et al. Jun 2005 A1
20050207412 Kawashima et al. Sep 2005 A1
20050229255 Gula et al. Oct 2005 A1
20050249125 Yoon et al. Nov 2005 A1
20050265248 Gallatin et al. Dec 2005 A1
20060013222 Rangan et al. Jan 2006 A1
20060037072 Rao et al. Feb 2006 A1
20060069821 P et al. Mar 2006 A1
20060083180 Baba et al. Apr 2006 A1
20060088040 Kramer et al. Apr 2006 A1
20060114842 Miyamoto et al. Jun 2006 A1
20060126665 Ward et al. Jun 2006 A1
20060146816 Jain Jul 2006 A1
20060165009 Nguyen et al. Jul 2006 A1
20060165052 Dini et al. Jul 2006 A1
20060167894 Wunner Jul 2006 A1
20060168240 Olshefski Jul 2006 A1
20060203848 Damphier et al. Sep 2006 A1
20060221967 Narayan et al. Oct 2006 A1
20060233118 Funk et al. Oct 2006 A1
20060235908 Armangau et al. Oct 2006 A1
20070019640 Thiede Jan 2007 A1
20070036156 Liu et al. Feb 2007 A1
20070038665 Kwak et al. Feb 2007 A1
20070050334 Deninger et al. Mar 2007 A1
20070050465 Canter et al. Mar 2007 A1
20070058631 Mortier et al. Mar 2007 A1
20070124276 Weissman et al. May 2007 A1
20070139231 Wallia et al. Jun 2007 A1
20070140235 Aysan et al. Jun 2007 A1
20070140295 Akaboshi Jun 2007 A1
20070147263 Liao et al. Jun 2007 A1
20070153796 Kesavan et al. Jul 2007 A1
20070157306 Elrod et al. Jul 2007 A1
20070162609 Pope et al. Jul 2007 A1
20070162971 Blom et al. Jul 2007 A1
20070223474 Shankar Sep 2007 A1
20070248029 Merkey et al. Oct 2007 A1
20070250817 Boney Oct 2007 A1
20070271372 Deninger et al. Nov 2007 A1
20070286175 Xu et al. Dec 2007 A1
20070291755 Cheng et al. Dec 2007 A1
20070291757 Dobson et al. Dec 2007 A1
20070297349 Arkin Dec 2007 A1
20080013541 Calvignac et al. Jan 2008 A1
20080037539 Paramaguru Feb 2008 A1
20080056144 Hutchinson et al. Mar 2008 A1
20080117903 Uysal May 2008 A1
20080159146 Claudatos et al. Jul 2008 A1
20080175167 Satyanarayanan et al. Jul 2008 A1
20080181245 Basso et al. Jul 2008 A1
20080240128 Elrod Oct 2008 A1
20080247313 Nath et al. Oct 2008 A1
20080279216 Sharif-Ahmadi et al. Nov 2008 A1
20080294647 Ramaswamy Nov 2008 A1
20090003363 Benco et al. Jan 2009 A1
20090006672 Blumrich et al. Jan 2009 A1
20090028161 Fullarton et al. Jan 2009 A1
20090028169 Bear et al. Jan 2009 A1
20090041039 Bear et al. Feb 2009 A1
20090073895 Morgan et al. Mar 2009 A1
20090092057 Doctor et al. Apr 2009 A1
20090097417 Asati et al. Apr 2009 A1
20090097418 Castillo et al. Apr 2009 A1
20090103531 Katis et al. Apr 2009 A1
20090109875 Kaneda et al. Apr 2009 A1
20090113217 Dolgunov et al. Apr 2009 A1
20090116403 Callanan et al. May 2009 A1
20090116470 Berggren May 2009 A1
20090119501 Petersen May 2009 A1
20090122801 Chang May 2009 A1
20090168648 Labovitz et al. Jul 2009 A1
20090182953 Merkey et al. Jul 2009 A1
20090187558 McDonald Jul 2009 A1
20090219829 Merkey et al. Sep 2009 A1
20090245114 Vijayaraghavan Oct 2009 A1
20090290580 Wood et al. Nov 2009 A1
20090292681 Wood et al. Nov 2009 A1
20100278052 Matityahu et al. Nov 2010 A1
Foreign Referenced Citations (14)
Number Date Country
0838930 Apr 1998 EP
1004185 May 2000 EP
1387527 Feb 2004 EP
1494425 Jan 2005 EP
1715627 Oct 2006 EP
1971087 Sep 2008 EP
2337903 Dec 1999 GB
2002026935 Jan 2002 JP
2002064507 Feb 2002 JP
2002323959 Nov 2002 JP
20060034581 Apr 2006 KR
WO 0223805 Mar 2002 WO
WO 2005109754 Nov 2005 WO
WO 2009038384 Mar 2009 WO
Non-Patent Literature Citations (11)
Entry
Smith, R., et al., “Evaluating GPUs for Network Packet Signature Matching”, Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium Apr. 26-28, 2009.
Berners-Lee et al., “Hypertext Transfer Protocol—HTTP/1.0”, IETF, RFC 1945, Network Working Group, 1996, 60 pgs.
Fielding et al., “Hypertext Transfer Protocol—HTTP/1.1”, Internet Engineering Task Force (IETF), Request for Comments (RFC) 2616, 2068, Network Working Group, 1999, 165 pgs.
Hamill et al., “Petaminer: Efficient Navigation to Petascale Data Using Event-Level Metadata”, Proceedings of XII Advanced Computing and Analysis Techniques in Physics Research, Nov. 2008, pp. 1-5.
Huang et al., PlanetFlow: Maintaining Accountability for Network Services, ACM SIGOPS Operating Systems Review 40, 1 (Jan 2006), 6 pgs.
Kim, Kwang Sik (Exr), International Search Report issued to application No. PCT/US09/41061, Nov. 26, 2009, 8 pgs.
Kim, Sae Young (Exr), International Search Report issued to application No. PCT/US09/41060, Dec. 3, 2009, 8 pgs.
Kwon, Oh Seong (Exr), International Search Report issued to application No. PCT/US09/40733, Nov. 24, 2009, 3 pgs.
Lupia, Sergio (Exr), International Search Report issued to application No. PCT/US10/56739, Apr. 1, 2011, 15 pgs.
Lupia, Sergio (Exr), International Search Report issued to application No. PCT/US10/56723, Apr. 20, 2011, 14 pgs.
Stockinger, “Network Traffic Analysis with Query Driven Visualization—SC2005 HPC Analytics Results”, Proceedings of the 2005 ACM/IEEE SC/05 Conference, Nov. 2005, 4 pgs.
Related Publications (1)
Number Date Country
20120239652 A1 Sep 2012 US
Provisional Applications (1)
Number Date Country
61453456 Mar 2011 US