Hardware accelerated application-based pattern matching for real time classification and recording of network traffic

Information

  • Patent Grant
  • Patent Number
    8,666,985
  • Date Filed
    Thursday, March 15, 2012
  • Date Issued
    Tuesday, March 4, 2014
Abstract
An indexing database utilizes a non-transitory storage medium. A pattern matching processing unit generates preclassification data for the network data packets utilizing pattern matching analysis. At least one processing unit implements a storage process that receives the network data packets, stores the network data packets in at least one of the slots, and transfers the network data packets to a packet capture repository when slots in a shared memory are full. A preclassification process requests from the pattern matching processing unit the preclassification data. An indexing process determines, based upon the preclassification data, whether to invoke or omit additional analysis of the network data packets, and performs at least one of aggregation, classification, or annotation of the network data packets in the shared memory to maintain one or more indices in the indexing database.
Description
BACKGROUND

This disclosure relates generally to network forensics, and in particular to utilizing a parallel pattern matching processing unit to perform pre-classification of packets in a packet stream for real time classification and recording of network traffic.


The field of network forensics involves, among other things, various methods of discovering and analyzing the contents of packetized data transmitted over a network. Identifying particular forms of data (e.g., a Moving Picture Experts Group (MPEG) file, a voice over Internet protocol (VoIP) session, etc.), as well as the content of a particular form of data (e.g., the actual audio file encoded pursuant to the MPEG standard, the audio related to the VoIP session, etc.) transmitted over a network can be a time consuming and computationally intensive task. Such identification may be particularly demanding given the rate and volume of packets that may be transmitted over a network.


If packets are recorded for subsequent examination or searching (as is practiced in network metric, security, and forensic applications), then identifying a particular form of data and extracting the contents of the data may involve first searching an entire database of packets, possibly tens, hundreds, or more terabytes of data, to identify any data possibly conforming to the search request. Such a search may not be conducive to practical, real time discovery and analysis of data types and contents of interest.


Packets may be analyzed and indexed in a database as they are being recorded. By forming indices based on packet characteristics, metadata, and locations where the packets are recorded, identifying and reporting on a particular instance of data and extracting the contents of the data may be performed by searching the indices instead of the entire database of packets. This approach may reduce time and computation required to search. However, this approach may also increase the time and computation required to record the packets.


Recording packets of a network with a high rate and volume of such packets can be a time consuming and computationally intensive task. Time and computational resources needed for real time analyzing and indexing while recording packets may not be feasible. Even if such real time analyzing and indexing is feasible, the amount of analysis that is possible may be limited.


SUMMARY

A system includes a shared memory with slots to transiently store network data packets. A packet capture repository utilizes a non-transitory storage medium. An indexing database utilizes a non-transitory storage medium. A pattern matching processing unit generates preclassification data for the network data packets utilizing pattern matching analysis. At least one processing unit implements a storage process that receives the network data packets, stores the network data packets in at least one of the slots, and transfers the network data packets to the packet capture repository when the slots in the shared memory are full. A preclassification process requests from the pattern matching processing unit the preclassification data. An indexing process determines, based upon the preclassification data, whether to invoke or omit additional analysis of the network data packets, and performs at least one of aggregation, classification, or annotation of the network data packets in the shared memory to maintain one or more indices in the indexing database.





BRIEF DESCRIPTION OF THE DRAWINGS

Example embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:



FIG. 1 is a diagram illustrating real time classification and recording of packetized network traffic.



FIG. 2 is a diagram illustrating hardware accelerated application-based pattern matching for real time classification and recording of packetized network traffic in accordance with an embodiment of the invention.



FIG. 3 is a system for hardware accelerated application-based pattern matching for real time classification and recording of network traffic in accordance with an embodiment of the invention.



FIG. 4 is a flow chart illustrating a disclosed method for hardware accelerated application-based pattern matching for real time classification and recording of network traffic.



FIG. 5 is a diagram illustrating storage of packet data in a packet capture repository.



FIG. 6 is a diagram illustrating an indexing database that includes indices to packets contained within a packet capture repository.





Other features of the present embodiments will be apparent from the accompanying drawings and from the disclosure that follows.


DETAILED DESCRIPTION

In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the various embodiments. It may be evident, however, to one skilled in the art that the various embodiments may be practiced without these specific details.



FIG. 1 is a diagram illustrating the functional flow 100 of real time capturing, aggregating, classifying, annotating, and storing packetized data transmitted over a network 101. As illustrated, one or more kernel-level processes 106 executing on one or more processing units 102 may receive a stream of packets transmitted on the network 101. A kernel is the main component of most computer operating systems. It is a bridge between applications and the actual data processing done at the hardware level. The kernel's responsibilities include managing the system's resources (the communication between hardware and software components). Usually as a basic component of an operating system, a kernel can provide the lowest-level abstraction layer for the resources (e.g., processors and I/O devices) that application software must control to perform its function. It typically makes these facilities available to application processes through inter-process communication mechanisms and system calls.


As the stream of packets is received, the kernel-level process may utilize copyless direct memory access (DMA) techniques to store the packets of the stream in slots 108 of a memory 103. DMA allows certain hardware subsystems within the computer to access system memory independently of the central processing unit (CPU). Without DMA, when the CPU is using programmed input/output, it is typically fully occupied for the entire duration of the read or write operation, and is thus unavailable to perform other work. With DMA, the CPU initiates the transfer, does other operations while the transfer is in progress, and receives an interrupt from the DMA controller when the operation is done. This feature is useful any time the CPU cannot keep up with the rate of data transfer, or where the CPU needs to perform useful work while waiting for a relatively slow I/O data transfer. Computers that have DMA channels can transfer data to and from devices with much less CPU overhead than computers without a DMA channel. Similarly, a processing element inside a multi-core processor can transfer data to and from its local memory without occupying its processor time, allowing computation and data transfer to proceed in parallel. DMA can also be used for “memory to memory” copying or moving of data within memory. DMA can offload expensive memory operations from the CPU to a dedicated DMA engine.


The memory 103 is shared with one or more user-level processes 107 executed by the processing unit 102. When the slots in the memory are full, the kernel-level process may transfer the packets in the slots in memory to slots in a packet repository in the storage 104, also utilizing copyless DMA techniques.


While the kernel-level process is storing the packets of the stream in slots of the memory, an indexing process 109 of the user-level processes may access the packets in the slots of the memory and aggregate, classify, and annotate the packets based on various characteristics to maintain indices in an indexing database 105. The indices may reference locations in the slots in the packet repository where packets are stored.


However, the indexing process 109 executed by the processing unit 102 may be limited by the amount of analysis that the indexing process has resources to perform on the packets for purposes of aggregating, classifying, and annotating in real time. Regardless of the processing resources available to the indexing process, and though the rate and volume of the stream of packets may vary, the processing unit may continue to receive additional packets of the stream. As such, the indexing process may have to limit the amount of analysis performed on packets that have been received in order to perform analysis on the next received packets in the stream.


By way of contrast to the functional flow 100 of real time capturing, aggregating, classifying, annotating, and storing packetized data transmitted over a network 101 illustrated in FIG. 1, FIG. 2 is a diagram illustrating the functional flow 200 of hardware accelerated application-based pattern matching for real time classification and recording of packetized data transmitted over a network 201. As illustrated, one or more kernel-level processes 206 executing on one or more processing units 202 may receive a stream of packets transmitted on the network 201. As the stream of packets is received, the kernel-level process may utilize copyless direct memory access (DMA) techniques to store the packets of the stream in slots 208 of a memory 203 that is shared with one or more user-level processes 207 executed by the processing unit 202. When the slots in the memory are full, the kernel-level process may transfer the packets in the slots in memory to slots in a packet repository in the storage 204, also utilizing copyless DMA techniques.


While the kernel-level process 206 is storing the packets of the stream in slots 208 of the memory 203, an indexing process 209 of the user-level processes may interact with a preclassification process 210 of the user-level processes to access the packets in the slots of the memory and aggregate, classify, and annotate the packets based on various characteristics to maintain indices in an indexing database 205. The indices may reference locations in the slots in the packet repository where packets are stored.


The preclassification process 210 may include requisite levels of preprocessing on the data for normalization (e.g., perform decompression and encoding transformations to arrive at a consistent representation of the data for pattern matching). The preclassification process 210 may communicate with a pattern matching processing unit 211 to request the pattern matching processing unit perform preclassification on the packets in the slots 208 of the memory 203 by performing pattern matching analysis. In one embodiment, Aho-Corasick string matching is used. Aho-Corasick string matching is a dictionary-matching algorithm that locates elements of a finite set of strings (the “dictionary”) within an input text. It matches all patterns simultaneously. The complexity of the algorithm is linear in the length of the patterns plus the length of the searched text plus the number of output matches.
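The Aho-Corasick automaton can be sketched in a few dozen lines. The following is an illustrative implementation of the algorithm itself (not code from the patent), showing the trie construction, the breadth-first failure links, and the single linear scan that reports all dictionary matches:

```python
from collections import deque

class AhoCorasick:
    """Aho-Corasick automaton: matches every dictionary pattern
    against an input text in a single linear pass."""

    def __init__(self, patterns):
        # goto[state] maps a character to the next state; output[state]
        # holds the patterns that end at that state.
        self.goto = [{}]
        self.output = [set()]
        for pat in patterns:
            state = 0
            for ch in pat:
                if ch not in self.goto[state]:
                    self.goto[state][ch] = len(self.goto)
                    self.goto.append({})
                    self.output.append(set())
                state = self.goto[state][ch]
            self.output[state].add(pat)
        # Build failure links breadth-first from the root.
        self.fail = [0] * len(self.goto)
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                # Inherit matches reachable through the failure link.
                self.output[t] |= self.output[self.fail[t]]

    def search(self, text):
        """Yield (end_index, pattern) for every match found in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pat in self.output[state]:
                yield i, pat
```

For example, matching the classic dictionary {"he", "she", "his", "hers"} against "ushers" reports "she", "he", and "hers" in one pass over the text.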


In performing the pattern matching analysis, the pattern matching processing unit may determine one or more characteristics of the packets in the slots of the memory, such as identifying the application to which the packets relate, the protocol utilized to transmit the packets, file types of payload data content, source and/or destination addresses associated with the packets, packet lengths, and so on. The pattern matching processing unit may determine the characteristics by comparing bit patterns of the packets with a library of bit patterns associated with the characteristics.
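As a sketch of this signature-driven preclassification, the following uses a small, invented library of byte patterns (the signatures here are illustrative, not an authoritative set) and reports every characteristic whose pattern appears in a packet payload:

```python
# Hypothetical signature library: byte pattern -> packet characteristic.
# The patterns below are illustrative examples only.
SIGNATURE_LIBRARY = {
    b"GET ":                 ("protocol", "HTTP request"),
    b"\xff\xd8\xff":         ("file_type", "JPEG payload"),
    b"\x16\x03":             ("protocol", "TLS record"),
    b"BitTorrent protocol":  ("application", "BitTorrent"),
}

def preclassify(packet_payload: bytes) -> list:
    """Return the characteristics whose bit patterns occur in the payload."""
    found = []
    for pattern, characteristic in SIGNATURE_LIBRARY.items():
        if pattern in packet_payload:
            found.append(characteristic)
    return found
```

A production library would be compiled into a single automaton (as in the Aho-Corasick approach above) rather than scanned pattern-by-pattern; the loop here is kept naive for clarity.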


The pattern matching processing unit 211 may be one or more types of processing units capable of performing pattern matching analysis on multiple packets in parallel. For example, the pattern matching processing unit may be a graphical processing unit that includes thousands of separate cores which may each perform pattern matching analysis on a different packet. As such, the pattern matching processing unit may simultaneously (or substantially simultaneously) perform pattern matching analysis on the packets of one or more slots 208 in the memory 203.
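The per-core fan-out can be approximated on a CPU with a thread pool: each worker scans one packet independently, mirroring the one-core-per-packet arrangement described above. The patterns, worker count, and function names below are invented for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-in for the per-core matching kernel.
PATTERNS = [b"GET ", b"POST ", b"\x16\x03"]  # hypothetical dictionary

def match_packet(payload: bytes) -> list:
    """Scan one packet against the dictionary (one worker per packet)."""
    return [p for p in PATTERNS if p in payload]

def preclassify_slot(packets: list, workers: int = 4) -> list:
    """Pattern-match every packet in a slot concurrently and return
    the per-packet results in their original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(match_packet, packets))
```

`pool.map` preserves input order, so result *i* always corresponds to packet *i* in the slot, which matters when results are fed back to the indexing process.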


The pattern matching processing unit 211 may return the results of the pattern matching analysis to the preclassification process 210. The preclassification process may then interact with the indexing process 209 to aggregate, classify, and annotate the packets based on the results of the pattern matching analysis, as well as any additional analysis of the packets performed by the indexing process, to maintain the indices in the indexing database 205. As a result of offloading pattern matching analysis to a pattern matching processing unit which can analyze multiple packets in parallel, the amount of real time analysis that can be performed on received packets may be greater than in the functional flow 100 illustrated in FIG. 1. This may further reduce the time and computation required to search for stored packets beyond what is possible with the functional flow 100 illustrated in FIG. 1.



FIG. 3 illustrates a system 300 for hardware accelerated application-based pattern matching for real time classification and recording of network traffic. The system may utilize various techniques and/or apparatuses described in published PCT application PCT/US2005/045566 entitled “Method and Apparatus for Network Packet Capture Distributed Storage System”, published U.S. application Ser. No. 12/126,656 entitled “Method and Apparatus to Index Network Traffic Meta-Data” (US 2009/0290492), published U.S. application Ser. No. 12/126,551 entitled “Method and Apparatus of Network Artifact Identification and Extraction” (US 2009/0290580), published U.S. application Ser. No. 12/471,433 entitled “Presentation of An Extracted Artifact Based on an Indexing Technique” (US 2009/0292681), and published U.S. application Ser. No. 12/126,619 titled “On Demand Network Activity Reporting Through a Dynamic File System and Method” (US 2009/0292736), all of which are herein incorporated by reference in their entirety.


The system 300 includes a capture appliance 302 that is communicably coupled to a network 301, storage 303, and an indexing database 304. Data packetized according to a variety of different protocols (such as hypertext transfer protocol, file transfer protocol, Internet Protocol version 4, Internet Protocol version 6, transmission control protocol, user datagram protocol, server message block, simple mail transfer protocol, and so on) may be transmitted over the network. The capture appliance may be operable to capture, aggregate, annotate, store, and index network packet data in real time from one or more portions of the network and retrieve such data utilizing the storage and the indexing database. Thus, the storage may be operable as a packet capture repository and the indexing database may be operable as an index into the packet capture repository. The storage may include any kind of storage media (such as one or more magnetic storage media, optical storage media, volatile memory, non-volatile memory, flash memory, and so on) configured as a Redundant Array of Independent Discs (RAID) implementation, a storage area network, and so on.


The capture appliance 302 may include at least one processing unit 305, one or more computer readable media 306 (such as random access memories, hard disks, flash memory, cache memory, non-volatile memory, optical storage media, and so on), one or more pattern matching processing units 308, and one or more output components 307 that are operable to communicate with the storage 303 and/or the indexing database 304. The capture appliance may also include one or more user interface components 309 which may be operable to interact with one or more input/output devices 310. The capture appliance may be a dedicated device in some implementations. In other implementations, the capture appliance may be software operating in a virtualized environment implemented by the processing unit executing one or more instructions stored in the computer readable media.


The processing unit 305 may execute instructions stored in the computer readable medium 306 to implement one or more packet storage processes that receive one or more streams of packets transmitted on the network 301 and store the packets in slots of a shared memory (which may be the computer readable medium) utilizing DMA techniques, such as copyless DMA techniques. The slots may be of a fixed size, such as 64 megabytes. When the slots in the shared memory are full, the packet storage process may transfer the packets in the slots in shared memory to slots in the packet repository in the storage 303, also utilizing DMA techniques, such as copyless DMA techniques.
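The slot-based storage process might be sketched as follows, with the slot size scaled down from 64 megabytes for illustration and a plain list standing in for the packet capture repository. The class and its interface are invented, not the patent's implementation:

```python
class SlotBuffer:
    """Sketch of the storage process: fixed-size slots held in memory
    are transferred to the packet capture repository once all slots
    are full. Sizes are scaled down for illustration."""

    def __init__(self, repository, slot_size=1024, num_slots=4):
        self.repository = repository   # stand-in for storage 303
        self.slot_size = slot_size     # capacity of one slot, in bytes
        self.num_slots = num_slots
        self.slots = [bytearray()]     # open slots; the last is current

    def store(self, packet: bytes):
        # Assumes each packet fits within a single slot.
        if len(self.slots[-1]) + len(packet) > self.slot_size:
            if len(self.slots) == self.num_slots:
                self.flush()                    # every slot full: transfer
            else:
                self.slots.append(bytearray())  # open the next slot
        self.slots[-1].extend(packet)

    def flush(self):
        # Transfer slots to the repository in first-in, first-out order.
        self.repository.extend(bytes(s) for s in self.slots)
        self.slots = [bytearray()]
```

In the patented design the transfer step would use copyless DMA rather than the in-process copy shown here; only the slot bookkeeping is being illustrated.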


The processing unit 305 may also execute instructions stored in the computer readable medium 306 to implement one or more indexing processes and one or more preclassification processes. As the packet storage process stores the packets in slots of the shared memory, the indexing process may interact with the preclassification process to access the packets in the slots of the shared memory and aggregate, classify, and/or annotate the packets based on various characteristics in order to maintain indices in the indexing database 304. The indices may reference locations in the slots in the packet repository where packets are stored. The preclassification process may communicate with the pattern matching processing unit 308 to request the pattern matching processing unit perform preclassification on the packets in the slots of the shared memory. The pattern matching processing unit may return the results of the preclassification to the preclassification process and the preclassification process may then interact with the indexing process to aggregate, classify, and/or annotate the packets based on the results. Additionally, the indexing process may perform additional analysis of the packets to maintain the indices in the indexing database.


The pattern matching processing unit 308 may be one or more processing units capable of performing preclassification on multiple packets in parallel. For example, the pattern matching processing unit may be a graphical processing unit or other kind of parallel processing unit that includes multiple separate cores (such as hundreds, thousands, and so on) which may each perform preclassification on a different packet. Thus, the pattern matching processing unit may simultaneously or nearly simultaneously perform preclassification on the packets of one or more slots in the shared memory.


The pattern matching processing unit 308 may perform preclassification on the packets by utilizing pattern matching analysis (such as Aho-Corasick string matching and so on) to compare bit patterns of the packets with a library of bit patterns that are associated with various characteristics of packets. Such packet characteristics may include the application to which the packets relate, the protocol utilized to transmit the packets, file types of payload data content, source and/or destination addresses associated with the packets, packet lengths, and so on. For example, the pattern matching processing unit may utilize pattern matching analysis to determine the software application to which the packets relate, such as a world wide web (WWW) application, an instant message application, Facebook™, a computer virus, a Flash™ video application, a peer-to-peer file sharing application, and so on. In such an example, entries in the library of bit patterns may be created by analyzing packets that are known to relate to particular software applications and identifying bit patterns common to the particular software applications. Based on the software application identified for one or more packets by the pattern matching analysis, the indexing process may perform additional analysis of the packets, skip additional analysis of the packets, and so on.


In this example, when packets are identified as relating to a peer-to-peer file sharing application (such as Kazaa™), the indexing process may perform additional analysis to determine the names and types of files being shared. Further in this example, when packets are identified as relating to a Flash™ video application, the indexing process may skip additional analysis. The indexing process may be so configured under the assumption that packets relating to peer-to-peer file sharing applications merit additional analysis because such applications may be utilized to exchange content in violation of copyright whereas Flash™ video applications are less likely to be utilized for such purposes. In this way, the indexing process may utilize the preclassification to guide how available resources of the indexing process are spent. Hence, the indexing process may be able to dedicate resources to further analyzing packets relating to software applications that are more concerning (such as viruses) without wasting resources on further analyzing packets relating to software applications that are less concerning (such as word processing applications).
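This triage policy can be expressed as a small decision function. The application labels and the policy tables below are hypothetical stand-ins for the peer-to-peer/Flash distinction described above:

```python
# Hypothetical triage policy: which preclassified application labels
# merit deep analysis and which can be indexed without further work.
ANALYZE_FURTHER = {"peer-to-peer", "virus"}
SKIP_ANALYSIS = {"flash-video", "word-processing"}

def triage(app_label: str) -> str:
    """Decide, from preclassification alone, how to spend
    the indexing process's limited real time resources."""
    if app_label in ANALYZE_FURTHER:
        return "deep-analysis"
    if app_label in SKIP_ANALYSIS:
        return "index-only"
    return "default-analysis"
```

The point of the sketch is that the decision is made from the cheap preclassification label alone, before any expensive per-packet analysis is committed.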


Other embodiments of the invention identify and index on protocol-specific attributes. For example, after identifying a particular flow as an HTTP protocol flow, there may be further identifying and indexing on such attributes as User-Agent, HTTP Referer, Cookie, Host, x-forwarded-for, etc. Further, identifying and indexing may be based on elements of common web-based applications, such as identifying a particular HTTP session as being an instance of a LinkedIn session, and then identifying and indexing on the username within that LinkedIn (or similar social networking site) web session. Similar processing may be performed with YouTube®, including providing a description of a video being posted.
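Extracting those indexable HTTP attributes from a captured request might look like the following sketch, which assumes a well-formed, unfragmented request held in a single payload (the function name and interface are invented):

```python
def extract_http_attributes(payload: bytes) -> dict:
    """Pull the indexable request headers named above (User-Agent,
    Referer, Cookie, Host, x-forwarded-for) out of a raw HTTP request."""
    wanted = {"user-agent", "referer", "cookie", "host", "x-forwarded-for"}
    attributes = {}
    head = payload.split(b"\r\n\r\n", 1)[0]       # headers end at blank line
    for line in head.split(b"\r\n")[1:]:          # skip the request line
        name, _, value = line.partition(b":")
        key = name.strip().lower().decode("ascii", "replace")
        if key in wanted:
            attributes[key] = value.strip().decode("ascii", "replace")
    return attributes
```

A real capture pipeline would first reassemble the TCP stream and handle header folding and continuations; this sketch shows only the attribute selection that feeds the index.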


Although the present example discusses utilizing preclassification to determine the software application to which the packets relate, it should be understood that preclassification may determine other characteristics of packets that may be utilized to identify packets to perform additional analysis upon, skip performing additional analysis upon, and so on. For example, preclassification may determine one or more of protocols utilized to transmit the packets, file types of payload data content, source and/or destination addresses associated with the packets, packet lengths, and so on.


By way of another example, the pattern matching processing unit 308 may utilize pattern matching analysis to determine whether or not the packets relate to an already identified traffic flow (i.e., the entire network conversation to which a packet relates). Based on the recognition that one or more packets relate to an already identified traffic flow, the indexing process may skip additional analysis of the packets (as the flow has already been analyzed) and the packets may be indexed based on the analysis of the flow that has already been performed. Hence, the indexing process is able to leverage the preclassification to dedicate resources to further analyzing packets relating to flows that have not yet been analyzed without wasting resources on further analyzing packets that relate to already analyzed flows.
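Flow-level deduplication of this kind can be sketched by keying each packet to a canonical five-tuple, so that both directions of a conversation map to the same flow, and invoking the expensive analysis only for flows not yet seen. The dictionary-based cache and field names are assumptions for illustration:

```python
analyzed_flows = {}   # flow key -> classification already computed

def flow_key(pkt: dict) -> tuple:
    """Canonical 5-tuple: both directions of a conversation
    produce the same key."""
    a = (pkt["src_ip"], pkt["src_port"])
    b = (pkt["dst_ip"], pkt["dst_port"])
    return (pkt["proto"],) + (a + b if a <= b else b + a)

def classify_packet(pkt: dict, analyze) -> str:
    """Reuse the classification of an already-identified flow; call
    the expensive analyzer only for flows not yet seen."""
    key = flow_key(pkt)
    if key not in analyzed_flows:
        analyzed_flows[key] = analyze(pkt)
    return analyzed_flows[key]
```

With this arrangement, the second and later packets of a long conversation are indexed at dictionary-lookup cost rather than full-analysis cost.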


In one or more implementations, the processing unit 305 of the capture appliance 302 may also be operable to execute instructions stored in the computer readable medium 306 to query the indexing database 304 in response to input received from the input/output device(s) 310 via the user interface component 309. As part of such a query, the processing unit may be operable to retrieve one or more packets stored in one or more slots of the packet capture repository referenced by one or more indices of the indexing database. The processing unit may then provide and/or display the retrieved packets and/or other information regarding the query, the retrieved packets, and so on to the input/output device(s) via the user interface component.


In FIG. 3, the network 301 is illustrated as a single network. However, in various implementations the network may be composed of multiple local area networks, metropolitan area networks, wide area networks (such as the Internet), virtual private networks, and so on of various kinds (wired, wireless, Ethernet, gigabit Ethernet, twisted pair, fiber optic, coaxial, cellular, and so on) which are connected via various kinds of switches, hubs, gateways, and so forth.



FIG. 4 illustrates a method 400 for hardware accelerated application-based pattern matching for real time classification and recording of network traffic. The method may be performed by the processing unit 305 of FIG. 3. The method begins at block 401 and then the flow proceeds to block 402. At block 402, the processing unit receives a packet stream of network traffic from the network 301. The flow then proceeds to block 403, where a packet storage process executed by the processing unit stores the received packets of the stream in slots in a shared memory. The flow then splits and proceeds (either simultaneously or independently) to both block 404 and block 406.


At block 404, after the packet storage process stores the received packets in slots of the shared memory, the packet storage process determines whether the slots of the shared memory are full. If so, the flow proceeds to block 405. If not, the flow returns to block 402 where the processing unit 305 continues to receive the packet stream of network traffic from the network 301. At block 405, after the packet storage process determines the slots of the shared memory are full, the packet storage process transfers packets stored in the slots of the shared memory to slots of the packet capture repository in the storage 303. Transfer of packets stored in the slots of the shared memory may be performed in first-in, first-out order when the slots of the shared memory are full.


At block 406, after the packet storage process stores the received packets in slots of the shared memory, a preclassification process executed by the processing unit 305 begins performing preclassification of the packets stored in the slots of the shared memory. The flow proceeds to block 407 where the preclassification process requests the pattern matching processing unit perform pattern matching of the packets stored in the slots of the shared memory. The flow next proceeds to block 408 where the preclassification process receives the pattern matching results from the pattern matching processing unit before the flow proceeds to block 409.


At block 409, an indexing process executed by the processing unit 305 determines whether or not the pattern matching results specify to skip classification of one or more of the packets stored in the slots of the shared memory. If so, the flow proceeds to block 410 where the indexing process skips classification of the one or more packets specified to skip before the flow proceeds to block 411. Otherwise, the flow proceeds to block 411.


At block 411, the indexing process determines whether any of the packets stored in the slots of the shared memory remain to be classified. If so, the flow proceeds to block 412 for classification. Otherwise, the flow returns to block 402 where the processing unit 305 continues to receive the packet stream of network traffic from the network 301.


At block 412, after the indexing process determines that packets stored in the slots of the shared memory remain to be classified, the indexing process groups packet data in the indexing database 304 according to classification of the packets. The classification may include deep packet inspection, header evaluation, and so on. The flow then proceeds to block 413 where the indexing process indexes the indexing database to point to locations of packet data in the packet capture repository in the storage 303. The flow then returns to block 402 where the processing unit 305 continues to receive the packet stream of network traffic from the network 301.
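Blocks 412 and 413 together amount to grouping packet metadata by classification and recording pointers into the repository. A minimal sketch, with an invented (slot, offset) location scheme standing in for repository locations:

```python
from collections import defaultdict

class IndexingDatabase:
    """Sketch of blocks 412-413: group packet metadata by
    classification and record pointers to where each packet
    landed in the packet capture repository."""

    def __init__(self):
        self.indices = defaultdict(list)  # class -> repository locations

    def index_packet(self, classification: str, slot: int, offset: int):
        # Block 413: the index points at a repository location,
        # not at the packet data itself.
        self.indices[classification].append((slot, offset))

    def lookup(self, classification: str) -> list:
        return self.indices[classification]
```

Because the index stores only locations, a later query can be answered by scanning the index alone and then fetching just the referenced slots, rather than searching the full repository.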



FIG. 5 illustrates storing of packet data in a packet capture repository 502. The illustrated storing of packet data may be performed by the capture appliance 302 of FIG. 3. Referring again to FIG. 5, packetized data may be identified in a flow of packets 501 crossing a network and the identified packet data may be stored in the packet capture repository. In some implementations, all packets flowing through a particular point in a network, such as at the location of a network tap, may be stored in the packet capture repository. Practically speaking, some packets may be lost or dropped due to various issues including delivery failure or practical limits of computing technology, but the system attempts to capture every packet.


The packets 501 may include a data unit (e.g., packets of data of an email, an instant message communication, an audio file, a compressed file, etc.) that may be carried by a flow of the packets in the network. The packet capture repository may contain a collection of packets whose contents might fall into a variety of classes such as software applications to which the packet data relates. By way of example, FIG. 5 illustrates that the packet capture repository contains collections of packets whose contents are related to a World Wide Web (WWW) application 503 (such as a web browser) and an Instant Messaging (IM) application 504 (such as the Facebook™ instant messenger client).



FIG. 6 is a diagram illustrating an indexing database 601 that includes indices to packets contained within a packet capture repository. The illustrated indexing database 601 may be the indexing database 304 of FIG. 3. Referring again to FIG. 6, the indexing database 601 may be a collection of meta-data that is stored in an organized manner so that the data packets may be accessed efficiently through a query.


The information (e.g., packet data, meta-data, etc.) may be extracted from the indexing database 601 through a suitable database query. The database query may be performed through any number of interfaces, including a graphical user interface, a web services request, a programmatic request, a structured query language (SQL) query, and so on, any of which may be used to extract related information of packet data or any meta-data stored in the indexing database. If queried packet data/information is matched with the data stored in the indexing database, then packets matching the query may be retrieved from an associated packet repository for reconstruction.


The matched packet data may be reconstructed by referring to a memory location corresponding to designated packet data. The indexing database may point to members of a collection of data packets according to “class,” where a class may include any data such as software applications to which the packets relate, attributes of a packet header, the presence of a multimedia file flowing across the network, a session of a particular user of the network at a particular point in time, and so on. The pointers may point to the memory locations of packets stored in the packet capture repository for the purpose of efficient retrieval of relevant packets. The indexing database may point to packets according to their having been classified as containing applications, files, and other data shared through the network, in the native packetized format in which they were transmitted. Also, the sessions of each individual user in the network may be grouped and stored in the indexing database.


For example, the indexing database 601 may include indexed WWW data 602, indexed TCP session data 603, indexed data for a particular user's TCP session 604, indexed IM data 605, and so on. Each index 602, 603, 604, and 605 may be a unit of the indexing database. In addition, the indexing database may include pointers pointing to a packet capture repository location of particular information corresponding to an index.


For example, a first pointer 606 may point to a first packet capture repository location 610 within the packet capture repository to represent the contents stored in a particular location of the indexed WWW data 602. A second pointer 607 may point to a second packet capture repository location 611 within the packet capture repository to represent the contents stored in a particular location of the indexed TCP session data 603. A third pointer 609 may point to a third packet capture repository location 612 within the packet capture repository to represent the contents stored in a particular location of the indexed IM data 605. A fourth pointer 608 may point to a fourth packet capture repository location 613 within the packet capture repository to represent the contents stored in a particular location of the indexed data for a particular user's TCP session 604, and so on.
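Dereferencing such pointers for reconstruction may be sketched as follows, with the repository modeled as a flat byte buffer and each index pointer as an (offset, length) pair. The names and data are hypothetical illustrations of the pointer scheme, not the patented layout.

```python
# Stand-in for the packet capture repository: three 10-byte "packets"
# stored back to back in a flat byte buffer.
repository = b"".join([b"WWWpacket1", b"TCPsession", b"IMpayload!"])

# Stand-in for the indexing database pointers: each index entry maps
# to an (offset, length) location in the repository (cf. pointers
# 606-609 pointing to locations 610-613 in FIG. 6).
index = {
    "indexed_www_data": (0, 10),
    "indexed_tcp_data": (10, 10),
    "indexed_im_data":  (20, 10),
}

def reconstruct(name):
    """Retrieve the packet bytes a given index entry points to."""
    offset, length = index[name]
    return repository[offset:offset + length]
```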


In the present disclosure, the methods disclosed may be implemented as sets of instructions or software readable by a device. Further, it is understood that the specific order or hierarchy of steps in the methods disclosed is an example of a sample approach. In other embodiments, the specific order or hierarchy of steps in a method can be rearranged while remaining within the disclosed subject matter. The accompanying method claims present elements of the various steps in a sample order, and are not necessarily meant to be limited to the specific order or hierarchy presented.


The described disclosure may be provided as a computer program product, or software, that may include a non-transitory machine-readable medium having stored thereon instructions, which may be used to program a computer system (or other electronic devices) to perform a process according to the present disclosure. A non-transitory machine-readable medium includes any mechanism for storing information in a form (e.g., software, processing application) readable by a machine (e.g., a computer). The non-transitory machine-readable medium may take the form of, but is not limited to: a magnetic storage medium (e.g., floppy diskette); optical storage medium (e.g., CD-ROM); magneto-optical storage medium; read only memory (ROM); random access memory (RAM); erasable programmable memory (e.g., EPROM and EEPROM); flash memory; and so on.


It is believed that the present disclosure and many of its attendant advantages will be understood by the foregoing description, and it will be apparent that various changes may be made in the form, construction and arrangement of the components without departing from the disclosed subject matter or without sacrificing all of its material advantages. The form described is merely explanatory, and it is the intention of the following claims to encompass and include such changes.


While the present disclosure has been described with reference to various implementations, it will be understood that these implementations are illustrative and that the scope of the disclosure is not limited to them. Many variations, modifications, additions, and improvements are possible. More generally, implementations in accordance with the present disclosure have been described in the context of particular embodiments. Functionality may be separated or combined in blocks differently in various embodiments of the disclosure or described with different terminology. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure as defined in the claims that follow.

Claims
  • 1. A system, comprising: a shared memory that includes a plurality of slots to transiently store network data packets;a packet capture repository utilizing a non-transitory storage medium;an indexing database utilizing a non-transitory storage medium;a pattern matching processing unit to generate preclassification data for the network data packets utilizing pattern matching analysis, wherein the pattern matching processing unit includes a graphical processing unit with multiple cores to analyze multiple network data packets in parallel; andat least one processing unit that implements: a storage process that receives the network data packets, stores the network data packets in at least one of the slots, and transfers the network data packets to the packet capture repository when the slots in the shared memory are full;a preclassification process that requests from the pattern matching processing unit the preclassification data; andan indexing process to: determine, based upon the preclassification data, whether to invoke or omit additional analysis of the network data packets, such that the indexing process resources are dedicated to further analyzing network data packets of greater concern, andperform at least one of aggregation, classification, or annotation of the network data packets in the shared memory to maintain one or more indices in the indexing database.
  • 2. The system of claim 1 wherein the preclassification process normalizes data within the network data packets.
  • 3. The system of claim 1 wherein the pattern matching analysis is Aho-Corasick string matching.
  • 4. The system of claim 1 wherein the pattern matching analysis is selected from identifying the application to which the network data packets relate, identifying the protocol utilized to transmit the network data packets, identifying file types of payload data in the network data packets, identifying source or destination addresses associated with the network data packets and identifying the lengths of network data packets.
  • 5. The system of claim 1 wherein copyless direct memory access data transfers are used between the plurality of slots.
  • 6. The system of claim 1 wherein the additional analysis determines the names and types of files being shared.
  • 7. The system of claim 1 wherein the additional analysis includes identifying and indexing protocol-specific attributes.
  • 8. The system of claim 1 wherein the additional analysis includes identifying and indexing web-based applications.
  • 9. The system of claim 8 wherein identifying and indexing web-based applications includes identifying and indexing a particular HTTP session.
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent Application No. 61/453,456, filed Mar. 16, 2011, entitled, “Hardware Accelerated Application-Based Pattern Matching for Real Time Classification and Recording of Network Traffic”, the contents of which are incorporated herein by reference.

US Referenced Citations (243)
Number Name Date Kind
5274643 Fisk Dec 1993 A
5440719 Hanes et al. Aug 1995 A
5526283 Hershey et al. Jun 1996 A
5602830 Fichou et al. Feb 1997 A
5758178 Lesartre May 1998 A
6041053 Douceur et al. Mar 2000 A
6101543 Alden et al. Aug 2000 A
6145108 Ketseoglou Nov 2000 A
6185568 Douceur et al. Feb 2001 B1
6336117 Massarani Jan 2002 B1
6370622 Chiou et al. Apr 2002 B1
6400681 Bertin et al. Jun 2002 B1
6453345 Trcka et al. Sep 2002 B2
6516380 Kenchammana-Hoskote et al. Feb 2003 B2
6522629 Anderson, Sr. Feb 2003 B1
6591299 Riddle et al. Jul 2003 B2
6628617 Karol et al. Sep 2003 B1
6628652 Chrin et al. Sep 2003 B1
6675218 Mahler et al. Jan 2004 B1
6693909 Mo et al. Feb 2004 B1
6708292 Mangasarian Mar 2004 B1
6754202 Sun et al. Jun 2004 B1
6782444 Vishlitzky et al. Aug 2004 B1
6789125 Aviani et al. Sep 2004 B1
6907468 Moberg et al. Jun 2005 B1
6907520 Parady Jun 2005 B2
6928471 Pabari et al. Aug 2005 B2
6956820 Zhu et al. Oct 2005 B2
6958998 Shorey Oct 2005 B2
6993037 Boden et al. Jan 2006 B2
6999454 Crump Feb 2006 B1
7002926 Eneboe et al. Feb 2006 B1
7024609 Wolfgang et al. Apr 2006 B2
7028335 Borella et al. Apr 2006 B1
7032242 Grabelsky et al. Apr 2006 B1
7039018 Singh et al. May 2006 B2
7047297 Huntington et al. May 2006 B2
7058015 Wetherall et al. Jun 2006 B1
7061874 Merugu et al. Jun 2006 B2
7065482 Shorey et al. Jun 2006 B2
7072296 Turner et al. Jul 2006 B2
7075927 Mo et al. Jul 2006 B2
7116643 Huang et al. Oct 2006 B2
7126944 Rangarajan et al. Oct 2006 B2
7126954 Fang et al. Oct 2006 B2
7142507 Kurimoto et al. Nov 2006 B1
7145906 Fenner Dec 2006 B2
7151751 Tagami et al. Dec 2006 B2
7154896 Kim et al. Dec 2006 B1
7162649 Ide et al. Jan 2007 B1
7168078 Bar et al. Jan 2007 B2
7200122 Goringe et al. Apr 2007 B2
7203173 Bonney et al. Apr 2007 B2
7218632 Bechtolsheim et al. May 2007 B1
7237264 Graham et al. Jun 2007 B1
7240166 Ganfield Jul 2007 B2
7254562 Hsu et al. Aug 2007 B2
7269171 Poon et al. Sep 2007 B2
7274691 Rogers et al. Sep 2007 B2
7277399 Hughes, Jr. Oct 2007 B1
7283478 Barsheshet et al. Oct 2007 B2
7292591 Parker et al. Nov 2007 B2
7330888 Storry et al. Feb 2008 B2
7340776 Zobel et al. Mar 2008 B2
7359930 Jackson et al. Apr 2008 B2
7376731 Khan et al. May 2008 B2
7376969 Njemanze et al. May 2008 B1
7379426 Sekiguchi May 2008 B2
7385924 Riddle Jun 2008 B1
7386473 Blumenau Jun 2008 B2
7391769 Rajkumar et al. Jun 2008 B2
7406516 Davis et al. Jul 2008 B2
7408938 Chou et al. Aug 2008 B1
7418006 Damphier et al. Aug 2008 B2
7420992 Fang et al. Sep 2008 B1
7423979 Martin Sep 2008 B2
7433326 Desai et al. Oct 2008 B2
7440464 Kovacs Oct 2008 B2
7441267 Elliott Oct 2008 B1
7444679 Tarquini et al. Oct 2008 B2
7450560 Grabelsky et al. Nov 2008 B1
7450937 Claudatos et al. Nov 2008 B1
7453804 Feroz et al. Nov 2008 B1
7457277 Sharma et al. Nov 2008 B1
7457296 Kounavis et al. Nov 2008 B2
7457870 Lownsbrough et al. Nov 2008 B1
7466694 Xu et al. Dec 2008 B2
7467202 Savchuk Dec 2008 B2
7480238 Funk et al. Jan 2009 B2
7480255 Bettink Jan 2009 B2
7483424 Jain et al. Jan 2009 B2
7489635 Evans et al. Feb 2009 B2
7493654 Bantz et al. Feb 2009 B2
7496036 Olshefski Feb 2009 B2
7496097 Rao et al. Feb 2009 B2
7499590 Seeber Mar 2009 B2
7508764 Back et al. Mar 2009 B2
7512078 Swain et al. Mar 2009 B2
7512081 Ayyagari et al. Mar 2009 B2
7522521 Bettink et al. Apr 2009 B2
7522594 Piche et al. Apr 2009 B2
7522599 Aggarwal et al. Apr 2009 B1
7522604 Hussain et al. Apr 2009 B2
7522605 Spencer et al. Apr 2009 B2
7522613 Rotsten et al. Apr 2009 B2
7525910 Wen Apr 2009 B2
7525963 Su et al. Apr 2009 B2
7526795 Rollins Apr 2009 B2
7529196 Basu et al. May 2009 B2
7529242 Lyle May 2009 B1
7529276 Ramakrishnan May 2009 B1
7529932 Haustein et al. May 2009 B1
7529939 Bruwer May 2009 B2
7532623 Rosenzweig et al. May 2009 B2
7532624 Ikegami et al. May 2009 B2
7532633 Rijsman May 2009 B2
7532726 Fukuoka et al. May 2009 B2
7533256 Walter et al. May 2009 B2
7533267 Yoshimura May 2009 B2
7548562 Ward et al. Jun 2009 B2
7561569 Thiede Jul 2009 B2
7617314 Bansod et al. Nov 2009 B1
7684347 Merkey et al. Mar 2010 B2
7694022 Garms et al. Apr 2010 B2
7730011 Deninger et al. Jun 2010 B1
7792818 Fain et al. Sep 2010 B2
7853564 Mierau et al. Dec 2010 B2
7855974 Merkey et al. Dec 2010 B2
7881291 Grah Feb 2011 B2
7904726 Elgezabal Mar 2011 B2
8068431 Varadarajan et al. Nov 2011 B2
20010039579 Trcka et al. Nov 2001 A1
20020085507 Ku et al. Jul 2002 A1
20020089937 Venkatachary et al. Jul 2002 A1
20020091915 Parady Jul 2002 A1
20020138654 Liu et al. Sep 2002 A1
20020163913 Oh Nov 2002 A1
20020173857 Pabari et al. Nov 2002 A1
20020191549 McKinley et al. Dec 2002 A1
20030009718 Wolfgang et al. Jan 2003 A1
20030014517 Lindsay et al. Jan 2003 A1
20030028662 Rowley et al. Feb 2003 A1
20030088788 Yang May 2003 A1
20030135525 Huntington Jul 2003 A1
20030135612 Huntington et al. Jul 2003 A1
20030188106 Cohen Oct 2003 A1
20030214913 Kan et al. Nov 2003 A1
20030221003 Storry et al. Nov 2003 A1
20030231632 Haeberlen Dec 2003 A1
20030233455 Leber et al. Dec 2003 A1
20040010473 Hsu et al. Jan 2004 A1
20040078292 Blumenau Apr 2004 A1
20040100952 Boucher et al. May 2004 A1
20040103211 Jackson et al. May 2004 A1
20040218631 Ganfield Nov 2004 A1
20040260682 Herley et al. Dec 2004 A1
20050015547 Yokohata et al. Jan 2005 A1
20050050028 Rose et al. Mar 2005 A1
20050055399 Savchuk Mar 2005 A1
20050063320 Klotz et al. Mar 2005 A1
20050083844 Zhu et al. Apr 2005 A1
20050108573 Bennett et al. May 2005 A1
20050117513 Park et al. Jun 2005 A1
20050132046 de la Iglesia et al. Jun 2005 A1
20050132079 Iglesia et al. Jun 2005 A1
20050207412 Kawashima et al. Sep 2005 A1
20050229255 Gula et al. Oct 2005 A1
20050249125 Yoon et al. Nov 2005 A1
20050265248 Gallatin et al. Dec 2005 A1
20060013222 Rangan et al. Jan 2006 A1
20060037072 Rao et al. Feb 2006 A1
20060069821 P et al. Mar 2006 A1
20060083180 Baba et al. Apr 2006 A1
20060088040 Kramer et al. Apr 2006 A1
20060114842 Miyamoto et al. Jun 2006 A1
20060126665 Ward et al. Jun 2006 A1
20060146816 Jain Jul 2006 A1
20060165009 Nguyen et al. Jul 2006 A1
20060165052 Dini et al. Jul 2006 A1
20060167894 Wunner Jul 2006 A1
20060168240 Olshefski Jul 2006 A1
20060203848 Damphier et al. Sep 2006 A1
20060221967 Narayan et al. Oct 2006 A1
20060233118 Funk et al. Oct 2006 A1
20060235908 Armangau et al. Oct 2006 A1
20070019640 Thiede Jan 2007 A1
20070036156 Liu et al. Feb 2007 A1
20070038665 Kwak et al. Feb 2007 A1
20070050334 Deninger et al. Mar 2007 A1
20070050465 Canter et al. Mar 2007 A1
20070058631 Mortier et al. Mar 2007 A1
20070124276 Weissman et al. May 2007 A1
20070139231 Wallia et al. Jun 2007 A1
20070140235 Aysan et al. Jun 2007 A1
20070140295 Akaboshi Jun 2007 A1
20070147263 Liao et al. Jun 2007 A1
20070153796 Kesavan et al. Jul 2007 A1
20070157306 Elrod et al. Jul 2007 A1
20070162609 Pope et al. Jul 2007 A1
20070162971 Blom et al. Jul 2007 A1
20070223474 Shankar Sep 2007 A1
20070248029 Merkey et al. Oct 2007 A1
20070250817 Boney Oct 2007 A1
20070271372 Deninger et al. Nov 2007 A1
20070286175 Xu et al. Dec 2007 A1
20070291755 Cheng et al. Dec 2007 A1
20070291757 Dobson et al. Dec 2007 A1
20070297349 Arkin Dec 2007 A1
20080013541 Calvignac et al. Jan 2008 A1
20080037539 Paramaguru Feb 2008 A1
20080056144 Hutchinson et al. Mar 2008 A1
20080117903 Uysal May 2008 A1
20080159146 Claudatos et al. Jul 2008 A1
20080175167 Satyanarayanan et al. Jul 2008 A1
20080181245 Basso et al. Jul 2008 A1
20080240128 Elrod Oct 2008 A1
20080247313 Nath et al. Oct 2008 A1
20080279216 Sharif-Ahmadi et al. Nov 2008 A1
20080294647 Ramaswamy Nov 2008 A1
20090003363 Benco et al. Jan 2009 A1
20090006672 Blumrich et al. Jan 2009 A1
20090028161 Fullarton et al. Jan 2009 A1
20090028169 Bear et al. Jan 2009 A1
20090041039 Bear et al. Feb 2009 A1
20090073895 Morgan et al. Mar 2009 A1
20090092057 Doctor et al. Apr 2009 A1
20090097417 Asati et al. Apr 2009 A1
20090097418 Castillo et al. Apr 2009 A1
20090103531 Katis et al. Apr 2009 A1
20090109875 Kaneda et al. Apr 2009 A1
20090113217 Dolgunov et al. Apr 2009 A1
20090116403 Callanan et al. May 2009 A1
20090116470 Berggren May 2009 A1
20090119501 Petersen May 2009 A1
20090122801 Chang May 2009 A1
20090168648 Labovitz et al. Jul 2009 A1
20090182953 Merkey et al. Jul 2009 A1
20090187558 McDonald Jul 2009 A1
20090219829 Merkey et al. Sep 2009 A1
20090245114 Vijayaraghavan Oct 2009 A1
20090290580 Wood et al. Nov 2009 A1
20090292681 Wood et al. Nov 2009 A1
20100278052 Matityahu et al. Nov 2010 A1
Foreign Referenced Citations (14)
Number Date Country
0838930 Apr 1998 EP
1004185 May 2000 EP
1387527 Feb 2004 EP
1494425 Jan 2005 EP
1715627 Oct 2006 EP
1971087 Sep 2008 EP
2337903 Dec 1999 GB
2002026935 Jan 2002 JP
2002064507 Feb 2002 JP
2002323959 Nov 2002 JP
20060034581 Apr 2006 KR
WO 0223805 Mar 2002 WO
WO 2005109754 Nov 2005 WO
WO 2009038384 Mar 2009 WO
Non-Patent Literature Citations (11)
Entry
Smith, R., et al., “Evaluating GPUs for Network Packet Signature Matching”, Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium Apr. 26-28, 2009.
Berners-Lee et al., “Hypertext Transfer Protocol—HTTP/1.0”, IETF, RFC 1945, Network Working Group, 1996, 60 pgs.
Fielding et al., “Hypertext Transfer Protocol—HTTP/1.1”, Internet Engineering Task Force (IETF), Request for Comments (RFC) 2616, 2068, Network Working Group, 1999, 165 pgs.
Hamill et al., “Petaminer: Efficient Navigation to Petascale Data Using Event-Level Metadata”, Proceedings of XII Advanced Computing and Analysis Techniques in Physics Research, Nov. 2008, pp. 1-5.
Huang et al., PlanetFlow: Maintaining Accountability for Network Services, ACM SIGOPS Operating Systems Review 40, 1 (Jan 2006), 6 pgs.
Kim, Kwang Sik (Exr), International Search Report issued to application No. PCT/US09/41061, Nov. 26, 2009, 8 pgs.
Kim, Sae Young (Exr), International Search Report issued to application No. PCT/US09/41060, Dec. 3, 2009, 8 pgs.
Kwon, Oh Seong (Exr), International Search Report issued to application No. PCT/US09/40733, Nov. 24, 2009, 3 pgs.
Lupia, Sergio (Exr), International Search Report issued to application No. PCT/US10/56739, Apr. 1, 2011, 15 pgs.
Lupia, Sergio (Exr), International Search Report issued to application No. PCT/US10/56723, Apr. 20, 2011, 14 pgs.
Stockinger, “Network Traffic Analysis with Query Driven Visualization—SC2005 HPC Analytics Results”, Proceedings of the 2005 ACM/IEEE SC/05 Conference, Nov. 2005, 4 pgs.
Related Publications (1)
Number Date Country
20120239652 A1 Sep 2012 US
Provisional Applications (1)
Number Date Country
61453456 Mar 2011 US