This patent document generally relates to system security approaches, especially a multi-level data processing system that can be employed in such system security approaches.
Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Some of the most malicious software, also known as malware, capable of crippling a computing device or even an entire corporate network, is being distributed worldwide via electronic mail (“email”) and email attachments. As individuals and businesses become increasingly dependent on email communications, the likelihood of such programs setting off disruptive consequences has also increased considerably. Further complicating the matter, some email attachments are compressed to conserve communication bandwidth. Finding malware in such a compressed attachment generally involves decompressing the entire attachment before scanning its uncompressed version.
One approach employed by existing anti-virus solutions is to filter out an attachment file based on its extension. Thus, if the attachment file has a known compression extension, such as zip, then the attachment file is blocked from reaching users of such solutions. However, since this approach does not inspect the content of the attachment file, a legitimate, malware-free attachment file may be erroneously filtered out.
Another approach employed by the anti-virus solutions is to recommend or even require that a user decompress and scan the compressed attachment file for malware before the user is permitted to access the file. After an affirmative act by the user, such as manually electing to start the decompressing and scanning process, the entire attachment file is temporarily stored either on the user's computing device or on the mail server on the network for processing. Unlike the first approach discussed above, this approach inspects the content of the attachment file. However, the inspection takes place only after the entire file is stored and decompressed. By its nature, a compressed file tends to contain a large amount of information when it is in its uncompressed state. Since the entire uncompressed file is stored and inspected, this approach consumes significant processing and memory resources. When multiple attachments from different email sessions must be handled concurrently, the resource requirements of this approach render its implementation impractical and prohibitively expensive.
As the foregoing illustrates, what is needed is a way to efficiently and yet thoroughly inspect the content of these compressed attachment files in email communications.
Methods and systems for processing multiple levels of data in system security approaches are disclosed. In one embodiment, a first set of resources is selected to iteratively reverse multiple levels of format conversions on the payload data of a data unit. This data unit is part of a first file, which is associated with a first transport connection. Independently, a second set of resources is also selected to iteratively reverse multiple levels of format conversions on the payload data of a data unit. This data unit is part of a second file, which is associated with a second transport connection. Upon completion of the aforementioned reversal operations, the payload data of a first reversed data unit and a second reversed data unit, which correspond to the data unit of the first file and the data unit of the second file, respectively, are inspected for suspicious patterns. The inspection of the first and the second reversed data units occurs prior to any aggregation of the data units of the first file or the second file.
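As a rough illustration of inspecting reversed data units before any aggregation, the following Python sketch scans each unit as it arrives and retains only a short tail of bytes so that a pattern split across two consecutive units can still be found. The class name `UnitScanner`, the retained-tail technique, and the sample pattern are assumptions made for this sketch; the disclosure does not specify how the content inspection engine matches patterns.

```python
class UnitScanner:
    """Hypothetical per-file scanner that never stores the whole file."""

    def __init__(self, patterns):
        self.patterns = list(patterns)
        # Longest pattern minus one byte is enough to bridge two units.
        self.keep = max(len(p) for p in self.patterns) - 1
        self.tail = b""

    def scan(self, unit: bytes):
        """Return the patterns whose match includes at least one new byte."""
        window = self.tail + unit
        hits = []
        for p in self.patterns:
            # Matches lying entirely inside the old tail were already reported.
            start = max(0, len(self.tail) - len(p) + 1)
            if window.find(p, start) != -1:
                hits.append(p)
        self.tail = window[-self.keep:] if self.keep else b""
        return hits
```

One scanner instance would be kept per file, so units belonging to different transport connections can be scanned independently of one another.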
Methods and systems for processing multiple levels of data in system security approaches are described. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details.
Certain computing and programming theories and networking protocols are well known in the art and will not be elaborated in detail. However, throughout this disclosure, any two data processing operations are said to be “in parallel,” when at least some portions of the operations are performed at the same time. Each “data unit” generally refers to data that are stored in a particular memory location or a packet with a destination address. The reversal of one format conversion of a data unit is referred to as a single “level” data processing. So, if a data unit has been encoded and also compressed, then a two-level data processing, namely decompressing and decoding, is needed to reverse the two format conversions. “Data-unit-based” processing generally refers to operations performed on a single data unit without any precondition of assembling and combining multiple data units.
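The two-level case described above (a data unit that has been compressed and then encoded) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: base64 stands in for the encoding, zlib stands in for the compression, the base64 text contains no line breaks, and the class name `TwoLevelReverser` is hypothetical. Each call handles a single data unit without first assembling the whole file.

```python
import base64
import zlib


class TwoLevelReverser:
    """Hypothetical data-unit-based reversal of encode-after-compress."""

    def __init__(self):
        # base64 characters that do not yet form a complete 4-character group
        self._pending = b""
        # streaming decompressor keeps its own state between data units
        self._inflater = zlib.decompressobj()

    def feed(self, unit: bytes) -> bytes:
        data = self._pending + unit
        usable = len(data) - (len(data) % 4)
        self._pending = data[usable:]
        decoded = base64.b64decode(data[:usable])   # level 1: decoding
        return self._inflater.decompress(decoded)   # level 2: decompressing

    def finish(self) -> bytes:
        """Return any output still buffered by the decompressor."""
        return self._inflater.flush()
```

Feeding the units in arrival order and concatenating the results of `feed`, followed by `finish`, yields the original bytes, although in the approach described here each reversed unit would be handed on for inspection rather than concatenated.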
1.0 System Overview
A multi-level data processing system (“MDPS”) is capable of allocating resources to perform multiple data-unit-based processing operations in parallel. Some illustrative types of such processing include, without limitation, decoding, decompressing, unarchiving, and any reversing of a format conversion for email attachment files from multiple TCP connections in parallel on a data-unit-by-data-unit basis. For each of the email attachment files, the multi-level processing capability of the data-unit-based processing discussed above is invoked if the format of the file has been converted more than once.
One embodiment of MDPS 100 is also coupled to protocol parser 122 and content inspection engine 126. Protocol parser 122 generates and directs MDPS-data-units that characterize the data from transport service users, such as 128 and 130 shown in
Handler selector 102 is mainly responsible for interacting with resource manager 104 to track resources of MDPS 100 and for designating certain MDHs, such as MDH 108 and 110, to handle the incoming data from the transport service users via protocol parser 122. In one implementation, handler selector 102 may designate one or more MDHs for each TCP connection, and the designation of an MDH spawns a process. This spawned process is referred to as an “MDH process,” and the term is used interchangeably with MDH throughout this disclosure. Each MDH operates independently of the others and has access to a set of resources, such as one or more data processing blocks and storage. As each MDH processes the incoming data, the MDH keeps certain state information of the processing in state table 106 and also feeds certain information back to handler selector 102. After the MDH completes the processing of the incoming data, it places the results in output queue 120. Subsequent sections will provide detailed discussions of the interactions among these various components of MDPS 100.
MDPS 100 supports a finite number of MDHs. For each of the supported MDHs, resource manager 104 allocates at least a table entry in a finite-sized state table 106. In one implementation, the table entry may contain identification information, status information, state information, and resource information. Specifically, the identification information may be an MDPS session number, which uniquely corresponds to a specific TCP connection and any MDH that is designated to process the data on this TCP connection. The status information indicates the availability of the designated MDH. The state information provides a snapshot of any processing the designated MDH may have undertaken. Lastly, the resource information tracks the resources the designated MDH utilizes, such as the FIFO buffers and the data processing blocks mentioned above. Subsequent paragraphs will further detail the management of state table 106 and the allocation and the de-allocation of the resources.
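A table entry with the four kinds of information listed above, held in a table of fixed capacity, might be modeled as in the following Python sketch. The names `StateEntry` and `StateTable`, the individual field names, and the use of an integer level as the processing snapshot are all assumptions for illustration; the disclosure does not prescribe a layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class StateEntry:
    session_id: int            # identification: one per TCP connection
    available: bool = True     # status: availability of the designated MDH
    level: int = 0             # state: hypothetical snapshot of processing
    resources: List[str] = field(default_factory=list)  # e.g. FIFO buffers


class StateTable:
    """Hypothetical finite-sized table managed by a resource manager."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries: Dict[int, StateEntry] = {}

    def allocate(self, session_id: int, resources: List[str]) -> Optional[StateEntry]:
        if session_id in self.entries:
            return self.entries[session_id]
        if len(self.entries) >= self.capacity:
            return None  # no free entry: the table is finite
        entry = StateEntry(session_id, available=False, resources=list(resources))
        self.entries[session_id] = entry
        return entry

    def release(self, session_id: int) -> None:
        # De-allocation frees the entry for another session.
        self.entries.pop(session_id, None)
```

In this sketch, returning `None` from `allocate` stands in for whatever the system does when no handler can be designated, which is not specified here.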
1.1 Protocol Parser
As an illustration, suppose one of the transport service users is an SMTP client. The data units that this SMTP client receives are thus in the form of SMTP packets. Suppose further that this SMTP client also supports Multipurpose Internet Mail Extensions (“MIME”) and receives SMTP packets that are associated with the TCP 123 connection and collectively contain an email and an attachment file. For clarity of the discussions, unless otherwise indicated, references to these SMTP packets are meant to cover the packets containing the email, the email and the attachment file, or the attachment file. In this illustration, the first of these SMTP packets refers to the packet containing both the email and a beginning portion of the attachment file. The subsequent SMTP packets refer to the packets containing the remaining portions of the attachment file. One embodiment of protocol parser 122 generates an MDPS-data-unit for each SMTP packet that it receives. In most instances, the subsequent MDPS-data-units have the same payload data as the subsequent SMTP packets.
After the TCP 123 connection is established and the required handshaking is completed pursuant to the SMTP protocol, the SMTP client begins to send the aforementioned SMTP packets to protocol parser 122. In this example, protocol parser 122 examines the payload data of the first SMTP packet in search of a boundary marker that indicates the beginning of the attachment file. At this boundary marker, protocol parser 122 retrieves certain information, such as the type of encoding for this attachment file, the existence of the attachment file, and the name of the attachment file. Based on the retrieved information, protocol parser 122 determines a data type for this first portion of the attachment file. So, if the type of encoding is base64, then protocol parser 122 denotes the data type of this first portion in step 202 to represent base64 encoding.
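The boundary-marker examination described above can be illustrated with Python's standard `email` package. This is a sketch only: the function name `describe_attachment`, the returned dictionary keys, and the sample message in the usage note are assumptions, and a real parser would work on packet payloads rather than on a complete string.

```python
from email import message_from_string


def describe_attachment(payload: str, boundary: str):
    """Return the name and encoding found at the attachment's boundary marker."""
    marker = "--" + boundary
    # Text before the first marker is the message header, so skip it.
    for part in payload.split(marker)[1:]:
        headers = message_from_string(part.lstrip("\r\n"))
        filename = headers.get_filename()
        if filename is None:
            continue  # a body part, not an attachment
        encoding = (headers.get("Content-Transfer-Encoding") or "7bit").strip().lower()
        return {"filename": filename, "data_type": encoding}
    return None  # no attachment starts in this payload
```

For a payload whose attachment part carries `Content-Transfer-Encoding: base64` and `filename="report.zip"`, the function returns a dictionary with `data_type` set to `base64`, which corresponds to the data type the parser would denote for the first portion of the attachment.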
With at least the TCP 123 connection and the denoted data type information, protocol parser 122 requests MDPS 100 shown in
Continuing with the aforementioned example, because the subsequent portions of the same attachment file belong to the same TCP connection and are likely to remain as base64 encoded, for each of the SMTP packets that contain these subsequent portions, protocol parser 122 generates a corresponding MDPS-data-unit with the same MDPS session ID and the same data type as the MDPS-data-unit for the first portion of the attachment file. In addition, protocol parser 122 examines each SMTP packet for a boundary marker that indicates the end of the attachment file. If the boundary marker is found and the last SMTP packet containing the remaining portion of the attachment file is identified in step 212, then protocol parser 122 initiates the closing of the MDPS session in step 214. On the other hand, if the boundary marker is not found, then protocol parser 122 continues to generate and send MDPS-data-units to MDPS 100 in step 210.
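The loop described above, in which every subsequent packet yields a data unit carrying the same session ID and data type until the closing boundary marker is seen, might look like the following Python sketch. The `MdpsDataUnit` fields, the `last` flag used to signal that the session should be closed, and the assumption that the closing marker never straddles two packets are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class MdpsDataUnit:
    session_id: int
    data_type: str
    payload: bytes
    last: bool = False  # hypothetical flag: close the session after this unit


def generate_units(session_id: int, data_type: str,
                   packets: Iterable[bytes], boundary: bytes) -> Iterator[MdpsDataUnit]:
    end_marker = b"--" + boundary + b"--"   # MIME closing delimiter
    for payload in packets:
        cut = payload.find(end_marker)
        if cut == -1:
            # End not reached yet: keep generating units for this session.
            yield MdpsDataUnit(session_id, data_type, payload)
        else:
            # Final portion of the attachment: emit it and stop.
            yield MdpsDataUnit(session_id, data_type, payload[:cut], last=True)
            return
```

Because the function is a generator, each unit is produced as soon as its packet is available, which matches the packet-by-packet behavior described for protocol parser 122.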
Protocol parser 122 is capable of handling different TCP connections in parallel. In other words, multiple instances of process 200 as shown in
It should be noted that protocol parser 122 may not be able to precisely determine the data type of a data unit in certain situations. In such situations, data type subfield 258 as shown in
1.2 Multi-Level Data Handler and Resource Management
To further describe the multi-level operations of MDH 108,
In conjunction with
Following the same process discussed above, MDH 108 continues to invoke single-level data handlers for different levels of processing, if MDH 108 continues to identify distinct data types and has not exceeded a threshold number of iterations of extracting a data type and utilizing appropriate resources for such data type. Thus, prior to the completion of the PKUNZIP operation on the first MDPS-data-unit, MDH 108 increments level variable n by 1 and invokes reversal operation type C (not shown in
2.0 Example System Structure
Host processor 602 can either be a general purpose processor or a specific purpose processor. Some examples of a specific purpose processor are processors that are designed for, without limitation, data communications, signal processing, mobile computing, and multimedia related applications. Specific purpose processors may include interfaces that other external units, such as memory system 606 and CICP 608, can directly connect to.
CICP 608 can be implemented as an application-specific integrated circuit (“ASIC”), as software to be programmed in a programmable logic device, or even as a functional unit in a system-on-chip (“SOC”). One or more of the components illustrated in
High speed I/O bridge 704 manages the data-intensive pathways and supports high speed peripherals, such as, without limitation, a content inspection system that includes the aforementioned CICP 608 shown in
Server system 700 carries out the operations of the illustrated transport service users 128 and 130 and protocol parser 122 shown in
The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to processor 702 for execution. Such a medium may take many forms, including but not limited to, non-volatile media and volatile media. Non-volatile media include, for example, optical or magnetic disks. Volatile media include dynamic memory.
3.0 Extensions and Alternatives
In the foregoing specification, the present invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
This application is a continuation-in-part of U.S. application Ser. No. 10/868,665 filed on Jun. 14, 2004, which is incorporated herein by reference in its entirety.
Number | Date | Country |
---|---|---|
20060206939 A1 | Sep 2006 | US |

Relation | Number | Date | Country |
---|---|---|---|
Parent | 10868665 | Jun 2004 | US |
Child | 11422087 | | US |