In recent years, it has been increasingly difficult to distil an appropriate or common name for observed malware threats. For several decades, competing vendors of anti-virus or anti-malware products have pursued a diverse range of detection strategies. The competitive nature of the business has resulted in situations where malware and viruses may be assigned unique names by the first vendors to uncover the threat while other vendors, operating independently, discover and name the threat differently. In addition, with the growing use of behavioral detection systems, malware and viruses may be temporarily assigned dynamically generated descriptive names until the vendor either classifies the threat as a previously known and labeled malware or virus family or creates a new malware or virus family name.
As a consequence of the diverse and continuously changing landscape for malware and virus naming, it is often very difficult for a human to distil an appropriate or common name for an observed threat. A user's perspective and enumeration of a threat may also differ considerably depending on which vendor's antivirus products an organization employs and what third party systems they query for malware information.
Customers that use multiple antivirus products may also want to know the malware name for a few different reasons. Vendor customers may want to know the name of the malware so that they can check whether a different antivirus or anti-malware product has a signature that will block that particular malware. Analysts who wish to research the malware likewise need a correct name.
The problem to be solved is therefore rooted in technological limitations of the legacy approaches. Improved techniques, in particular improved application of technology, are needed to address the problems that arise when the same malware and viruses are labeled with different or temporary names. What is needed is a technique or techniques that effectively pool and enumerate the multitude of malware and virus names into a single human digestible and actionable framework.
The disclosed embodiments provide a system for categorizing malware into a single actionable framework. In some embodiments, the system will parse multiple vendor names and descriptive formats of a specific threat and construct a graphical representation of word or name frequency for the purpose of aiding a user in identifying the most appropriate and commonly used name for a threat.
In some embodiments, the system will query a malware database with a malware virus predicate such as a unique hash value or malware name to find malware names associated with the predicate. A malware correlator analyzes and generates families of malware threats by correlating malware data. A frequency graph constructor engine will construct a graphical representation of word or name frequency, which permits a user to visually identify the most appropriate and commonly used names for a threat.
In some embodiments, the system will query a malware database with malware network behavior to find unique hash values associated with the malware network behavior predicate.
Other additional objects, features, and advantages of the invention are described in the detailed description, figures, and claims.
The drawings illustrate the design and utility of some embodiments of the present invention. It should be noted that the figures are not drawn to scale and that elements of similar structures or functions are represented by like reference numerals throughout the figures. In order to better appreciate how to obtain the above-recited and other advantages and objects of various embodiments of the invention, a more detailed description of the present inventions briefly described above will be rendered by reference to specific embodiments thereof, which are illustrated in the accompanying drawings. Understanding that these drawings depict only typical embodiments of the invention and are not therefore to be considered limiting of its scope, the invention will be described and explained with additional specificity and detail through the use of the accompanying drawings in which:
The present invention is directed to a method, system, and computer program product for categorizing malware threats. Other objects, features, and advantages of the invention are described in the detailed description, figures, and claims.
Various embodiments of the methods, systems, and articles of manufacture will now be described in detail with reference to the drawings, which are provided as illustrative examples of the invention so as to enable those skilled in the art to practice the invention. Notably, the figures and the examples below are not meant to limit the scope of the present invention. Where certain elements of the present invention can be partially or fully implemented using known components (or methods or processes), only those portions of such known components (or methods or processes) that are necessary for an understanding of the present invention will be described, and the detailed descriptions of other portions of such known components (or methods or processes) will be omitted so as not to obscure the invention. Further, the present invention encompasses present and future known equivalents to the components referred to herein by way of illustration.
Before describing the examples illustratively depicted in the figures, a general introduction is provided for better understanding. In some embodiments, a malware correlator system may be implemented to pool and enumerate the multitude of malware and virus names into a single human digestible and actionable framework for the purpose of aiding a user in identifying the most appropriate and commonly used name for a malware threat. In some embodiments, a malware correlator system may parse multiple vendor names and descriptive formats of a specific threat. In some embodiments, a malware database will collect multiple malware names or employ third-party systems for malware vendor names and information. In some embodiments, the malware correlator system will output a malware name frequency graph (e.g., word cloud). The terms “virus” and “malware” are used interchangeably throughout this specification.
In some embodiments, the system gathers raw data for malware analyzed independently by different antivirus products (e.g., 102a, 102b, 102c, and 102d) into the Malware Database 110. Antivirus products work independently, so the antivirus products may assign unique names when they discover a new malware sample. This results in multiple products having different names for the same virus, depending on which antivirus product a user uses. The system can also query a third-party system for malware raw data. As such, depending on which vendor's products an organization employs and what third-party systems they query for malware raw data, their perspective and enumeration of a malware can differ considerably.
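As an illustration of how differing vendor labels for the same threat might be parsed into comparable terms, the following is a minimal sketch in Python. The function name `tokenize_label`, the list of generic tokens, the minimum token length, and the sample labels are all assumptions made for this example; they are not taken from any particular vendor's naming scheme.

```python
import re

# Hypothetical set of generic tokens that say little about the family name.
# This list is an assumption for illustration only.
GENERIC_TOKENS = {"trojan", "win32", "w32", "generic", "gen", "variant",
                  "malware", "virus"}


def tokenize_label(label):
    """Split a vendor label on non-alphanumeric delimiters and keep the
    lowercase tokens that are at least 3 characters long and not generic."""
    parts = re.split(r"[^A-Za-z0-9]+", label)
    tokens = []
    for part in parts:
        token = part.lower()
        if len(token) < 3 or token in GENERIC_TOKENS:
            continue  # drop empty pieces, short variant suffixes, generic words
        tokens.append(token)
    return tokens


# Three made-up labels that different products might give one sample.
labels = [
    "Trojan.Win32.Examplebot.a",
    "W32/Examplebot-AB",
    "Trojan:Win32/Examplebot!gen",
]
parsed = [tokenize_label(label) for label in labels]
```

Under these assumptions, each of the three differently formatted labels reduces to the same single token, which is what allows names from independent products to be compared and counted.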
In some embodiments, if the Malware Database 110 contains malware names associated with the malware virus predicate, the Malware Correlator 130 may determine and generate families of malware threats corresponding to the malware virus predicate. In some embodiments, the Frequency Graph Constructor Engine 140 takes the correlated malware data and constructs a graphical representation of word or name frequency.
In some embodiments, a user computer 104a may be used to control the Malware Correlator 130 and Frequency Graph Constructor Engine 140. The user computer 104a comprises a display device, such as a display monitor, for displaying a user interface to users at the user station. The user station 104a also comprises one or more input devices for the user to provide operational control over the activities of the system 100, such as a mouse or keyboard to manipulate a pointing object to generate user inputs to the system 100.
After the Frequency Graph Constructor Engine 140 operates on the correlated malware, the user computer 104a may request the Frequency Graph Constructor Engine 140 to generate Frequency Display Data 150. The Frequency Graph Constructor Engine 140 generates the content that is visually displayed to the user at user station 104a. This content includes, for example, the frequency graph shown in
At 205, the list of malware names resulting from the query will be collected by the Malware Correlator 130. Once collected, the Malware Correlator 130 will correlate and generate families of malware threats. At 207, the Malware Correlator 130 is controlled by user control signals from the user computer 104a to generate a family or families of malware threats by correlating the collected malware data. In some embodiments, the list of correlated malware or family or families of malware threats can be stored into a database in a computer readable storage device. The computer readable storage device comprises any combination of hardware and software that allows for ready access to the data that is located at the computer readable storage device. For example, the computer readable storage device could be implemented as computer memory operatively managed by an operating system. The computer readable storage device could also be implemented as an electronic database system having storage on persistent and/or non-persistent storage.
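One possible correlation rule, assumed here purely for illustration, is to place two hash values in the same family whenever they share at least one name token. The sketch below implements that rule with a small union-find structure; the function name `group_into_families`, the grouping rule itself, and the sample hashes and tokens are hypothetical.

```python
def group_into_families(tokens_by_hash):
    """Group hash values that share at least one name token.

    tokens_by_hash maps a hash value to the list of name tokens collected
    for it. The sharing rule is an assumption, not a prescribed method.
    """
    parent = {h: h for h in tokens_by_hash}

    def find(h):
        # Follow parent links to the group representative (path halving).
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    first_hash_for_token = {}
    for h, tokens in tokens_by_hash.items():
        for token in tokens:
            if token in first_hash_for_token:
                # Merge this hash's group with the group that first used the token.
                parent[find(h)] = find(first_hash_for_token[token])
            else:
                first_hash_for_token[token] = h

    families = {}
    for h in tokens_by_hash:
        families.setdefault(find(h), set()).add(h)
    return sorted(sorted(members) for members in families.values())


# Made-up data: three hashes linked through shared tokens, one unrelated.
sample_tokens = {
    "hash_aaa": ["examplebot"],
    "hash_bbb": ["examplebot", "dropper"],
    "hash_ccc": ["dropper"],
    "hash_ddd": ["otherfam"],
}
families = group_into_families(sample_tokens)
```

Here `hash_aaa` and `hash_ccc` share no token directly but end up in one family because both are linked to `hash_bbb`. Other correlation rules could be substituted without changing the surrounding flow.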
At 209, once the malware has been correlated, the Frequency Graph Constructor Engine 140 constructs a graphical representation of word or name frequency for the purpose of aiding a user in identifying the most appropriate and commonly used name for a threat. In some embodiments, the Frequency Graph Constructor Engine 140 is controlled by user control signals from the user computer 104a. In some embodiments, the user may want to construct a frequency graph or a “word cloud” graph to aid in quickly identifying the most appropriate and commonly used names for the threat. For example, a user may want to use a word cloud graph to visually reveal which malware names are more frequently used without understanding the technicalities of how a family of malware threats was generated.
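The name-frequency data behind such a graph can be sketched as a simple count. The example below assumes that names have already been reduced to tokens per vendor and that each vendor contributes at most one count per token; the function name `name_frequencies` and the sample data are hypothetical.

```python
from collections import Counter


def name_frequencies(tokens_by_vendor):
    """Count how many vendors use each name token.

    Each vendor is counted once per token so that one vendor repeating a
    name does not inflate its frequency (an assumption of this sketch).
    """
    counts = Counter()
    for tokens in tokens_by_vendor.values():
        counts.update(set(tokens))
    return counts


# Made-up tokens attributed to four hypothetical vendors.
sample = {
    "vendor_a": ["examplebot", "dropper"],
    "vendor_b": ["examplebot"],
    "vendor_c": ["otherfam", "examplebot"],
    "vendor_d": ["otherfam"],
}
freq = name_frequencies(sample)
top_name, top_count = freq.most_common(1)[0]
```

With this sample, `examplebot` is used by three of the four vendors and is therefore the most frequent name, which is the kind of result a word cloud makes visible at a glance.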
At 211, the Frequency Graph Constructor Engine 140 will generate a Frequency Display Data 150 for display to the user on the User Computer 104a.
In other embodiments, the raw data for malware is manually collected from different antivirus products into a Malware Database 110.
At 501, the system gathers raw data for malware analyzed independently by multiple antivirus products into a malware database. Given the malware network behavior predicate, a computing process identifies malware and viruses that have been previously observed to utilize or rely upon those same Internet addresses. At 503, the system queries Malware Database 110 with the malware network behavior predicate to find unique malware hash values associated with the same malware network behavior predicate. The Malware Correlator 130 collects unique malware hash values resulting from the database query at 505. In some embodiments, the list of collected unique malware hash values is stored into a database in a computer readable storage device. This list may comprise multiple malware hashes associated with the malware network behavior predicate (e.g., IP address or domain name) over an extended period of time. The period of time may be pre-defined to limit query size and the volume of any returned results.
At 507, the Malware Correlator 130 will query Malware Database 110 again, but this time the query will be with a unique malware hash value to find unique malware names associated with a unique malware hash value collected from the same malware network behavior predicate. This process is described in more detail in
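The lookup of unique hash values by network behavior, bounded by a pre-defined period, can be sketched with an in-memory SQLite database. The table name `observations`, its columns, the function name `hashes_for_indicator`, and all sample rows are assumptions for illustration; the address used is from a range reserved for documentation.

```python
import sqlite3


def hashes_for_indicator(conn, indicator, start, end):
    """Return the distinct sample hashes seen with a network indicator
    (IP address or domain name) between two ISO-format dates, inclusive."""
    rows = conn.execute(
        "SELECT DISTINCT sample_hash FROM observations "
        "WHERE indicator = ? AND seen_on BETWEEN ? AND ? "
        "ORDER BY sample_hash",
        (indicator, start, end),
    ).fetchall()
    return [row[0] for row in rows]


# Hypothetical schema and made-up observations.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observations (sample_hash TEXT, indicator TEXT, seen_on TEXT)"
)
conn.executemany(
    "INSERT INTO observations VALUES (?, ?, ?)",
    [
        ("hash_aaa", "203.0.113.7", "2016-03-01"),
        ("hash_bbb", "203.0.113.7", "2016-05-12"),
        ("hash_aaa", "203.0.113.7", "2016-06-30"),  # repeat sighting
        ("hash_ccc", "203.0.113.7", "2014-01-15"),  # outside the period
        ("hash_ddd", "c2.example.com", "2016-04-02"),  # different indicator
    ],
)
recent = hashes_for_indicator(conn, "203.0.113.7", "2016-01-01", "2016-12-31")
```

The date bounds play the role of the pre-defined period: the sighting from 2014 is excluded, and the repeated sighting of `hash_aaa` is returned only once.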
At 511, Frequency Graph Constructor Engine 140 will construct a frequency graph of correlated threats based on user control signals. At 513, the Frequency Graph Constructor Engine 140 will generate a user interface Frequency Display Data 150. The Frequency Graph Constructor Engine 140 generates the content that is visually displayed to the user at user station 104a. This content includes, for example, the frequency graph shown in
According to some other embodiments, the malware virus predicate may correspond to a malware's destination domain name or peer-to-peer network behavior.
At 705, the user queries a malware database with the unique malware hash value to extract a list of malware names for that hash value. A single malware hash may yield multiple malware and virus names from multiple anti-virus and anti-malware vendors. At 707, the malware names resulting from the query are collected in the malware correlator. At 709, the malware naming module determines whether every unique malware hash value has been queried to extract the list of malware names for that hash value. If not, then at 711 the user queries the malware database with any unique malware hash value that has not yet been queried. If yes, the malware correlator has collected malware names for each unique hash value from a queryable source and is ready to generate families of malware threats at 713.
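The loop formed by steps 705 through 713 can be sketched as follows: hash values are queried one at a time until none remain unqueried. The function name `collect_names`, the `lookup_names` callable standing in for the database query, and the sample data are hypothetical.

```python
def collect_names(hashes, lookup_names):
    """Query each unique hash exactly once and collect its malware names.

    lookup_names is any callable that returns a list of names for a hash;
    here it stands in for the malware database query (an assumption).
    """
    pending = list(dict.fromkeys(hashes))  # de-duplicate, keep first-seen order
    collected = {}
    while pending:  # some hash has not been queried yet
        sample_hash = pending.pop(0)
        collected[sample_hash] = lookup_names(sample_hash)
    return collected  # every hash queried; ready for family generation


# Made-up stand-in for the queryable source.
fake_database = {
    "hash_aaa": ["Vendor1:Examplebot.A", "Vendor2:Gen.Examplebot"],
    "hash_bbb": ["Vendor1:Examplebot.B"],
}
names = collect_names(
    ["hash_aaa", "hash_bbb", "hash_aaa", "hash_zzz"],
    lambda h: fake_database.get(h, []),
)
```

A hash with no known names, such as `hash_zzz` here, simply maps to an empty list, so the loop still terminates once every hash has been tried.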
The Frequency Graph Constructor Engine 140 will extract a list of malware names for each unique hash value. Next, the Frequency Graph Constructor Engine 140 will receive a request from the user computer 104a to generate a Frequency Display Data 150. The Frequency Graph Constructor Engine 140 will then construct a Frequency Display Data 150 that corresponds to a frequency graph or a “word cloud” graph identifying the most appropriate and commonly used name for the threat. The Frequency Graph Constructor Engine 140 will then send the Frequency Display Data 150 for display to the user on the user computer 104a.
As noted above, the way the terms are displayed in the user interface correlates to the frequency of the malware names. For example, the malware names corresponding to a relatively higher frequency number will have a relatively bigger font size, whereas the terms corresponding to a relatively lower frequency number will have a relatively smaller font size.
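The relationship between frequency and font size can be sketched as a linear scaling. The point-size bounds, the linear rule, and the function name `font_sizes` are assumptions for this example; other scalings (for instance logarithmic) would serve equally well.

```python
def font_sizes(freq, min_pt=10, max_pt=48):
    """Map each name's frequency to a font size between min_pt and max_pt.

    The least frequent name gets min_pt, the most frequent gets max_pt,
    and the rest are placed linearly in between.
    """
    if not freq:
        return {}
    lowest, highest = min(freq.values()), max(freq.values())
    if highest == lowest:
        # All names are equally frequent, so give them all the same size.
        return {name: max_pt for name in freq}
    scale = (max_pt - min_pt) / (highest - lowest)
    return {
        name: round(min_pt + (count - lowest) * scale)
        for name, count in freq.items()
    }


# Made-up frequencies for three hypothetical names.
sizes = font_sizes({"examplebot": 3, "otherfam": 2, "dropper": 1})
# sizes == {"examplebot": 48, "otherfam": 29, "dropper": 10}
```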
According to one embodiment of the invention, computer system 1400 performs specific operations by processor 1407 executing one or more sequences of one or more instructions contained in system memory 1408. Such instructions may be read into system memory 1408 from another computer readable/usable medium, such as static storage device 1409 or disk drive 1410. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and/or software. In one embodiment, the term “logic” shall mean any combination of software or hardware that is used to implement all or part of the invention.
The term “computer readable medium” or “computer usable medium” as used herein refers to any tangible medium that participates in providing instructions to processor 1407 for execution. Such a medium may take many forms, including but not limited to, non-volatile media and volatile media. Non-volatile media includes, for example, optical or magnetic disks, such as disk drive 1410. Volatile media includes dynamic memory, such as system memory 1408. A data interface 1433 may be provided to interface with medium 1431 having a database 1432 stored therein.
Common forms of computer readable media include, for example, floppy disk, flexible disk, hard disk, magnetic tape, any other magnetic medium, CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, RAM, PROM, EPROM, FLASH-EPROM, any other memory chip or cartridge, or any other medium from which a computer can read.
In an embodiment of the invention, execution of the sequences of instructions to practice the invention is performed by a single computer system 1400. According to other embodiments of the invention, two or more computer systems 1400 coupled by communication link 1415 (e.g., LAN, PSTN, or wireless network) may perform the sequence of instructions required to practice the invention in coordination with one another.
Computer system 1400 may transmit and receive messages, data, and instructions, including program, i.e., application code, through communication link 1415 and communication interface 1414. Received program code may be executed by processor 1407 as it is received, and/or stored in disk drive 1410, or other non-volatile storage for later execution.
In the foregoing specification, the invention has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention. For example, the above-described process flows are described with reference to a particular ordering of process actions. However, the ordering of many of the described process actions may be changed without affecting the scope or operation of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense.
This application claims the benefit of priority to U.S. Provisional Application No. 62/413,374, filed on Oct. 26, 2016, which is hereby incorporated by reference for all purposes in its entirety.
Number | Date | Country
---|---|---
62413374 | Oct 2016 | US