1. Field of the Invention
The present invention is related to anti-malware technology, and more particularly, to detection and minimization of false positives occurring during anti-malware processing.
2. Description of the Related Art
Detection of viruses and malware has been a concern throughout the era of the personal computer. With the growth of communication networks such as the Internet and increasing interchange of data, including the rapid growth in the use of e-mail for communications, infection of computers through communications or file exchanges is an increasingly significant consideration. Infections take various forms, but are typically related to computer viruses, Trojan programs or other forms of malicious code (i.e., malware).
Recent incidents of e-mail mediated virus attacks have been dramatic both for the speed of propagation and for the extent of damage, with Internet service providers (ISPs) and companies suffering service problems and a loss of e-mail capability. In many instances, attempts to adequately prevent file exchange or e-mail mediated infections significantly inconvenience computer users. Improved strategies for detecting and dealing with virus attacks are desired.
One conventional approach to detecting viruses is signature scanning. Signature scanning systems use sample code patterns extracted from known malware code and scan for the occurrence of these patterns in other program code. A primary limitation of the signature scanning method is that only known malicious code is detected, that is, only code that matches the stored sample signatures of known malicious code is identified as being infected. Any virus or malicious code not previously identified, and any virus or malicious code created after the last update to the signature database, will not be detected.
In addition, the signature analysis technique fails to identify the presence of a virus if the signature is not aligned in the code in the expected fashion. Alternatively, the authors of a virus may obscure its identity by opcode substitution or by inserting dummy or random code into virus functions. Such nonsense code alters the signature of the virus to a sufficient extent that it becomes undetectable by a signature scanning program, without diminishing the ability of the virus to propagate and deliver its payload.
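By way of a non-limiting illustration, signature scanning amounts to a byte-pattern search over a file. The following Python sketch assumes hypothetical signature names and byte patterns; a practical scanner relies on a large, regularly updated signature database rather than a hard-coded dictionary.

# Minimal sketch of signature scanning. The signature names and byte
# patterns below are hypothetical examples, not real malware signatures.
KNOWN_SIGNATURES = {
    "ExampleVirus.A": bytes.fromhex("deadbeef90909090"),
    "ExampleTrojan.B": bytes.fromhex("4d5a9000cafebabe"),
}

def scan_file(path: str) -> list:
    """Return the names of all known signatures found anywhere in the file."""
    with open(path, "rb") as f:
        data = f.read()
    # A plain substring search: it misses any variant whose bytes were
    # changed by opcode substitution or inserted junk code, as noted above.
    return [name for name, pattern in KNOWN_SIGNATURES.items()
            if pattern in data]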
Another virus detection strategy is integrity checking. Integrity checking systems extract a code sample from known, benign application program code. The code sample is stored together with information from the program file, such as the executable program header and the file length, as well as the date and time stamp of the sample. The program file is checked at regular intervals against this database to ensure that the program file has not been modified.
Integrity checking programs generate long lists of modified files when a user upgrades the operating system of the computer or installs or upgrades application software. The main disadvantage of an integrity check-based virus detection system is that a great many warnings of virus activity are issued whenever any modification of an application program is performed. It becomes difficult for a user to determine which warnings represent an actual attack on the computer system.
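As a simplified sketch of this technique, the snippet below stores each known-good file's length and time stamp and later reports any mismatch; the database format is an assumption, and a real integrity checker would also store a code sample and header data as described above.

import json
import os

def snapshot(paths, db_path="integrity.json"):
    """Record the length and modification time of each known-good program file."""
    records = {p: {"size": os.path.getsize(p), "mtime": os.path.getmtime(p)}
               for p in paths}
    with open(db_path, "w") as f:
        json.dump(records, f)

def check(db_path="integrity.json"):
    """Return every file whose stored attributes no longer match."""
    with open(db_path) as f:
        records = json.load(f)
    return [p for p, rec in records.items()
            if not os.path.exists(p)
            or os.path.getsize(p) != rec["size"]
            or os.path.getmtime(p) != rec["mtime"]]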
Checksum monitoring systems (and, more generally, control sum or hash monitoring systems) detect viruses by generating a cyclic redundancy check (CRC) value for each program file. Modification of the program file is detected by a difference in the CRC value. Checksum monitors improve on integrity checking systems in that the monitoring is difficult for malicious code to defeat. On the other hand, checksum monitors exhibit the same limitation as integrity checking systems: they issue too many false warnings, and it becomes difficult to identify which warnings represent actual viruses or infections.
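For illustration, a CRC-based monitor can be sketched as follows; the chunked CRC32 computation uses Python's standard zlib module, and the stored baseline value is assumed to come from an earlier scan of the same file.

import zlib

def file_crc32(path: str) -> int:
    """Compute a CRC32 value over the whole file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            crc = zlib.crc32(chunk, crc)
    return crc

def is_modified(path: str, baseline_crc: int) -> bool:
    """A changed CRC indicates the program file was modified since baselining."""
    return file_crc32(path) != baseline_crc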
An effective conventional approach uses so-called white lists, i.e., lists of known "clean" software components, links, libraries and other clean objects. In order to compare a suspect object against the white list, hash values can be used. The use of hashes is disclosed, for example, in WO/2007066333, where the white list consists of hashes of known clean applications. In WO/2007066333, checksums are calculated and compared against the known checksums.
To be effective, white lists have to be constantly updated, as disclosed, for example, in US 2008/0168558, which uses an ISP for white list updates. In US 2008/0104186, the white list is updated using information derived from the content of a message. Also, in US 2007/0083757, it is determined whether a white list needs to be corrected, and the latest version of the white list is retrieved if correction is required.
When white lists are used, some false positive determinations are inevitably made. The false positives must be detected, as they can cause almost as much harm as malware itself. For example, a legitimate component can be "recognized" by the AV software as malware, causing severe damage to the reputation of the AV software vendor, as well as annoyance and wasted time for many users. In another scenario, malware is mistakenly considered to be a "clean" component and harms the system. Currently, false positives are detected and the white lists are corrected manually. Because the process is to a large degree manual, requiring an analyst's participation, it takes a relatively long time, often many hours and sometimes as long as a day or two, before the white lists are updated and distributed, and nothing prevents the same false positive from occurring for many users in the meantime.
Detection of false positives is disclosed in US 2008/0168558, where false positives are detected by comparison of various threat reports. A security system which takes into consideration the values of false positives is disclosed in WO/03077071.
However, conventional systems do not provide an effective and robust update of the white lists based on detected false positives. For example, US 2006/0206935 discusses minimization of the risk of false positives, but does not suggest how to correct the white lists.
In WO/9909507, neural networks are used for minimization of false positives. However, this reference does not cover correction of the white lists. In US 2007/0220043, parameters such as vendor, product version and product name are used for estimating a potential threat. However, these parameters are not used for correcting a white list. Other conventional systems use software license information for including software in a white list.
In other systems, digital signatures are used for placing an object into a white list or into a black (i.e., malware) list. For example, in WO/2007143394, a digital signature is included in the white list. However, correction and update of the white list is not disclosed there either.
It is apparent that improved techniques for maintaining, correcting and updating the white lists and the black lists are desired. Accordingly, there is a need in the art for a system and method that addresses the need for detection and minimization of false positives occurring during anti-malware processing.
The present invention is intended as a method and system for detection and minimization of false positives occurring during anti-malware processing, that substantially obviates one or several of the disadvantages of the related art.
In one aspect of the invention there is provided a system, method and computer program product for detection of false positives occurring during execution of anti-malware applications. According to an exemplary embodiment, the detection and correction of the false positives is implemented in two phases: before creation of new anti-virus databases (i.e., malware black lists) and after the anti-virus database is created and new false positives are detected.
The system calculates a probability of detection of a potential malware object. Based on this probability, the system decides to either correct a white list (i.e., a collection of known clean objects) or update a black list (i.e., a collection of known malware objects).
In one embodiment, a process can be separated into several steps: creation and update or correction of white lists; creation and update of black lists; detection of collisions between these lists; and correction of white lists and black lists based on the collisions.
Additional features and advantages of the invention will be set forth in the description that follows, and in part will be apparent from the description, or may be learned by practice of the invention. The advantages of the invention will be realized and attained by the structure pointed out in the written description and claims hereof as well as the appended drawings.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
In the drawings:
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings.
According to the exemplary embodiment, a method, system and computer program product for detection of false positives and correction of a white list is provided. The proposed method and system also provide for timely update of black lists (i.e., AV databases).
In one aspect of the invention there is a system, method and computer program product for detection of false positives occurring during execution of anti-malware applications. According to the exemplary embodiment, the detection and correction of the false positives is implemented in two phases—before creation of new anti-virus databases (i.e., malware black lists) and after the anti-virus database is created and new false positives are detected. The system calculates a probability of detection of a certain potential malware object. Based on this probability, the system decides to either correct a white list (i.e., a collection of known clean objects) or update a black list (i.e., a collection of known malware objects).

In one embodiment, a process is separated into several steps: creation and update or correction of white lists; creation and update of black lists; detection of collisions between these lists; and correction of white lists based on the detected collisions. A statistical system of the exemplary embodiment performs collision analysis and corrects the white and black lists.
In one embodiment, a method for white list creation is provided. In another embodiment, a method for adding applications (or executable components) to a white list is based on various parameters. According to the exemplary embodiment, a white list is created employing several methods based on hash values, on the license agreement of an application, on a digital signature, on the context of an object, on user statistics and on other criteria.
The white list is created for file objects, links, mail messages and their senders, as well as for other objects, such as instant messenger accounts, logs and addresses, host names, IP addresses, domain names, and identifiers of advertising vendors (e.g., Google AdSense). The listed methods for populating a white list are illustrated in
In particular, it is difficult to analyze applications using hash value calculations 110, since the hash value changes for each new version of a released application. Use of a license agreement 140 can also be problematic, since it is not always available and developers approach it in different ways. Note that while the hash values 110 are usually calculated using the MD5 algorithm, they can be calculated using any hash function, e.g., MD4, SHA1, SHA2, SHA256, etc.
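As a minimal sketch of hash-based white listing, the snippet below computes a file digest with a selectable algorithm and checks it against a set of known-clean hashes; the white list contents here are placeholders, and in practice the set must be refreshed for every new release of each application, as noted above.

import hashlib

def file_hash(path: str, algorithm: str = "md5") -> str:
    """Hash a file with the chosen algorithm (md5, sha1, sha256, etc.)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

# Placeholder white list; a real one holds hashes of known clean objects.
FILE_WHITE_LIST_HASHES = {"d41d8cd98f00b204e9800998ecf8427e"}

def is_whitelisted(path: str) -> bool:
    return file_hash(path) in FILE_WHITE_LIST_HASHES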
The user statistics 150 can help to populate the white list. For example, if an unknown object 100 (i.e., an application or an executable component) has been marked by a majority of users as "clean," it can be, at least temporarily, considered clean and added to the white list.
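A simple majority-vote rule of this kind might be sketched as follows; the vote categories and the threshold are assumptions made for illustration.

def provisionally_clean(votes: dict, threshold: float = 0.5) -> bool:
    """Treat an unknown object as provisionally clean when the share of
    user verdicts marking it clean exceeds the (assumed) threshold."""
    total = votes.get("clean", 0) + votes.get("malware", 0)
    return total > 0 and votes.get("clean", 0) / total > threshold

# Example: provisionally_clean({"clean": 870, "malware": 40}) returns True.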
An object can also be added to the white list based on its behavior 160 within a system. For example, when execution of a file "photoshop.exe" is detected, and the file is launched from C:\ProgramFiles\Adobe Photoshop, connects to a server using a URL from the URL White List, and does not exhibit any malware properties, the system can add the file's identification data into the File White List.
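This behavior-based rule might be sketched as below; the expected launch directory, the white-listed host and the structure of both lists are illustrative assumptions.

# Hypothetical lists; real ones are maintained by the AV infrastructure.
URL_WHITE_LIST = {"updates.example-vendor.com"}
FILE_WHITE_LIST = set()

EXPECTED_LAUNCH_DIR = {
    "photoshop.exe": r"C:\ProgramFiles\Adobe Photoshop",  # assumed path
}

def consider_by_behavior(file_name: str, launch_dir: str,
                         contacted_host: str, malware_flags: list) -> None:
    """White-list a file that launches from its expected directory, contacts
    only white-listed hosts, and shows no malware properties."""
    if (EXPECTED_LAUNCH_DIR.get(file_name.lower()) == launch_dir
            and contacted_host in URL_WHITE_LIST
            and not malware_flags):
        FILE_WHITE_LIST.add((file_name, launch_dir))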
An object can also be placed into the white list based on its context 120 as illustrated in
For example, a newly released version of MS Internet Explorer can be placed into the white list, provided that its previous version was white listed under similar circumstances. It will be appreciated that the decision is essentially probabilistic and usually based on several criteria, such as whether the previous version was white listed, the context, the vendor name, the source of the file, environmental variables and other factors.
Another example of using the context 120 is placing an object into the white list based on information about its distribution source. For example, this can apply to software downloaded from trusted Internet nodes (the URL White List).
Another exemplary embodiment is directed to detecting collisions between the white list and the black list. For example, an object is determined to be a malware component and is subsequently blacklisted, although this object was previously placed in the white list. If it is determined that the object was placed into the black list prematurely or by mistake (and the object is in fact "clean"), this is considered a false positive and is treated accordingly.
A variety of possible collisions between various objects of the white list and the black list are illustrated in
According to the exemplary embodiment, there are two types of collisions: those occurring before and those occurring after generation of false positive records. Thus, false positive records can be generated and appropriate corrections made to either the white or the black lists. Alternatively, the possibility of false positives can be minimized so that false positive records are not generated. Detection of recorded false positives is implemented during processing of incoming statistical data. Generation of false positive records is performed upon any attempt by an analyst to add a new record into a white or a black list.
An architecture used for detection of generated false positive records is illustrated in
In this case, a collision occurs when an object detected by the defense module 410 is present in the white list. All of the suspected objects (or the objects' metadata) and their corresponding AV records from a database 560 are provided to a false positive correction module 550. The module 550 calculates a probability of an error by the defense module 410 and a probability that a malware object was included in the white list. Once these probabilities are analyzed, the false positive correction module 550 reaches a verdict to either correct records in the white list database 540 or make corrections to the AV database 560 containing black lists of known malware objects. In some complex cases, intervention by an expert analyst is needed to make a final verdict.
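The verdict logic of the false positive correction module 550 might be sketched as follows; the decision rule, the margin parameter and the probability inputs are illustrative assumptions rather than the disclosed algorithm.

def resolve_collision(p_detector_error: float,
                      p_malware_in_white_list: float,
                      margin: float = 0.2) -> str:
    """Decide which list to correct for an object present in both lists."""
    if p_detector_error > p_malware_in_white_list + margin:
        return "correct_black_list"   # detection was likely a false positive
    if p_malware_in_white_list > p_detector_error + margin:
        return "correct_white_list"   # object was likely white-listed in error
    return "escalate_to_analyst"      # complex case: expert verdict needed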
An architecture used for minimization of generation of false positive records is illustrated in
A probability of an error by the detection system 610 and a probability that a malware object was included in the white list 540 are calculated in the false positive correction module 550. Once these probabilities are analyzed, the false positive correction module 550 reaches a verdict to either correct records in the white list database 540 or make corrections to the AV database 560 containing black lists of known malware objects.

In some complex cases, intervention by an expert analyst is needed to reach a verdict. Thus, the databases 540 and 560 are corrected, and the defense modules 410 can refer to an already updated AV records database 560.
According to the exemplary embodiment, when a collision is detected, the system generates a tree of weight coefficients. An exemplary tree of weight coefficients is depicted in
Then, for example, weight 3 represents URL analysis, calculated as a sum of a host name weight (3.1), a file name weight (3.2) and a port weight (3.3). The host name weight (3.1) is calculated from a masking name weight (3.1.1) and a whois information weight (3.1.2). In turn, the masking name probability is calculated from the syntactic similarity of the domain name (weight 3.1.1.1) and the similarity produced by the SoundEx algorithm (weight 3.1.1.2).

The whois information weight is calculated in terms of name server stability (weight 3.1.2.2) and IP stability (weight 3.1.2.3). The file name weight (3.2) is obtained as a sum of a file name popularity factor (X-factor) (weight 3.2.1), a syntactic file name analysis weight (3.2.2) and a file extension weight (3.2.3). The file extension weight is a sum of a masking extension weight (3.2.3.1) and a missing extension weight (3.2.3.2). In turn, the masking extension weight results from a sum of a double (triple, etc.) extension weight (3.2.3.1.1) and a weight for an executable file carrying a non-executable file extension (3.2.3.1.2).

The port weight (3.3) is calculated as a probability of usage of a non-typical port for the protocol (weight 3.3.1). Thus, the aggregate weight is obtained as a sum of all probabilities calculated for each of the criteria. Based on such a comprehensive parameter, an accurate verdict regarding the downloaded executable file can be generated.
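The summation over the weight tree can be sketched as a recursive fold over a nested structure; the tree fragment and the numeric leaf values below are assumptions for illustration, not values from the disclosure.

# Illustrative fragment of the weight tree for URL analysis (weight 3);
# all leaf values are assumed, placeholder probabilities.
URL_ANALYSIS_TREE = {
    "host_name": {                              # weight 3.1
        "masking_name": {                       # weight 3.1.1
            "syntactic_similarity": 0.10,       # weight 3.1.1.1
            "soundex_similarity": 0.05,         # weight 3.1.1.2
        },
        "whois": {                              # weight 3.1.2
            "name_server_stability": 0.05,      # weight 3.1.2.2
            "ip_stability": 0.05,               # weight 3.1.2.3
        },
    },
    "file_name": {                              # weight 3.2
        "popularity_x_factor": 0.10,            # weight 3.2.1
        "syntactic_analysis": 0.05,             # weight 3.2.2
        "extension": {                          # weight 3.2.3
            "double_extension": 0.10,           # weight 3.2.3.1.1
            "non_executable_extension": 0.10,   # weight 3.2.3.1.2
            "missing_extension": 0.05,          # weight 3.2.3.2
        },
    },
    "port": {"non_typical_port": 0.10},         # weights 3.3 and 3.3.1
}

def aggregate_weight(node) -> float:
    """Each inner node's weight is the sum of its children; leaves are
    probabilities, matching the summation described above."""
    if isinstance(node, dict):
        return sum(aggregate_weight(child) for child in node.values())
    return float(node)

# aggregate_weight(URL_ANALYSIS_TREE) yields the combined URL weight 3.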
With reference to an exemplary computer system on which the invention can be implemented, a general purpose computing device in the form of a personal computer or server 20 includes a processing unit 21, a system memory 22, and a system bus 23 that couples various system components, including the system memory 22, to the processing unit 21. The system memory includes read-only memory (ROM) 24 and random access memory (RAM) 25.
The computer 20 may further include a hard disk drive 27 for reading from and writing to a hard disk (not shown), a magnetic disk drive 28 for reading from or writing to a removable magnetic disk 29, and an optical disk drive 30 for reading from or writing to a removable optical disk 31, such as a CD-ROM, DVD-ROM or other optical media. The hard disk drive 27, magnetic disk drive 28, and optical disk drive 30 are connected to the system bus 23 by a hard disk drive interface 32, a magnetic disk drive interface 33, and an optical drive interface 34, respectively. The drives and their associated computer-readable media provide non-volatile storage of computer readable instructions, data structures, program modules and other data for the computer 20.
Although the exemplary environment described herein employs a hard disk, a removable magnetic disk 29 and a removable optical disk 31, it should be appreciated by those skilled in the art that other types of computer readable media that can store data accessible by a computer, such as magnetic cassettes, flash memory cards, digital video disks, Bernoulli cartridges, random access memories (RAMs), read-only memories (ROMs) and the like may also be used in the exemplary operating environment.
A number of program modules may be stored on the hard disk, magnetic disk 29, optical disk 31, ROM 24 or RAM 25, including an operating system 35. The computer 20 includes a file system 36 associated with or included within the operating system 35, one or more application programs 37, other program modules 38 and program data 39. A user may enter commands and information into the computer 20 through input devices such as a keyboard 40 and pointing device 42. Other input devices (not shown) may include a microphone, joystick, game pad, satellite dish, scanner or the like.
These and other input devices are often connected to the processing unit 21 through a serial port interface 46 coupled to the system bus, and may be connected by other interfaces, such as a parallel port, game port or universal serial bus (USB). A monitor 47 or other type of display device is also connected to the system bus 23 via an interface, such as a video adapter 48. In addition to the monitor 47, personal computers typically include other peripheral output devices (not shown), such as speakers and printers.
The computer 20 may operate in a networked environment using logical connections to one or more remote computers 49. The remote computer (or computers) 49 may be represented by another computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described relative to the computer 20, although only a memory storage device 50 has been illustrated. The logical connections include a local area network (LAN) 51 and a wide area network (WAN) 52. Such networking environments are commonplace in offices, enterprise-wide computer networks, Intranets and the Internet.
When used in a LAN networking environment, the computer 20 is connected to the local network 51 through a network interface or adapter 53. When used in a WAN networking environment, the computer 20 typically includes a modem 54 or other means for establishing communications over the wide area network 52, such as the Internet. The modem 54, which may be internal or external, is connected to the system bus 23 via the serial port interface 46. In a networked environment, the program modules depicted relative to the computer 20, or portions thereof, may be stored in the remote memory storage device. It will be appreciated that the network connections shown are exemplary and other means of establishing a communications link between the computers may be used.
Having thus described a preferred embodiment, it should be apparent to those skilled in the art that certain advantages of the described method and apparatus have been achieved. In particular, those skilled in the art would appreciate that the proposed system and method provide for an effective detection and minimization of false positives occurring during anti-malware processing. It should also be appreciated that various modifications, adaptations and alternative embodiments thereof may be made within the scope and spirit of the present invention. The invention is further defined by the following claims.