An increasing number of transactions of all types (e.g., financial, social, educational, religious, entertainment, etc.) are taking place in the virtual environment of the Internet rather than in the real world environment. Compared to parties participating in a real world transaction, the parties to a virtual environment transaction are more likely to be separated by some unknown distance and less likely to have the opportunity to see each other during the transaction.
As a consequence, virtual environment transactions have been compromised by fraudulent practices that lead to theft of property, identity, and personal information, as well as to abuse and bodily injury. Examples of these fraudulent practices include phishing, spyware, and predatory behavior. Phishing refers to the acquisition of personal information (e.g., usernames, passwords, social security numbers, credit card information, bank account details, etc.) from a person in an illegitimate manner (e.g., through e-mails, instant messages, and/or websites from impersonated parties) for criminal purposes. Spyware refers to damaging, privacy-infiltrating, threatening, or malicious software. Usually, spyware invades a person's computer resources without the person's knowledge. Predatory behavior refers to activity of persons or businesses intending to defraud, harm, or harass others by taking advantage of the anonymous nature of virtual environment transactions.
Several solutions have been crafted to deal with these problems of virtual environment transactions. Although these solutions have had various degrees of success in mitigating the fraudulent practices, the losses attributable to the fraudulent practices continue to rise due to the tremendous growth in the number of virtual environment transactions.
Deficiencies in measures implemented to deal with phishing are illustrative of the shortcomings of actions taken to address the other fraudulent practices plaguing virtual environment transactions. Typically, a heuristic methodology is utilized in anti-phishing tools. To determine whether a website being accessed is a phishing website, the heuristic methodology examines various characteristics and attributes of the website to classify it as either a non-phishing website or a phishing website to which access is blocked. Due to accuracy limitations of the heuristic methodology, the false positive rate (the rate at which a website is classified as a phishing website when it is actually a non-phishing website) may be higher than desired. This frustrates visitors to the incorrectly classified website and causes the owners of that website to raise legal issues. Frustrated visitors may be inclined to turn off the anti-phishing tool, increasing their vulnerability to phishing. Moreover, for a greater portion of visited websites than desired, the heuristic methodology may be unable to classify the website as either non-phishing or phishing at all, prompting a caution message alerting the visitor to the possibility of phishing. The caution message may appear so frequently that it is simply ignored rather than seriously considered.
Moreover, the heuristic methodology is susceptible to reverse engineering by individuals intending to continue phishing activity undetected by the heuristic methodology. This increases the false negative rate (the rate at which a website is classified as a non-phishing website when it is actually a phishing website). Furthermore, the heuristic methodology is typically applied only to visited websites. Non-visited websites are not subjected to the heuristic methodology and thus are never classified as either non-phishing websites or phishing websites to which access is blocked, limiting the scope of protection against phishing.
These identified deficiencies also hinder obtaining useful feedback from visitors to websites and actually discourage visitors from providing feedback that may help correct or improve anti-phishing tools.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Embodiments of the claimed subject matter, among other things, involve soliciting user feedback concerning the reputation of objects to implement a feedback augmented object reputation service. The aim is to classify objects based on reputation rather than on heuristics alone. A particular object may be one of a number of different types of objects; URLs (Uniform Resource Locators), software, persons, and businesses are examples of such types. Various data sources are used to determine an object's reputation. The reputations of the objects are made available upon request, such as via a reputation service, and Web clients may request object reputations from that service. Objects whose reputation is not sufficient to label them "safe" with adequate certainty can trigger a feedback solicitation process, for example, implemented through the functionality of the user's Web browser (e.g., a solicitation dialogue, etc.). The solicitation process solicits specific user feedback concerning the object, with the user indicating whether the object is a dangerous object (e.g., phishing, spyware, etc.) or a safe object. The feedback is used to update a knowledge base describing the object's reputation, and the updated object reputation is returned in response to any subsequent requests.
Thus, embodiments provide a targeted manner of soliciting feedback from the user community to categorize an object's reputation and increase the accuracy of reputation characterizations returned for subsequent queries. The targeted manner of soliciting feedback increases participation by the user community. Moreover, it is well suited to dealing with various undesirable practices such as phishing, spyware, and predatory behavior.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements.
Reference will now be made in detail to embodiments of the claimed subject matter, examples of which are illustrated in the accompanying drawings. While the embodiments will be described, it will be understood that the descriptions are not intended to limit the claimed subject matter to these embodiments. On the contrary, the claimed subject matter is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope as defined by the appended claims. Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the embodiments. However, it will be recognized by one of ordinary skill in the art that the embodiments may be practiced without these specific details. In other instances, well-known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the embodiments.
Some portions of the detailed descriptions are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as "processing" or "accessing" or "tagging" or "characterizing" or "filtering" or the like, refer to the action and processes of a computer system (e.g., computer system 500 of FIG. 5), or a similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities.
The system 100 embodiment implements a feedback augmented object reputation service. The reputation service is provided to the user 110 by the reputation provider 130. The user 110 accesses the Web sites 120, and through the course of such access typically encounters a number of software-based objects. The authenticity and/or the safety of these objects can be checked by interaction between the user 110 and the reputation provider 130.
In a typical usage scenario, the Web client 112 of the user 110 transmits reputation queries regarding one or more of the objects encountered on one or more of the web sites 120. The reputation provider 130 returns an object reputation corresponding to the query. This object reputation includes attributes that describe the authenticity, safety, reliability, or other such characteristics related to the object. In general, the object reputation describes the degree to which a given object can be characterized as a dangerous object or a safe object. For example, the object reputation output can inform the user whether a particular link or URL provided by one of the web sites 120 is dangerous (e.g., a phishing site, etc.) or safe (e.g., the site is in fact authentic). This information can be visually provided to the user via GUI elements of the interface of the Web client 112.
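By way of a non-limiting illustration, the query/response exchange described above could be sketched as follows. All names in the sketch (Verdict, ObjectReputation, ReputationProvider, and their members) are hypothetical assumptions introduced for illustration, not part of the disclosure itself.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SAFE = "safe"
    DANGEROUS = "dangerous"  # e.g., a phishing site or spyware
    UNKNOWN = "unknown"      # the reputation is inconclusive


@dataclass
class ObjectReputation:
    object_id: str     # e.g., a URL such as "foo.com"
    verdict: Verdict
    confidence: float  # 0.0 (no certainty) .. 1.0 (full certainty)


class ReputationProvider:
    """Answers reputation queries from a per-object knowledge base."""

    def __init__(self) -> None:
        self._knowledge_base: dict[str, ObjectReputation] = {}

    def query(self, object_id: str) -> ObjectReputation:
        # Objects not yet in the knowledge base default to UNKNOWN,
        # which can later trigger feedback solicitation.
        return self._knowledge_base.get(
            object_id, ObjectReputation(object_id, Verdict.UNKNOWN, 0.0))

    def update(self, reputation: ObjectReputation) -> None:
        self._knowledge_base[reputation.object_id] = reputation
```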
The reputation provider 130 stores reputation information for the large number of objects that can be encountered by the user 110. For example, the reputation provider 130 can include a knowledgebase of the objects hosted by the web sites 120 and have a corresponding reputation value stored for each of these objects. The reputation generation service 140 functions by generating per object reputation and providing that reputation to the reputation provider 130. The reputation generation service 140 can utilize a number of different techniques to derive reputation attributes regarding a given object. Such techniques include, for example, machine learning algorithms, contextual filtering algorithms, object historical data, and the like.
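A hypothetical combination of the reputation-generation signals named above (a machine learning score, a contextual filtering score, and an object-history score) might look like the following sketch, which builds on the types from the previous example. The linear blend, the weights, and the cutoff values are assumptions for illustration only; the disclosure does not specify how the signals are combined.

```python
def generate_reputation(object_id: str, ml_score: float,
                        context_score: float, history_score: float
                        ) -> ObjectReputation:
    # Each input is a dangerousness estimate in [0.0, 1.0].
    blended = 0.5 * ml_score + 0.3 * context_score + 0.2 * history_score
    if blended >= 0.8:
        return ObjectReputation(object_id, Verdict.DANGEROUS, blended)
    if blended <= 0.2:
        return ObjectReputation(object_id, Verdict.SAFE, 1.0 - blended)
    # Mid-range blends remain inconclusive pending user feedback.
    return ObjectReputation(object_id, Verdict.UNKNOWN, 0.0)
```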
The reputation feedback service 150 functions by receiving user feedback (e.g., from the user 110) and associating that feedback with the corresponding object. An objective of the feedback service is to obtain per object user feedback regarding attributes descriptive of the object (e.g., authenticity, safety, reliability, or other such characteristics) and transmit this information to the reputation generation service 140. This enables the reputation generation service 140 to update the reputation value for the objects in consideration of the received user feedback. In general, the solicitation of feedback from the user is triggered when the value for a given object reputation indicates the object is potentially a dangerous object, and is not triggered when the object reputation indicates the object is a safe object. The updating in consideration of the received feedback increases the accuracy and reliability of the per object reputation generated and transmitted to the reputation provider 130. In one embodiment, a feedback module 114 is included and is specifically configured to interface with the user and obtain the per object user feedback. The feedback module 114 then transmits the per object user feedback to the reputation feedback service 150.
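The trigger condition described in the preceding paragraph could be sketched as follows, again using the assumed types from the earlier example. The confidence threshold is a hypothetical tunable parameter, not a value taken from the disclosure.

```python
SAFE_CONFIDENCE_THRESHOLD = 0.9  # assumed tunable cutoff


def should_solicit_feedback(reputation: ObjectReputation) -> bool:
    confidently_safe = (reputation.verdict is Verdict.SAFE
                        and reputation.confidence >= SAFE_CONFIDENCE_THRESHOLD)
    # Objects known to be safe never trigger solicitation;
    # potentially dangerous or inconclusive objects do.
    return not confidently_safe
```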
In this manner, the feedback enabled updating of the reputation knowledgebase yields a number of advantages. For example, one advantage is that the feedback enabled updating reduces the chances of a runaway increase in the number of false positives produced (e.g., safe objects that are incorrectly classified as dangerous objects). The feedback mechanism can quickly identify those objects which may be mistakenly labeled as dangerous objects (e.g., malware false positives), while simultaneously increasing the heuristic true positive rate.
Another advantage is that the feedback enabled updating utilizes a community to provide judgment on objects. The community can draw on information from any source (e.g., personal knowledge, etc.) to derive an initial reputation or to update an existing reputation, as opposed to relying merely on static client code, object code, or the like. A further advantage is that community feedback enabled updating alleviates the dependency on one or more centralized grading staffs (e.g., at a mail client vendor, webmail provider, etc.) to assess and correctly decide medium confidence reputation scenarios. The community feedback mechanism can leverage the broad user base to improve the experience for the community as a whole.
Process 200 begins at step 201, where an initial reputation is generated for a plurality of objects hosted by the plurality of web sites 120. As described above, the reputation generation service 140 utilizes a number of different techniques to derive reputation attributes regarding a given object (e.g., machine learning algorithms, object historical data, etc.). At step 202, the generated initial reputations are transmitted to the reputation provider 130. The initial reputations are used to populate the reputation knowledgebase and provide a level of service upon which subsequently arriving reputation feedback can improve.
At step 203, the reputation provider 130 receives reputation queries from the user 110. As described above, as each user requests reputation information regarding one or more objects, the reputation provider returns a reputation output for each such object. At first, the object reputation output is based upon the initial reputation information generated at step 201. At step 204, once the object reputation output has been transmitted to the user, the feedback module 114 can solicit user feedback regarding the particular object in question. In general, the solicitation of feedback from the user is triggered when the value for the object reputation indicates the object is potentially a dangerous object, and is not triggered when the value indicates the object is a safe object. As described above, the user feedback can include a number of different attributes descriptive of the object (e.g., authenticity, safety, reliability, or other such characteristics). The user's feedback can be conclusive with regard to whether they think the object merits a positive tag (e.g., malware, phishing site, etc.) or a negative tag (e.g., authentic, safe, etc.). Conjointly or alternatively, the determination can be biased toward safety for those objects whose reputation is unclear. For example, those objects having a reputation that is not sufficient to label them "safe" can be treated such that they trigger the feedback solicitation process.
At step 205, the user provided feedback is associated with the corresponding object by the reputation feedback service 150. At step 206, the reputation generation service updates its reputation generation mechanisms in consideration of the user provided feedback. Then, at step 207, the updated reputation for the object is transmitted to the reputation provider 130, which in turn updates its reputation knowledgebase. In this manner, the accuracy and usability of the reputation knowledgebase are quickly improved in consideration of the feedback obtained from actual users.
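Steps 203 through 207 could be sketched end to end as shown below, reusing the earlier assumed types. The feedback_service, generation_service, and user objects are hypothetical, assumed to expose record, regenerate, and ask_is_dangerous methods respectively; none of these interfaces are specified in the disclosure.

```python
def handle_reputation_query(provider: ReputationProvider,
                            feedback_service, generation_service,
                            user, object_id: str) -> ObjectReputation:
    # Step 203: the provider answers the user's reputation query.
    reputation = provider.query(object_id)

    # Step 204: solicit feedback only when the object is not
    # confidently safe (see should_solicit_feedback above).
    if should_solicit_feedback(reputation):
        says_dangerous = user.ask_is_dangerous(object_id)

        # Step 205: associate the feedback with the object.
        feedback_service.record(object_id, user.user_id, says_dangerous)

        # Steps 206 and 207: regenerate the reputation in light of the
        # feedback and push the update into the knowledge base.
        provider.update(generation_service.regenerate(object_id))

    return reputation
```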
It should be noted that when additional information (i.e., in addition to the yes/no response) is included in the feedback received from the user, this information can be used in the reputation generation process. Such additional information includes, for example, metadata describing the object in question, information identifying the user, and the like. Additionally, the historical performance of the particular user providing the feedback can be taken into consideration. For example, those users with a strong history of accurately identifying dangerous objects can be given a stronger weighting. Similarly, those users with a history of inaccurate object feedback (e.g., a high false positive rate) can be given a reduced weighting. In some cases, such additional information may be more powerful in the reputation generation process than the yes/no feedback response itself.
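The per-user weighting just described could take the following minimal form, in which each vote is weighted by an assumed reliability score (e.g., the fraction of the user's past feedback that proved correct). The aggregation rule is an illustrative assumption.

```python
def aggregate_feedback(votes: list[tuple[float, bool]]) -> float:
    """votes: (user_reliability, says_dangerous) pairs.

    Returns a weighted dangerousness score in [0.0, 1.0]; users with
    high false positive rates contribute little to the result.
    """
    total = sum(weight for weight, _ in votes)
    if total == 0:
        return 0.0
    dangerous = sum(weight for weight, says in votes if says)
    return dangerous / total
```

For example, aggregate_feedback([(0.95, True), (0.20, False)]) returns roughly 0.83: the reliable user's "dangerous" vote dominates the unreliable user's "safe" vote.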
In the FIG. 3 embodiment, an exemplary usage scenario is now described. In this scenario, it is assumed that a URL (e.g., foo.com, etc.) has arrived and populates one or more of the source data components 320. The source data components 320 comprise modules that interface with different service provider agents (e.g., e-mail providers, e-mail clients, external heuristic engines, and the like) and can identify objects of interest. The filtering algorithms of the filtering component 310 receive the object and yield an inconclusive reputation rating for the URL, but it is assumed that the component 310 is inclined to tag the URL toward the dangerous end of the spectrum. An appropriate reputation message (e.g., "Is this phish?") is then propagated to the reputation provider 130 regarding the URL. At this point, a user (e.g., user 110 of FIG. 1) who queries the reputation of the URL can be presented with the feedback solicitation described above.
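The tagging decision in this scenario could be sketched as follows. The score thresholds and the rating labels are assumptions for illustration; only the inconclusive case mapping to an "Is this phish?" rating comes from the scenario above.

```python
def classify_url(filter_score: float) -> str:
    """filter_score: 0.0 (clearly safe) .. 1.0 (clearly dangerous)."""
    if filter_score >= 0.8:
        return "block"          # confidently dangerous: access blocked
    if filter_score <= 0.2:
        return "safe"           # confidently safe: no solicitation
    # Inconclusive ratings are tagged toward the dangerous end so that
    # an "Is this phish?" rating is propagated for the object.
    return "is-this-phish"


assert classify_url(0.5) == "is-this-phish"  # e.g., foo.com above
```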
Referring still to FIG. 3, the reputation propagation component 330 can include functionality that implements the management of filter output from the filter component 310 (e.g., block ratings, "Is this phish?" ratings, Junk, etc.). The reputation propagation component 330 can also include functionality for false positive mitigation, rollup and inheritance management, and specific time-to-live settings for "Is this phish?" ratings (e.g., a rating expires after 36 hrs, etc.).
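The time-to-live handling could be sketched as follows. The 36-hour figure comes from the example above; the expiry mechanism itself, and all names in the sketch, are assumptions for illustration.

```python
import time
from typing import Optional

IS_THIS_PHISH_TTL_SECONDS = 36 * 60 * 60  # 36 hr example above


def rating_expired(created_at: float, now: Optional[float] = None) -> bool:
    # created_at and now are POSIX timestamps in seconds; an expired
    # "Is this phish?" rating should no longer be propagated.
    if now is None:
        now = time.time()
    return (now - created_at) > IS_THIS_PHISH_TTL_SECONDS
```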
The reputation validation component 340 can include functionality that validates whether or not objects that are labeled as dangerous actually are dangerous. The validation component 340 can also include functionality for false positive mitigation.
In its most basic configuration, computer system 500 typically includes a processing unit 503 and memory 501. Depending on the exact configuration and type of computer system 500 that is used, memory 501 can be volatile 501a (e.g., DRAM, etc.), non-volatile 501b (e.g., ROM, flash memory, etc.), or some combination of the two.
Additionally, computer system 500 can include mass storage systems (e.g., removable 505 and/or non-removable 507) such as magnetic or optical disks or tape. Similarly, computer system 500 can include input devices 509 and/or output devices 511 (e.g., a display). Computer system 500 can further include network connections 513 to other devices, computers, networks, servers, etc. using either wired or wireless media. As all of these devices are well known in the art, they need not be discussed in detail.
The foregoing descriptions of the embodiments have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the claimed subject matter to the precise forms disclosed, and many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles and practical applications of the embodiments, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of the claimed subject matter be defined by the claims appended hereto and their equivalents.