The present invention relates to techniques for evaluating credential data that is compromised by malicious software.
Malicious software (often referred to as “malware”) is used by an attacker to gather sensitive information, disrupt computer operation and/or gain access to private computer systems. Malware includes computer viruses, Trojan horses, worms and other malicious programs. Malware affects many industries, including finance, healthcare, government, insurance, telecommunications and education. Malware silently captures a wide variety of data and credentials in malware log files from online users, including critical business information. Accordingly, organizations must establish policies and controls to protect enterprise information from malware.
A number of techniques have been proposed or suggested for the collection, monitoring and/or notification of compromised credentials for enterprises. Existing services primarily deal with collecting compromised credential records and presenting them for review by enterprise users. Given the high volume of compromised credential data, however, enterprises are often overwhelmed by the amount of compromised credential information that must be processed and cannot perform a detailed investigation of such records. In addition, the actions that need to be performed in response to such compromises may not be clear.
A need therefore exists for proactive techniques for evaluating compromised credential information based on machine learning and pattern recognition methods to meet the needs of enterprises and/or other users. A further need exists for techniques for identifying the most valuable records (e.g., most vulnerable accounts) for further detailed investigation.
Embodiments of the present invention provide improved techniques for evaluating compromised credential information. In one embodiment, a method for evaluating compromised credentials comprises the steps of: collecting data regarding previously compromised credentials that were used to commit an unauthorized activity; applying one or more statistical learning methods to the collected data to identify one or more patterns; and evaluating a risk of credentials that have been compromised by one or more attackers using the identified patterns. According to a further aspect of the invention, a risk score is generated for one or more users and devices. The risk scores are optionally ordered based on an order of risk.
In one exemplary embodiment, the data is collected from one or more of anti-fraud servers and information sources. The collected data comprises, for example, one or more of: attributes of malware used by an attacker to obtain the previously compromised credentials; attributes of a drop where an attacker stored the previously compromised credentials; attributes of a device from which the previously compromised credentials were obtained; and attributes of the unauthorized activity.
Advantageously, illustrative embodiments of the invention provide techniques for evaluating compromised credential records based on machine learning and pattern recognition methods. These and other features and advantages of the present invention will become more readily apparent from the accompanying drawings and the following detailed description.
The present invention provides techniques for evaluating compromised credential records based on machine learning and pattern recognition methods. According to one aspect of the invention, credential risk assessment techniques are provided for compromised/stolen personal and corporate credential data. In this manner, the most valuable records (e.g., most vulnerable accounts) can be identified for further detailed investigation. In one exemplary implementation, compromised credential records are organized in decreasing order of risk in order to present the highest-risk credentials first (e.g., those credentials with a high probability of being used to perform malicious activity). In this manner, the risk ranking increases efficiency and directs the actions required of organizations and/or users upon detection of compromised credentials.
According to a further aspect of the invention, a statistical approach is employed using machine learning methodology to perform compromised credential risk evaluation based on a number of criteria (dimensions). As discussed hereinafter, by correlating the various data elements known about each compromised record, records can be identified that are likely to be used for fraudulent activities, such as financial exploitation or theft of medical records.
In one exemplary embodiment, a ranked list of compromised credentials is generated that provides users with an improved ability to act upon the stolen credentials. For example, if a bank receives a ranked list of compromised accounts, the bank can apply the following exemplary policy:
Very high risk compromised accounts—deny automatic money movement;
High risk compromised accounts—add additional authentication to money movements;
Medium risk compromised accounts—limit sum of money that can be moved;
Low risk compromised accounts—apply monitoring to accounts; and
Very low risk compromised accounts—ignore the fact that credentials were stolen.
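By way of illustration only, the exemplary bank policy above can be sketched as a lookup from a risk tier to an action. The names `RiskTier`, `POLICY`, `TIER_THRESHOLDS`, `tier_for_score` and `action_for_score`, the assumption that the risk score lies in the range 0 to 1, and the numeric cut-offs are all hypothetical and are not taken from the described embodiments.

```python
from enum import Enum


class RiskTier(Enum):
    VERY_HIGH = "very high"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"
    VERY_LOW = "very low"


# Actions follow the exemplary bank policy listed above.
POLICY = {
    RiskTier.VERY_HIGH: "deny automatic money movement",
    RiskTier.HIGH: "add additional authentication to money movements",
    RiskTier.MEDIUM: "limit sum of money that can be moved",
    RiskTier.LOW: "apply monitoring to account",
    RiskTier.VERY_LOW: "ignore",
}

# Hypothetical cut-offs (assumption): the text does not specify how a
# score is mapped to a tier. Checked from highest to lowest.
TIER_THRESHOLDS = [
    (0.9, RiskTier.VERY_HIGH),
    (0.7, RiskTier.HIGH),
    (0.4, RiskTier.MEDIUM),
    (0.1, RiskTier.LOW),
]


def tier_for_score(score: float) -> RiskTier:
    """Map a risk score in [0, 1] to a tier using the assumed cut-offs."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    for cutoff, tier in TIER_THRESHOLDS:
        if score >= cutoff:
            return tier
    return RiskTier.VERY_LOW


def action_for_score(score: float) -> str:
    """Return the policy action for an account with the given score."""
    return POLICY[tier_for_score(score)]
```

In such a sketch, an institution would tune the cut-offs to its own risk appetite; only the tier-to-action mapping reflects the exemplary policy.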
The exemplary processing device 102-1 comprises a processor 110 coupled to a memory 112. The processor 110 may comprise a microprocessor, a microcontroller, an ASIC, an FPGA or other type of processing circuitry, as well as portions or combinations of such circuitry elements, and the memory 112, which may be viewed as an example of a “computer program product” having executable computer program code embodied therein, may comprise RAM, ROM or other types of memory, in any combination.
Also included in the processing device 102-1 is network interface circuitry 114, which is used to interface the processing device with the network 104 and other system components, and may comprise conventional transceivers.
The other processing devices 102-2 through 102-K are assumed to be configured in a manner similar to that shown for processing device 102-1 in
As shown in
According to one aspect of the present invention, a credentials ranking engine 200 is provided, as discussed further below in conjunction with
Generally, the identity protection and verification server 250-1 can be embodied, for example, using RSA Identity Verification™, from RSA Security Inc. of Bedford, Mass., U.S.A. The identity protection and verification server 250-1 comprises authentication and fraud prevention services that validate user identities and reduce the risk associated with identity impersonation. The exemplary identity protection and verification server 250-1 confirms a user's identity in real time using dynamic knowledge-based authentication (KBA).
The anti fraud command center 250-2 can be embodied, for example, using RSA Anti-Fraud Command Center (AFCC)™, from RSA Security Inc. of Bedford, Mass., U.S.A. The exemplary anti fraud command center 250-2 addresses online fraud threats such as phishing, pharming and Trojan attacks on behalf of customers.
The eFraud network 250-3 can be embodied, for example, using RSA eFraudNetwork from RSA Security Inc. of Bedford, Mass., U.S.A. The exemplary eFraud network 250-3 is a data repository of fraud profiles gleaned from RSA's worldwide network of customers, end users, and Internet service providers (ISPs) as well as from the anti fraud command center 250-2 and third-party contributors. Generally, when an online fraud pattern or other cybercriminal activity is identified, the associated data, activity profile, and device fingerprints are moved to a shared data repository in the eFraud network 250-3 from which active network members receive updates on a regular basis. These ongoing updates enable real-time, proactive protection for online users.
In addition, as shown in
As shown in
As shown in
Victim ID—the unique identifier that identifies the current victim;
Victim IP Address;
Victim HTTP header fields;
Victim device parameters;
Source of information (trigger)—an indication of where the data was stolen from;
Credentials—the data that was actually stolen/compromised;
Date and Time—the time when the information 310 was recorded.
The exemplary information 320 from the identity protection and verification server 250-1 comprises a user identifier, a user IP address, a device elements list, fraud feedback on the “usage” of stolen credentials, and a date/time stamp. The exemplary information 330 from partners comprises the reputation of an IP address (e.g., the fraud history); drop point server details; and fraud feedback on the “usage” of stolen credentials.
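As a non-limiting sketch, the per-victim fields listed above for the information 310 could be held in a simple record type and grouped by victim prior to correlation. The class name `CompromisedRecord`, its field names, and the helper `group_by_victim` are hypothetical; in particular, `credentials_ref` is assumed to be an opaque reference to the stolen data rather than the data itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass
class CompromisedRecord:
    """One compromised-credential record (field names are assumptions)."""
    victim_id: str                 # unique identifier of the victim
    victim_ip: str                 # victim IP address
    http_headers: Dict[str, str]   # victim HTTP header fields
    device_params: Dict[str, str]  # victim device parameters
    trigger: str                   # where the data was stolen from
    credentials_ref: str           # opaque reference to the stolen data
    recorded_at: datetime          # when the information was recorded


def group_by_victim(
    records: Iterable[CompromisedRecord],
) -> Dict[str, List[CompromisedRecord]]:
    """Group records by victim ID, oldest record first within each group."""
    grouped: Dict[str, List[CompromisedRecord]] = defaultdict(list)
    for record in records:
        grouped[record.victim_id].append(record)
    for items in grouped.values():
        items.sort(key=lambda r: r.recorded_at)
    return dict(grouped)
```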
As shown in
Triggers—the type of information that should be stolen;
Malware type—the malware type used;
Update point—server(s) to retrieve information from, e.g., configuration updates.
When the collected data 380 is analyzed over time and across multiple datasets, new dimensions of insight are available. For example, by correlating all of the information about one victim with all of the information from all the other infected victims and previously seen fraud patterns, the following can be determined:
“Old Drop zones” (i.e., drop zones whose age exceeds a predefined threshold) with many compromised users and machines may be considered riskier, because it can be assumed that the malicious actor behind the attack is a relatively sophisticated fraudster;
“Old compromised records” (i.e., records whose age exceeds a predefined threshold) have a low probability of being used by the fraudster, so the score can be gradually reduced for these records;
entities' drop zone targets (e.g., the average number of new compromised credentials added per day to the drop zone and the average live time at the Internet Service Provider (ISP) hosting this malware) permit a prediction of how many users will be exposed to such fraud;
user velocities on infected machines can distinguish between personal and shared devices (this knowledge can be used to assign a higher score to shared devices because potentially many users are exposed to the malware); and
tracking the actual fraud case activity for specific drop zones can lead to possible fraud scenarios for specific credentials.
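Two of the determinations above can be sketched as follows, purely for illustration. The function `age_factor` leaves the score of a record untouched until the record age exceeds a threshold and then reduces it linearly, which is one possible reading of "gradually reduced"; the function `shared_devices` approximates user velocity by the number of distinct users seen on a device. The function names, the 30-day threshold, the 60-day decay window and the three-user minimum are assumptions, not values from the described embodiments.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Set, Tuple


def age_factor(
    recorded_at: datetime,
    now: datetime,
    threshold: timedelta = timedelta(days=30),     # assumed threshold
    decay_window: timedelta = timedelta(days=60),  # assumed decay period
) -> float:
    """Multiplier in [0, 1] applied to the score of a compromised record.

    Returns 1.0 up to the threshold age, then decays linearly to 0.0
    over the decay window.
    """
    age = now - recorded_at
    if age <= threshold:
        return 1.0
    return max(0.0, 1.0 - (age - threshold) / decay_window)


def shared_devices(
    sightings: Iterable[Tuple[str, str]],
    min_users: int = 3,  # assumed cut-off for a shared device
) -> Set[str]:
    """Return device IDs on which at least min_users distinct users appear.

    sightings is an iterable of (device_id, user_id) pairs.
    """
    users_per_device: Dict[str, Set[str]] = defaultdict(set)
    for device_id, user_id in sightings:
        users_per_device[device_id].add(user_id)
    return {d for d, users in users_per_device.items() if len(users) >= min_users}
```

A fuller treatment of user velocity would also consider the time window over which the users were observed; the sketch counts distinct users only.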
where C indicates whether a compromise occurred (Yes/No) and A indicates the Attribute.
The features extraction stage 500 thus generates a set of attribute probabilities 520, indicating the probability for each attribute that fraudulent activity will occur.
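One simple way to obtain such per-attribute probabilities, offered only as a sketch, is to count how often fraud followed each observed attribute value in the labeled feedback data. The function name `attribute_probabilities`, the input layout of (attribute value, fraud occurred) pairs, and the additive smoothing parameter `alpha` are assumptions; this sketch does not reproduce the formula used in the features extraction stage 500.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def attribute_probabilities(
    observations: Iterable[Tuple[Hashable, bool]],
    alpha: float = 1.0,  # additive smoothing strength (assumption)
) -> Dict[Hashable, float]:
    """Estimate P(C = Yes | A = value) for each observed attribute value.

    observations holds (attribute_value, fraud_occurred) pairs taken
    from previously compromised credentials with known outcomes.
    Smoothing keeps rarely seen values away from exactly 0 or 1.
    """
    total: Counter = Counter()
    fraud: Counter = Counter()
    for value, fraud_occurred in observations:
        total[value] += 1
        if fraud_occurred:
            fraud[value] += 1
    return {
        value: (fraud[value] + alpha) / (total[value] + 2 * alpha)
        for value in total
    }
```

The same routine can be run once per attribute (e.g., malware type, drop zone, trigger) to build one probability table per dimension.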
The exemplary machine learning techniques learn patterns from compromised credentials that were subsequently used to commit fraud (the collected feedback on the “usage” of stolen credentials enables learning the risk), as well as from the frequency and velocities of retrieved credentials (the volume of compromised credentials for a given drop zone, target and malware correlates with the fraud probability, and is taken into account in the risk modeling).
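As an illustrative sketch of one such velocity feature, the average number of credentials recorded per day at a given drop zone can be computed from the recording dates. The function name `average_daily_volume` and the choice to average over the inclusive span between the first and last observed dates are assumptions.

```python
from datetime import date
from typing import Iterable


def average_daily_volume(recorded_dates: Iterable[date]) -> float:
    """Average credentials per day over the observed (inclusive) date span.

    recorded_dates holds one date per credential recorded at a drop zone.
    """
    dates = list(recorded_dates)
    if not dates:
        return 0.0
    span_days = (max(dates) - min(dates)).days + 1
    return len(dates) / span_days
```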
As shown in
where C indicates a compromise (Yes/No) and A indicates the Attribute.
The risk scoring stage 600 thus generates a risk score 270 during step 610 for each user/machine and optionally organizes the information in decreasing order of risk.
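The scoring and ordering can be sketched as follows, under stated assumptions. Here the per-attribute probabilities of a record are combined under a naive conditional-independence assumption (the odds of each attribute are multiplied and the prior odds are divided out), and the records are then sorted in decreasing order of the resulting score. This combination rule, the `prior` parameter, and the names `risk_score` and `rank_by_risk` are hypothetical substitutes and are not asserted to be the formula of the risk scoring stage 600.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _logit(p: float) -> float:
    """Log-odds of p, clamped away from 0 and 1 to stay finite."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)
    return math.log(p / (1.0 - p))


def risk_score(attribute_probs: Sequence[float], prior: float = 0.5) -> float:
    """Combine per-attribute P(C = Yes | A_i) values into one score.

    Assumes the attributes are conditionally independent given C, so
    odds(C | A_1..A_n) = odds(C)^(1 - n) * product of odds(C | A_i).
    """
    n = len(attribute_probs)
    if n == 0:
        return prior
    log_odds = sum(_logit(p) for p in attribute_probs) - (n - 1) * _logit(prior)
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank_by_risk(
    records: Dict[str, Sequence[float]], prior: float = 0.5
) -> List[Tuple[str, float]]:
    """Return (record_id, score) pairs in decreasing order of risk."""
    scored = [(rid, risk_score(probs, prior)) for rid, probs in records.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a neutral prior of 0.5, two attributes that each indicate a fraud probability of 0.8 reinforce one another and yield a combined score above 0.8, whereas a simple average would not.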
Numerous other arrangements of servers, computers, storage devices or other components are possible. Such components can communicate with other elements over any type of network, such as a wide area network (WAN), a local area network (LAN), a satellite network, a telephone or cable network, or various portions or combinations of these and other types of networks.
It should again be emphasized that the above-described embodiments of the invention are presented for purposes of illustration only. Many variations may be made in the particular arrangements shown. For example, although described in the context of particular system and device configurations, the techniques are applicable to a wide variety of other types of information processing systems, data storage systems, processing devices and distributed virtual infrastructure arrangements. In addition, any simplifying assumptions made above in the course of describing the illustrative embodiments should also be viewed as exemplary rather than as requirements or limitations of the invention. Numerous other alternative embodiments within the scope of the appended claims will be readily apparent to those skilled in the art.