The present invention relates generally to computer security, and more particularly but not exclusively to methods and apparatus for detecting malicious mobile application programs.
Mobile computing devices run mobile operating systems, which are designed to be suitable for computers that are constrained in terms of memory and processor speed. An application program for a mobile operating system is commonly referred to as a “mobile app” or simply as an “app.” A mobile app may be created using a software development kit (SDK), which may include development tools, libraries, application programming interfaces, classes, and other components that enable a programmer to create the mobile app. A mobile app created using an SDK includes instructions, such as classes, of the SDK.
The ANDROID operating system is an example of a mobile operating system that runs on mobile computing devices. Mobile apps for the ANDROID operating system come in a file referred to as the ANDROID application package (APK) file. The ANDROID APK file is an archive file that includes a plurality of files, including files of classes needed by the mobile app to execute.
A programmer may create (e.g., repackage or originally create) an ANDROID mobile app using an SDK from the vendor of the ANDROID operating system, which currently is GOOGLE Inc. An ANDROID mobile app may also be created using a third-party (i.e., not from GOOGLE Inc.) SDK. This poses a security risk because some third-party SDKs are malicious. More particularly, a legitimate (i.e., non-malicious, safe) ANDROID mobile app may be repackaged using a malicious SDK, resulting in classes and/or other instructions from the malicious SDK being included in the repackaged ANDROID mobile app.
In one embodiment, software development kit (SDK) class tree structures of malicious SDKs are created, with each node of the SDK class tree structures representing a class of a corresponding malicious SDK. An app class tree structure of a mobile app is also created, with each node of the app class tree structure representing a class of the mobile app. To determine if the mobile app has been created (e.g., repackaged or originally created) using at least one of the malicious SDKs, the app class tree structure is compared against the SDK class tree structures to find an SDK class tree structure that matches the app class tree structure. For confirmation, the similarity of classes of the app class tree structure relative to classes of the SDK class tree structure can be determined.
These and other features of the present invention will be readily apparent to persons of ordinary skill in the art upon reading the entirety of this disclosure, which includes the accompanying drawings and claims.
The use of the same reference label in different drawings indicates the same or like components.
In the present disclosure, numerous specific details are provided, such as examples of apparatus, components, and methods, to provide a thorough understanding of embodiments of the invention. Persons of ordinary skill in the art will recognize, however, that the invention can be practiced without one or more of the specific details. In other instances, well-known details are not shown or described to avoid obscuring aspects of the invention.
Referring now to
The computer system 100 is a particular machine as programmed with software modules 110. The software modules 110 comprise instructions stored non-transitorily in the main memory 108 for execution by the processor 101. When the computer system 100 is configured as a malicious mobile app detector, the software modules 110 may comprise instructions for detecting malicious mobile apps that were created using a malicious SDK. The software modules 110 may be loaded from the data storage device 106 to the main memory 108. An article of manufacture may be embodied as a computer-readable storage medium including instructions that, when executed by the computer system 100, cause the computer system 100 to be operable to perform the functions of the software modules 110.
Malicious mobile apps are typically legitimate mobile apps that have been repackaged to include malicious instructions. Malicious mobile apps may be detected by package name matching, which involves matching the package name of a mobile app against package names of known malicious mobile apps. Because of the large number of mobile apps currently available, package name matching is prone to false alarms (i.e., classifying a legitimate mobile app as malicious) and false negatives (i.e., classifying a malicious mobile app as legitimate).
Malicious mobile apps may also be detected using a program dependence graph (PDG), which as its name implies shows the dependencies within a program. The PDG of a mobile app may be compared against malicious PDGs to determine if the mobile app is malicious. However, building and processing PDGs take time and consume substantial computing resources. This discourages usage of PDGs in production environments that receive a constant stream of mobile apps to be evaluated.
A mobile computing device 220 may be a smartphone, tablet, or other mobile computing device that runs a mobile operating system. In the following embodiments, a mobile computing device 220 may be an ANDROID smartphone or tablet, i.e., a mobile computing device that runs the ANDROID operating system. A mobile computing device 220 may download a mobile app 213 from a mobile app marketplace 221 (see arrow 201). A mobile app marketplace 221 may comprise one or more computers that host ANDROID mobile apps for online purchase. The mobile app marketplace 221 may be an official mobile app marketplace, e.g., GOOGLE PLAY marketplace. The mobile app marketplace 221 may also be a third-party mobile app marketplace.
The mobile app 213 may or may not be malicious. The mobile app 213 may be legitimate and free of malicious code or may be infected with malicious code. More specifically, the mobile app 213 may be a once legitimate mobile app that has been repackaged by a cybercriminal using a malicious SDK, making the mobile app 213 malicious. The risk of the mobile app 213 being malicious is even greater when the mobile app 213 is obtained from a third-party mobile app marketplace over the Internet. To determine whether or not the mobile app 213 is malicious, the mobile app 213 may be received in the detector 210 for evaluation. The detector 210 may receive the mobile app 213 from the mobile app marketplace 221 (see arrow 202), the mobile computing device 220 (see arrow 203), or some other computing device over the Internet and/or other computer network.
The detector 210 may comprise one or more computers that are configured to detect malicious mobile apps. In one embodiment, the detector 210 maintains a plurality of known malicious SDKs 211 and creates a class tree structure 212 for each malicious SDK 211. Accordingly, the detector 210 may create a plurality of SDK class tree structures 212.
The term “class” is used herein in the context of object-oriented programming in general and the ANDROID operating system in particular. As is well known, a class is an extensible program-code template for creating objects. In one embodiment, each node of an SDK class tree structure 212 represents a class of the corresponding malicious SDK 211, with each node indicating the features of the class and other information. In one embodiment, the class features indicated in each node of an SDK class tree structure 212 comprise relevant class features, which are class features that cannot be obfuscated, usually because doing so would prevent the mobile app from properly executing. Examples of these relevant class features that cannot be obfuscated include the derived-from class (e.g., base class or super class), framework application programming interfaces (APIs), string tokens, access flag of the class, particular numeric class features (“DirectMethodNum”, “VirtualMethodNum”, “StaticFieldNum”, “InstanceFieldNum”, “AccessFlag”), etc.
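By way of illustration only, a node of a class tree structure might be represented as in the following minimal sketch. The field names are assumptions that mirror the relevant class features listed above; the text does not specify how parent-child relationships between nodes are derived, so the sketch leaves that to the caller.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ClassNode:
    """One node of a class tree structure (field names are hypothetical)."""
    name: str                                  # class name, possibly obfuscated
    super_class: Optional[str] = None          # derived-from (base/super) class
    framework_apis: Set[str] = field(default_factory=set)
    string_tokens: Set[str] = field(default_factory=set)
    access_flag: int = 0
    direct_method_num: int = 0
    virtual_method_num: int = 0
    static_field_num: int = 0
    instance_field_num: int = 0
    children: List["ClassNode"] = field(default_factory=list)

    def add_child(self, child: "ClassNode") -> "ClassNode":
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child


def count_nodes(node: ClassNode) -> int:
    """Return the number of classes represented in the tree rooted at node."""
    return 1 + sum(count_nodes(child) for child in node.children)
```

The same node type may serve for both the SDK class tree structures and the app class tree structure, which keeps the two directly comparable.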
In one embodiment, the detector 210 creates a mobile app class tree structure 214 from a mobile app 213 that is being evaluated. An app class tree structure 214 is similar to an SDK class tree structure 212, except that an app class tree structure 214 represents classes of a mobile app 213 whereas an SDK class tree structure 212 represents classes of a malicious SDK 211. The class tree structures 212 and 214 may thus be created in the same manner.
The detector 210 may compare an app class tree structure 214 against the SDK class tree structures 212 to determine if the app class tree structure 214 is likely to be malicious (i.e., likely to be of a malicious mobile app). In one embodiment, the detector 210 performs tree-structure matching to compare the app class tree structure 214 to the SDK class tree structures 212, and deems the app class tree structure 214 to be likely malicious when the app class tree structure 214 matches an SDK class tree structure 212. To confirm that the app class tree structure 214 is malicious, the detector 210 may perform weighted similarity matching of the classes of the app class tree structure 214 against the corresponding classes of the matching SDK class tree structure 212. The detector 210 may classify a mobile app 213 as malicious when the mobile app 213 has an app class tree structure 214 that matches an SDK class tree structure 212 and has classes that are similar to those of the SDK class tree structure 212. In that case, the mobile app 213 is likely to have been originally created or repackaged using the malicious SDK 211 from which the matching SDK class tree structure 212 was created.
The detector 210 may perform a security action upon detecting a malicious mobile app 213. For example, the detector 210 may so inform the mobile app marketplace 221 to initiate removal of the mobile app 213 from the marketplace (see arrow 204). The detector 210 may also so inform the mobile computing device 220 (see arrow 205), or perform other security actions.
In the example of
Referring back to the example of
When the result of the string filtering indicates that the mobile app is suspicious, the detector 210 creates an app class tree structure of the mobile app. To facilitate comparison, the app class tree structure may be created in the same manner as the SDK class tree structures. Just like an SDK class tree structure, a node of the app class tree structure represents a class of the mobile app and indicates relevant class features of the class.
The detector 210 compares the app class tree structure to the SDK class tree structures (arrows 307 and 308) to determine if the app class tree structure matches an SDK class tree structure. The tree structure matching may be performed using only the classes of the tree structures, i.e., not considering the relevant class features. As will be more apparent below, the relevant class features of classes of the app tree structure and of a matching SDK class tree structure may be subsequently compared by weighted similarity matching.
The detector 210 may employ a suitable tree-structure matching algorithm, such as tree edit distance, to compare the app class tree structure against the SDK class tree structures. In one embodiment, an app class tree structure matches an SDK class tree structure if at least a subtree of the app class tree structure matches at least a subtree of the SDK class tree structure. The length of a subtree that qualifies as a match may be varied and optimized.
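The subtree-based matching criterion may be sketched as follows. This sketch does not implement tree edit distance; it swaps in a simpler technique, comparing canonical shape encodings of subtrees, and it ignores class names and class features in keeping with the structure-only comparison described above. The tuple representation, the function names, and the `min_size` default are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# A tree is a (label, children) tuple; labels are ignored during matching.
Tree = Tuple[str, list]


def shape(tree: Tree) -> str:
    """Encode the shape of a tree, independent of labels and child order."""
    _, children = tree
    return "(" + "".join(sorted(shape(child) for child in children)) + ")"


def size(tree: Tree) -> int:
    """Return the number of nodes in the tree."""
    return 1 + sum(size(child) for child in tree[1])


def subtree_shapes(tree: Tree, min_size: int) -> Set[str]:
    """Collect shape encodings of every subtree with at least min_size nodes."""
    found: Set[str] = set()
    stack: List[Tree] = [tree]
    while stack:
        node = stack.pop()
        if size(node) >= min_size:
            found.add(shape(node))
        stack.extend(node[1])
    return found


def trees_match(app_tree: Tree, sdk_tree: Tree, min_size: int = 3) -> bool:
    """True if the two trees share a subtree shape of at least min_size nodes."""
    return bool(subtree_shapes(app_tree, min_size) & subtree_shapes(sdk_tree, min_size))
```

A small `min_size` admits many coincidental matches, which is one reason a structural match is treated only as an indication that is later confirmed by weighted similarity matching.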
An app class tree structure that matches an SDK class tree structure is likely to have been created using the malicious SDK from which the SDK class tree structure was created. Therefore, when an app class tree structure matches an SDK class tree structure of a malicious SDK, the app class tree structure (and the mobile app from which the app class tree structure was made) is likely to be malicious. However, to minimize false alarms, the detector 210 may perform weighted similarity matching to determine the similarity of the classes of the app class tree structure relative to the classes of the matching SDK class tree structure based on relevant class features (arrows 309 and 310).
In the example of
If the app class and the SDK class have the same base class or the same super class, the detector 210 determines the similarity of the app class relative to the SDK class. In one embodiment, for performance reasons, the detector 210 determines the similarity of the app class relative to the SDK class in two steps. In a first step, the detector 210 calculates an initial per class similarity score, which does not take string tokens into account. The initial per class similarity score may be calculated as
initial_per_class_similarity=feature1*weight1+feature2*weight2+ . . .
where feature1, feature2, etc. indicate a relevant class feature (but not string tokens) found in both the app class and the SDK class and weight1, weight2, etc. are weights that indicate the importance of a relevant class feature in identifying malicious apps. String tokens are not included in the calculation of the initial per class similarity score. An initial per class similarity score that is less than a threshold indicates that the app class is not similar enough to the SDK class. In that case, the app class and the SDK class are not included in the subsequent weighted similarity matching (step 404 to step 402).
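The initial per class similarity score may be sketched as follows. The weight values, the feature names, and the dictionary representation of a class are hypothetical. The sketch also assumes that a scalar feature contributes 1 when the two classes agree and 0 otherwise, and that a set-valued feature contributes its Jaccard overlap; the text does not specify how a feature value is derived.

```python
from typing import Any, Mapping

# Hypothetical weights; the text does not give concrete values.
FEATURE_WEIGHTS = {
    "framework_apis": 0.5,
    "access_flag": 0.25,
    "direct_method_num": 0.25,
}


def feature_agreement(app_value: Any, sdk_value: Any) -> float:
    """Return a value in [0, 1] for how well one feature agrees (assumption)."""
    if isinstance(app_value, (set, frozenset)) and isinstance(sdk_value, (set, frozenset)):
        union = app_value | sdk_value
        return len(app_value & sdk_value) / len(union) if union else 1.0
    return 1.0 if app_value == sdk_value else 0.0


def initial_per_class_similarity(
    app_cls: Mapping[str, Any],
    sdk_cls: Mapping[str, Any],
    weights: Mapping[str, float] = FEATURE_WEIGHTS,
) -> float:
    """Weighted sum over non-string features present in both classes."""
    score = 0.0
    for name, weight in weights.items():
        if name in app_cls and name in sdk_cls:
            score += feature_agreement(app_cls[name], sdk_cls[name]) * weight
    return score


def passes_initial_filter(score: float, threshold: float) -> bool:
    """Scores below the threshold drop the class pair from further matching."""
    return score >= threshold
```

Because string tokens are excluded here, this first step is inexpensive and can discard dissimilar class pairs before any fuzzy string comparison is attempted.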
Otherwise, when the initial per class similarity score is greater than or equal to the threshold, the detector 210 identifies string tokens that are common to both the app class and the SDK class. In one embodiment, the common string tokens are identified by fuzzy matching. A string token may be hashed prior to matching. In one embodiment, the algorithm used to compare a string token is selected based on the length of the string token. For example, a string token may be hashed and compared using TLSH (Trend Micro Locality Sensitive Hash) when the length of the string token is greater than or equal to 256, and compared using Levenshtein distance, which is an edit distance rather than a hash, when the length of the string token is less than 256. Other algorithms suitable for fuzzy matching may also be employed without detracting from the merits of the present invention.
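The length-based selection of a fuzzy matching algorithm may be sketched as follows. Levenshtein distance is implemented directly for short tokens. TLSH is not implemented here; long tokens fall back to exact comparison unless the caller supplies a `long_matcher` function, which is a hypothetical hook. The `max_ratio` tolerance is likewise an assumption, as the text gives no matching threshold.

```python
from typing import Callable, Iterable, List, Optional

LONG_TOKEN_LENGTH = 256  # length cutoff stated in the text


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def tokens_match(
    app_token: str,
    sdk_token: str,
    max_ratio: float = 0.2,
    long_matcher: Optional[Callable[[str, str], bool]] = None,
) -> bool:
    """Fuzzy-compare two tokens, choosing the method by token length."""
    if max(len(app_token), len(sdk_token)) >= LONG_TOKEN_LENGTH:
        # Placeholder for a locality-sensitive hash comparison such as TLSH.
        matcher = long_matcher or (lambda x, y: x == y)
        return matcher(app_token, sdk_token)
    longest = max(len(app_token), len(sdk_token))
    if longest == 0:
        return True
    return levenshtein(app_token, sdk_token) / longest <= max_ratio


def common_string_tokens(
    app_tokens: Iterable[str], sdk_tokens: Iterable[str], max_ratio: float = 0.2
) -> List[str]:
    """Return the SDK tokens that fuzzily match at least one app token."""
    app_list = list(app_tokens)
    return [t for t in sdk_tokens
            if any(tokens_match(a, t, max_ratio) for a in app_list)]
```

Normalizing the edit distance by token length is one design choice among several; an absolute distance cutoff would serve equally well for tokens of similar length.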
The per class similarity score of the app class relative to the SDK class may be calculated in the same manner as the initial per class similarity score, except that the common string tokens (identified by fuzzy matching above) are now included in the calculation (step 405). More particularly, the per class similarity score of the app class relative to the SDK class may be calculated as
per_class_similarity=feature1*weight1+feature2*weight2+ . . . +feature_1*weight_1+feature_2*weight_2+ . . .
where feature1, feature2, etc. indicate a relevant class feature found in both the app class and the SDK class, weight1, weight2, etc. are weights that indicate the importance of a relevant class feature in identifying malicious apps, feature_1, feature_2, etc. indicate a string token found in both the app class and the SDK class, and weight_1, weight_2, etc. are weights that indicate the importance of the string token in identifying malicious apps. In one embodiment, the detector 210 includes the name and the per class similarity score of the app class in a result list (step 406).
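The second step may be sketched as follows, taking as inputs an already computed initial per class similarity score and the list of common string tokens. The per-token weights, the default weight, and the tuple layout of the result list are hypothetical.

```python
from typing import Iterable, List, Mapping, Tuple


def per_class_similarity(
    initial_score: float,
    common_tokens: Iterable[str],
    token_weights: Mapping[str, float],
    default_token_weight: float = 0.25,
) -> float:
    """Add a weighted term for each common string token to the initial score."""
    token_term = sum(token_weights.get(token, default_token_weight)
                     for token in common_tokens)
    return initial_score + token_term


def record_result(
    result_list: List[Tuple[str, float]], class_name: str, score: float
) -> None:
    """Append the class name and its per class similarity score (step 406)."""
    result_list.append((class_name, score))
```

For instance, an initial score of 0.75 with two common string tokens weighted 0.5 and 0.25 would yield a per class similarity score of 1.5 under these assumed weights.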
Continuing the example of
overall_similarity=per_class_similarity1+per_class_similarity2+ . . .
where per_class_similarity1 is the per class similarity score of a class represented by a node in the app class tree structure relative to a corresponding class represented by a node in the matching SDK class tree structure, per_class_similarity2 is the per class similarity score of another class represented by another node in the app class tree structure relative to a corresponding another class represented by another node in the matching SDK class tree structure, etc.
The detector 210 may generate an output, which is the result of the evaluation of the mobile app (arrow 311). For example, the detector 210 may output the name of the package of the mobile app, the malicious SDK that may have been used to create the mobile app, and the overall similarity score. Because the overall similarity score is indicative of the similarity of the app class tree structure of the mobile app relative to the matching SDK class tree structure of the malicious SDK, the overall similarity score may be used as a measure of whether or not the mobile app was created using the malicious SDK. The overall similarity score may be compared to a threshold to determine if the mobile app is malicious.
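The aggregation and the final threshold comparison may be sketched as follows. The output field names and the threshold value used in the usage note are assumptions; the text states only that the package name, the malicious SDK, and the overall similarity score may be output.

```python
from typing import Dict, Iterable, Tuple, Union


def overall_similarity(result_list: Iterable[Tuple[str, float]]) -> float:
    """Sum the per class similarity scores collected in the result list."""
    return sum(score for _, score in result_list)


def evaluate_app(
    package_name: str,
    sdk_name: str,
    result_list: Iterable[Tuple[str, float]],
    threshold: float,
) -> Dict[str, Union[str, float, bool]]:
    """Build an evaluation result; the keys are hypothetical."""
    overall = overall_similarity(result_list)
    return {
        "package": package_name,
        "matched_sdk": sdk_name,
        "overall_similarity": overall,
        "malicious": overall >= threshold,
    }
```

Calling `evaluate_app("com.example.app", "ExampleSdk", [("a.b", 1.5), ("a.c", 0.5)], threshold=1.75)` would report an overall similarity score of 2.0 and flag the mobile app as malicious under the assumed threshold.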
Methods and systems for detecting malicious mobile apps have been disclosed. While specific embodiments of the present invention have been provided, it is to be understood that these embodiments are for illustration purposes and not limiting. For example, although embodiments of the present invention are explained in the context of the ANDROID mobile operating system, embodiments of the present invention may also be applied to other mobile operating systems. Furthermore, many additional embodiments will be apparent to persons of ordinary skill in the art reading this disclosure.