This application claims the priority benefit of Taiwan application serial no. 101150253, filed on Dec. 26, 2012. The entirety of the above-mentioned patent application is hereby incorporated by reference herein and made a part of this specification.
1. Field of the Invention
The invention relates to a method for detecting an application and particularly relates to a method and a system for detecting a malicious application installed on a mobile electronic device.
2. Description of Related Art
As smartphones and tablets have become popular, daily life has become closely connected with these mobile electronic devices. The popularity of smartphones and tablets in turn pushes forward the development of the application industry.
Taking applications developed for the Android platform as an example, reverse engineering techniques for Android applications have matured in recent years, and some Android malicious applications have been repackaged and distributed into third-party application markets. For this reason, users may unwittingly download applications containing malicious codes, which cause personal information to be stolen. Most of the conventional malicious application detecting methods rely on known malicious codes or behaviors to perform detection and thus cannot successfully detect new variant malicious applications. Moreover, repackaged malicious applications look very similar to the benign applications, and the added malicious components mostly run in the background and therefore cannot be detected easily. In view of the above, it is necessary to develop a mechanism for effective detection and warning of malicious applications.
Accordingly, the invention provides a method and a system for detecting a malicious application for quickly and effectively examining whether an application adapted for a mobile electronic device is malicious.
The invention provides a malicious application detecting method, including: collecting a plurality of training malicious applications (APK files) and a plurality of training benign applications (APK files); respectively obtaining a manifest file and a de-compiled code from each of the training malicious applications and each of the training benign applications, and extracting static features from each manifest file and each de-compiled code; generating at least one malicious application group based on the training malicious applications using a clustering algorithm, and grouping the training benign applications into at least one benign application group according to a classification rule designed by the application market, such as games, music, business, weather, shopping and so on; generating application detecting models that respectively represent the malicious and benign application groups according to the static features of the training malicious applications in each malicious application group and the training benign applications in each benign application group; when a target application is received, obtaining a target manifest file and a target de-compiled code from the target application and extracting target static features from the target manifest file and the target de-compiled code; using a classification algorithm, the target static features, and the malicious and benign application detecting models to determine whether the target application belongs to any of the malicious application groups; and generating a warning message if a determination result is positive.
From another aspect, the invention provides a malicious application detecting system, including a feature extracting unit, a clustering unit, and a determining unit. The feature extracting unit is configured for receiving a plurality of training malicious applications (APK files) and a plurality of training benign applications (APK files), respectively obtaining a manifest file and a de-compiled code from each of the training malicious applications and each of the training benign applications, and extracting static features from each manifest file and each de-compiled code. The clustering unit is coupled to the feature extracting unit for generating at least one malicious application group based on the training malicious applications using a clustering algorithm and grouping the training benign applications into at least one benign application group by referring to a classification rule designed by the application market, such as games, music, business, weather, shopping and so on. Application detecting models that respectively represent the malicious and benign application groups are generated according to the static features of the training malicious applications in each malicious application group and the training benign applications in each benign application group. The determining unit is coupled to the feature extracting unit and the clustering unit for controlling the feature extracting unit to obtain a target manifest file and a target de-compiled code from a target application when the target application is received and to extract target static features from the target manifest file and the target de-compiled code. The determining unit uses a classification algorithm, the target static features, and the malicious and benign application detecting models to determine whether the target application belongs to any of the malicious application groups, and generates a warning message when the target application belongs to one of the malicious application groups.
Based on the above, the invention utilizes various static features contained in the manifest files and the de-compiled codes of applications to establish the malicious and benign application groups. The manifest file and the de-compiled code of a target application are then analyzed, and the static features thereof are used to determine whether the target application is malicious. Therefore, the detection result is generated quickly and accurately without the source code of the target application.
To make the aforementioned and other features and advantages of the invention more comprehensible, several embodiments accompanied with figures are described in detail below.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate exemplary embodiments of the invention and, together with the description, serve to explain the principles of the invention.
The malicious application detecting system 100 determines whether an application contains any virus or malicious code mainly through static analysis. In particular, the malicious application detecting system 100 effectively examines the security of applications adapted for mobile electronic devices, so as to protect the mobile electronic devices. More specifically, the mobile electronic devices may include smartphones, personal digital assistants, or tablets, etc., and the applications are, for example, adapted for the Android platform; however, the scope of the invention is not limited thereto.
In this embodiment, an operation of the malicious application detecting system 100 mainly includes two stages. Referring to
It is worth mentioning that the feature extracting unit 110 of this embodiment extracts the static features of each training application from the manifest file and the de-compiled code obtained from that training application. According to the static features, the clustering unit 120 generates the application detecting models for analyzing the applications. In other words, the malicious application detecting system 100 of this embodiment mainly utilizes the information provided by the manifest files and the de-compiled codes of the training applications to generate the malicious and benign application detecting models that are to be used in the examination stage.
In another embodiment, the malicious application detecting system 100 further includes a network unit (not shown). Accordingly, a user at a terminal device (e.g. a smartphone) may connect to the malicious application detecting system 100 through a network to examine specific applications.
The aforementioned units may be implemented in the form of hardware, software, or a combination of hardware and software. For example, the hardware may be a central processing unit (CPU), a programmable microprocessor for general use or special use, a digital signal processor (DSP), a programmable controller, an application specific integrated circuit (ASIC), any device capable of operation and processing, or a combination of the foregoing. The software may include an operating system, an application, or a driver.
Detailed operation of each unit of the malicious application detecting system 100 is described below in another embodiment.
In Step S310, the malicious application detecting system 100 collects a plurality of training applications (APK files). The training applications include several kinds of malicious applications (i.e. training malicious APK files) and several kinds of benign applications (i.e. training benign APK files).
Next, as shown in Step S320, the feature extracting unit 110 receives and reverse-engineers the collected training malicious applications and training benign applications, so as to obtain the manifest file and the de-compiled code respectively from each of the training malicious and benign applications and to extract the static features of the training malicious and benign applications from the manifest files and the de-compiled codes. Specifically, the static features include at least one of a Permission, a Component and a component type, an Intent, and an application programming interface (API) call, or a combination of the foregoing. The component type may be an activity, a service, a receiver, a provider, etc., for example.
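As an illustrative, non-limiting sketch, the extraction of Step S320 might be expressed in Python as follows. The sketch assumes that the manifest file has already been decoded into plain-text XML and that the de-compiled code is available as smali-style text; neither the decoding tool nor the de-compiled format is specified in this description. The function names `extract_manifest_features` and `extract_api_calls` and the `permission:`, `component:`, `intent:` and `api:` prefixes are hypothetical and chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Attributes such as android:name live in the Android XML namespace.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME_ATTR = "{%s}name" % ANDROID_NS
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")

# Assumption: de-compiled code is smali-like, e.g.
#   invoke-virtual {v0, v1}, Landroid/telephony/SmsManager;->sendTextMessage(...)V
API_CALL_RE = re.compile(r"invoke-[\w/-]+ \{[^}]*\}, (L[\w/$]+;->[\w<>$]+)")


def extract_manifest_features(manifest_xml):
    """Return Permission, Component (with type) and Intent features as strings."""
    root = ET.fromstring(manifest_xml)
    features = []
    for perm in root.iter("uses-permission"):
        name = perm.get(NAME_ATTR)
        if name:
            features.append("permission:" + name)
    for tag in COMPONENT_TAGS:
        for comp in root.iter(tag):
            name = comp.get(NAME_ATTR)
            if name:
                features.append("component:%s:%s" % (tag, name))
            # Intent actions declared in the component's intent filters.
            for action in comp.iter("action"):
                action_name = action.get(NAME_ATTR)
                if action_name:
                    features.append("intent:" + action_name)
    return features


def extract_api_calls(decompiled_code):
    """Return one 'api:' feature per method invocation found in the code text."""
    return ["api:" + call for call in API_CALL_RE.findall(decompiled_code)]
```

The two lists may be concatenated to form the static features of one application. Repeated entries are deliberately kept, since the number of times a feature appears is used when the weights are calculated.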
In Step S330, the clustering unit 120 generates at least one malicious application group based on all training malicious applications using a clustering algorithm and groups all training benign applications into at least one benign application group by referring to a classification rule designed by the application market, such as games, music, business, weather, shopping and so on. Further, in Step S340, the clustering unit 120 generates application detecting models that respectively represent the malicious and benign application groups according to the static features of the training malicious applications in each malicious application group and the training benign applications in each benign application group. To be more specific, the clustering unit 120 presents all static features extracted by the feature extracting unit 110 in the form of vectors and utilizes the clustering algorithm to generate several malicious application groups, each of which contains applications having similar static features. Moreover, the clustering unit 120 generates several benign application groups according to the classification rule designed by the application market. Each of the malicious and benign application groups corresponds to a specific application detecting model (i.e. a malicious application detecting model or a benign application detecting model, in brief). It should be noted that the clustering unit 120 may select an appropriate clustering algorithm according to the properties of the collected training applications.
In the following paragraphs, the operation of the clustering unit 120 is explained with reference to
First, as shown in Step S410, the weight determining unit 121 evaluates the weight of each static feature with respect to each training malicious application. For example, for each training malicious application, the weight determining unit 121 gathers statistics about the number of times that each static feature appears in that training malicious application. For each static feature, the weight determining unit 121 gathers statistics about the number of training malicious applications that contain this static feature. The weight determining unit 121 then utilizes a term frequency-inverse document frequency (TF-IDF) formula to calculate the weight of each static feature with respect to each training malicious application. That is to say, the weight reflects the importance of each static feature.
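By way of example only, the weighting of Step S410 may be sketched as follows. This description does not fix a particular TF-IDF variant, so the sketch assumes the common textbook form, in which the term frequency is the feature count divided by the total number of feature occurrences in the application and the inverse document frequency is the natural logarithm of the number of applications divided by the number of applications containing the feature. The function name `tfidf_weights` is hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(apps):
    """apps: one list of static-feature strings per training application.

    Returns one dict per application mapping feature -> TF-IDF weight.
    """
    n_apps = len(apps)

    # Number of applications that contain each feature (document frequency).
    doc_freq = Counter()
    for features in apps:
        doc_freq.update(set(features))

    weights = []
    for features in apps:
        counts = Counter(features)      # times each feature appears in this app
        total = sum(counts.values())
        weights.append({
            feature: (count / total) * math.log(n_apps / doc_freq[feature])
            for feature, count in counts.items()
        })
    return weights
```

Under this variant, a feature that appears in every training application receives a weight of zero, which matches the intent that the weight reflect how distinctive a feature is.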
Then, in Step S420, the group number evaluating unit 123 presents the static features of each training malicious application in the form of a vector and determines the number of cluster groups. More specifically, the group number evaluating unit 123 calculates a plurality of eigenvalues according to a singular value decomposition (SVD) formula, obtains the first N eigenvalues that together cover a specific percentage of the spectral energy, and regards N as the number of cluster groups. Herein, the group number evaluating unit 123 sorts the eigenvalues from large to small according to the spectral energy they cover, and takes them in that order until the first N eigenvalues cover the specific percentage of the total spectral energy. It should be noted that N is a positive integer; however, according to the invention, N is not necessarily a fixed constant. N is determined by the value of the specific percentage. For instance, the specific percentage is 95%, but the scope of the invention is not limited thereto.
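The selection of N in Step S420 may be illustrated by the following sketch, which is given under stated assumptions rather than as a definitive implementation. It assumes that the values produced by the SVD are the singular values of the application-by-feature weight matrix (obtainable, for instance, from `numpy.linalg.svd` with `compute_uv=False`) and that the spectral energy covered by each value is its square; this description does not define the energy explicitly. The function name `choose_group_count` is hypothetical.

```python
def choose_group_count(singular_values, percentage=0.95):
    """Return the smallest N whose N largest values cover `percentage` of the energy."""
    ordered = sorted(singular_values, reverse=True)   # from large to small
    energies = [value * value for value in ordered]   # assumed energy definition
    total = sum(energies)
    if total == 0:
        raise ValueError("all values are zero; no spectral energy to cover")

    covered = 0.0
    for n, energy in enumerate(energies, start=1):
        covered += energy
        if covered / total >= percentage:
            return n
    return len(energies)
```

For the values 10, 3, 1 and 0.5, the energies are 100, 9, 1 and 0.25. The largest value alone covers about 90.7% of the total, and the two largest cover about 98.9%, so a specific percentage of 95% gives N = 2.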
As shown in Step S430, the model generating unit 125 generates at least one malicious application group by applying the clustering algorithm to the weighted static-feature vectors of the training malicious applications. All training malicious applications that belong to the same malicious application group have similar static features. As for the training benign applications, the model generating unit 125 groups them into at least one benign application group according to the classification rule of the application market, such as games, music, business, weather, shopping and so on.
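As a hedged illustration of Step S430, the following sketch groups weighted feature vectors with k-means. The choice of k-means is an assumption made only for the example, since this description leaves the clustering algorithm open; the initialization from the first k vectors, the fixed iteration count, and the names `squared_distance` and `kmeans` are likewise hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Group vectors into k clusters; returns (assignment, centroids).

    Assumption for illustration: the first k vectors seed the centroids.
    """
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign every application to the nearest centroid.
        assignment = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        # Move each centroid to the mean of its members (keep it if empty).
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids
```

Here k would be the number of cluster groups N obtained in Step S420, and each resulting centroid could serve as a simple stand-in for the detecting model of one malicious application group.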
Step S310 to Step S340 of
More specifically, referring to Step S350 of
Thereafter, in Step S370, the determining unit 130 uses a classification algorithm, the target static features extracted by the feature extracting unit 110, and the malicious and benign application detecting models generated by the clustering unit 120 to determine whether the target application belongs to one of the malicious application groups.
If the target application does not belong to any of the malicious application groups, the determining unit 130 determines that the application corresponding to the target application is a benign application, as shown in Step S380.
On the contrary, if the target application belongs to one of the malicious application groups, the determining unit 130 determines that the application corresponding to the target application is a malicious application and generates a warning message, as shown in Step S390.
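The determination of Steps S370 to S390 may be sketched as follows, purely as an example. The classification algorithm is not specified in this description, so the sketch assumes a nearest-centroid rule in which each detecting model is represented by a single vector; the model tuples, the wording of the warning message, and the name `classify_target` are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_target(target_vector, models):
    """models: list of (group_name, is_malicious, centroid) tuples.

    Returns (group_name, is_malicious, warning_message_or_None).
    """
    # Assumed rule: the target belongs to the group with the nearest centroid.
    name, is_malicious, _ = min(
        models, key=lambda model: squared_distance(target_vector, model[2])
    )
    warning = None
    if is_malicious:
        warning = "Warning: target application matches malicious group '%s'." % name
    return name, is_malicious, warning
```

In this sketch, a target whose nearest model is a benign application group yields no warning, which corresponds to Step S380, whereas a target whose nearest model is a malicious application group yields a warning message, which corresponds to Step S390.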
As illustrated in
In conclusion, the malicious application detecting method and system of the invention utilize static features, e.g. Permission, Component and component type, Intent, and API call, provided by the manifest file and the de-compiled code of the application, to generate the models for examination. Accordingly, when examining the security of the application, the analysis is accomplished simply based on the compiled application without the source code of the application. Additionally, the examination procedure performed based on static analysis does not consume many system resources, and thus the analysis result is generated more efficiently and more accurately.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the invention covers modifications and variations of this disclosure provided that they fall within the scope of the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
101150253 | Dec 2012 | TW | national |