The present invention relates to privacy-preserving content classification, such as, for example, detection of malware.
With the development of software, networking, wireless communications, and enhanced sensing capabilities, mobile devices such as smartphones, wearable devices and portable tablets have been widely used in recent decades. A mobile device has become an open software platform that can run various mobile applications, known as apps, developed by not only mobile device manufacturers, but also many third parties. Mobile apps, such as social network applications, mobile payment platforms, multimedia games and system toolkits can be installed and executed individually or in parallel in the mobile device.
However, malware has developed quickly at the same time. Malware is, in general, a malicious program targeting user devices, for example mobile user devices. Mobile malware has purposes similar to those of computer malware and intends to launch attacks on a mobile device to induce various threats, such as system resource occupation, user behaviour surveillance, and user privacy intrusion.
According to some aspects, there is provided the subject-matter of the independent claims. Some embodiments are defined in the dependent claims.
According to a first aspect of the present invention, there is provided an apparatus comprising at least one processing core, at least one memory including computer program code, the at least one memory and the computer program code being configured to, with the at least one processing core, cause the apparatus at least to store a malware pattern set and a non-malware pattern set, receive two sets of one-way function output values from a device, check whether a first one of the two sets of one-way function output values is comprised in the malware pattern set, and whether a second one of the two sets of one-way function output values is comprised in the non-malware pattern set, and determine whether the received sets of one-way function output values are more consistent with malware or non-malware based on the checking.
Various embodiments of the first aspect may comprise at least one clause from the following bulleted list. The previous paragraph is referred to in these clauses as clause 1.
According to a second aspect of the present invention, there is provided an apparatus comprising at least one processing core, at least one memory including computer program code, the at least one memory and the computer program code being configured to, with the at least one processing core, cause the apparatus at least to store a first set of one-way functions and a second set of one-way functions, the first set of one-way functions comprising a malware function set and the second set of one-way functions comprising a non-malware function set, compile data characterizing functioning of an application running in the apparatus, apply the first set of one-way functions to the data to obtain a first set of one-way function output values and apply the second set of one-way functions to the data to obtain a second set of one-way function output values, and provide the first set of one-way function output values and the second set of one-way function output values to another party.
Various embodiments of the second aspect may comprise at least one clause from the following bulleted list. The previous paragraph is referred to in these clauses as clause 9:
According to a third aspect of the present invention, there is provided a method comprising storing a malware pattern set and a non-malware pattern set, receiving two sets of one-way function output values from a device, checking whether a first one of the two sets of one-way function output values is comprised in the malware pattern set, and whether a second one of the two sets of one-way function output values is comprised in the non-malware pattern set, and determining whether the received sets of one-way function output values are more consistent with malware or non-malware based on the checking.
Various embodiments of the third aspect may comprise at least one clause corresponding to a clause from the preceding bulleted list laid out in connection with the first aspect.
According to a fourth aspect of the present invention, there is provided a method, comprising storing, in an apparatus, a first set of one-way functions and a second set of one-way functions, the first set of one-way functions comprising a malware function set and the second set of one-way functions comprising a non-malware function set, compiling data characterizing functioning of an application running in the apparatus, applying the first set of one-way functions to the data to obtain a first set of one-way function output values and applying the second set of one-way functions to the data to obtain a second set of one-way function output values, and providing the first set of one-way function output values and the second set of one-way function output values to another party.
Various embodiments of the fourth aspect may comprise at least one clause corresponding to a clause from the preceding bulleted list laid out in connection with the second aspect.
According to a fifth aspect of the present invention, there is provided an apparatus comprising means for storing a malware pattern set and a non-malware pattern set, means for receiving two sets of one-way function output values from a device, means for checking whether a first one of the two sets of one-way function output values is comprised in the malware pattern set, and whether a second one of the two sets of one-way function output values is comprised in the non-malware pattern set, and means for determining whether the received sets of one-way function output values are more consistent with malware or non-malware based on the checking.
According to a sixth aspect of the present invention, there is provided an apparatus comprising means for storing a first set of one-way functions and a second set of one-way functions, the first set of one-way functions comprising a malware function set and the second set of one-way functions comprising a non-malware function set, means for compiling data characterizing functioning of an application running in the apparatus, means for applying the first set of one-way functions to the data to obtain a first set of one-way function output values and applying the second set of one-way functions to the data to obtain a second set of one-way function output values, and means for providing the first set of one-way function output values and the second set of one-way function output values to another party.
According to a seventh aspect of the present invention, there is provided a non-transitory computer readable medium having stored thereon a set of computer readable instructions that, when executed by at least one processor, cause an apparatus to at least store a malware pattern set and a non-malware pattern set, receive two sets of one-way function output values from a device, check whether a first one of the two sets of one-way function output values is comprised in the malware pattern set, and whether a second one of the two sets of one-way function output values is comprised in the non-malware pattern set, and determine whether the received sets of one-way function output values are more consistent with malware or non-malware based on the checking.
According to an eighth aspect of the present invention, there is provided a non-transitory computer readable medium having stored thereon a set of computer readable instructions that, when executed by at least one processor, cause an apparatus to at least store a first set of one-way functions and a second set of one-way functions, the first set of one-way functions comprising a malware function set and the second set of one-way functions comprising a non-malware function set, compile data characterizing functioning of an application running in the apparatus, apply the first set of one-way functions to the data to obtain a first set of one-way function output values and apply the second set of one-way functions to the data to obtain a second set of one-way function output values, and provide the first set of one-way function output values and the second set of one-way function output values to another party.
Privacy of a user may be protected in server-based malware detection by using plural hash functions, to obtain hash values of behavioural patterns of applications. The hash values may be provided to a server, which may check if the hash values match existing hash value patterns associated with malware behaviour. Since only hashes are provided to the server, the server does not gain knowledge of what the user has been doing. The server may obtain the hash value patterns associated with malware behaviour from a central trusted entity, which may comprise an antivirus software company, operating system vendor or governmental authority, for example.
The mobiles are in wireless communication with base station 120, via wireless links 111. Wireless links 111 may comprise uplinks for conveying data from the mobiles toward the base station, and downlinks for conveying information from the base station toward the mobiles. Communication over wireless links may take place using a suitable wireless communication technology, such as a cellular or non-cellular technology. Examples of cellular technologies include long term evolution, LTE, and global system for mobile communication, GSM. Examples of non-cellular technologies include wireless local area network, WLAN, and worldwide interoperability for microwave access, WiMAX. In case of non-cellular technologies, base station 120 might be referred to as an access point; however, the expression base station is used herein for the sake of simplicity and consistency.
Base station 120 is in communication with network node 130, which may comprise, for example, a base station controller or a core network node. Network node 130 may be interfaced, directly or indirectly, to network 140 and, via network 140, to server 150. Server 150 may comprise a cloud server or computing server in a server farm, for example. Server 150 is, in turn, interfaced with central trusted entity 160. Server 150 may be configured to perform offloaded malware detection concerning applications running in mobiles 110 and 115. Central trusted entity 160 may comprise an authorized party, AP, which may provide malware-associated indications to server 150.
Although discussed herein in a mobile context, the disclosure extends also to embodiments where devices 110 and 115 are interfaced with server 150 via wire-line communication links. In such cases, the devices may be considered, generally, user devices.
Devices such as mobiles 110 and 115 may be infected with malware. Attackers may intrude into a mobile device via air interfaces, for example. Mobile malware could make use of mobile devices to send premium SMS messages to incur costs to the user and/or to subscribe to paid mobile services without informing the user. In recent years, mobile devices enhanced with sensing and networking capabilities have been faced with novel threats, which may seek super privileges to manipulate user information, for example by obtaining access to accelerometers and gyroscopes, and/or leaking user private information to remote parties. Nowadays, malware can rely on camouflage techniques to produce metamorphic and heteromorphic versions of itself, to evade detection by anti-malware programs. Malware also uses other evasion techniques to circumvent regular detection. Some malware can broadcast itself using social networks based on social engineering attacks, by making use of the curiosity and credulity of mobile users. With smart wearable devices and other devices emerging, there will be more security threats targeting mobile devices.
In general, malware may be detected using static and dynamic methods. The static method aims to find malicious characteristics or suspicious code segments without executing applications, while the dynamic approach focuses on collecting an application's behavioural information and behavioural characteristics during its runtime. Static methods cannot be used to detect new malware, which has not been identified to the device in advance. On the other hand, dynamic methods may consume a lot of system resources. While offloading dynamic malware detection to another computational substrate, such as a server, for example a cloud computing server, saves computational resources in the device itself, it discloses information concerning applications running in the device to the computational substrate which performs the computation, which forms a privacy threat.
One way to detect malware in a hybrid and generic way, especially for mobile malware in Android devices, comprises collecting execution data of a set of known malware and non-malware applications. Thus it is possible to generate, for the known malware and non-malware applications, patterns of individual system calls and/or sequential system calls with different calling depth that are related to file and network access, for example. By comparing the patterns of the individual and/or sequential system calls of malware and non-malware applications with each other, a malicious pattern set and a normal pattern set may be constructed that may be used for malware and non-malware detection.
Applied to classifying an unknown application, a dynamic method may be used to collect its runtime system calling data in terms of individual calls and/or sequential system calls, such as, for example sequential system calls with different depth. Frequencies of system calls may also be included in such data which characterizes the functioning of an application. The calls may involve file and/or network access, for example. Target patterns, such as the system call patterns, of the unknown application may be extracted from its runtime system calling data. By comparing them with both the malicious pattern set and the normal pattern set, the unknown application may be classified as malware or non-malware based on its dynamic behavioural pattern. At least some embodiments of the present invention rely on such logic to classify applications.
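As an illustration of the kind of data described above, the following minimal sketch counts individual system calls and sequential system calls up to a given depth in a recorded trace. The function name extract_patterns, the max_depth parameter and the example trace are assumptions made for illustration only and are not taken from the disclosure.

```python
from collections import Counter

def extract_patterns(calls, max_depth=3):
    """Count individual calls (depth 1) and sequential call runs up to max_depth."""
    counts = Counter()
    for depth in range(1, max_depth + 1):
        # Slide a window of `depth` consecutive calls over the trace.
        for i in range(len(calls) - depth + 1):
            counts[tuple(calls[i:i + depth])] += 1
    return counts

# Hypothetical runtime trace of file and network related system calls.
trace = ["open", "read", "connect", "open", "read"]
patterns = extract_patterns(trace, max_depth=2)
```

Each key of patterns is a candidate target pattern and each value is its frequency; how patterns are selected and weighted in practice is left open here.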
The malicious pattern set and the normal pattern set can be further optimized and extended based on patterns of newly confirmed malware and non-malware applications. Since data collected for malware detection contains sensitive information about mobile usage behaviors and user activities, sharing it with a third party may intrude on user privacy.
To enable comparing behaviour of an unknown application with the malicious pattern set and the normal pattern set, hash functions may be employed. In detail, data characterizing functioning of the application may be collected, for example using a standardized manner to gather, for example, the system call data described above. Once the data has been collected, two sets of hash functions may be applied to the data. A set of hash functions may comprise, for example, hash functions of a same hash function family but with differing parameters, such that different hash functions of the set each produce different hash output values with a same input. The data characterizing functioning of the application thus characterizes the behaviour of the application when it is run, and not the static software code of the application as stored.
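One possible way to obtain a set of hash functions of a same family with differing parameters is sketched below. It assumes SHA-256 with a per-function prefix and a reduction modulo a filter length; the labels, the sizes and the name make_hash_family are hypothetical choices, not the hash functions of the disclosure.

```python
import hashlib

def make_hash_family(label, size, length):
    """Return `size` hash functions mapping bytes to a position in [0, length)."""
    def make(index):
        # The prefix is the differing parameter of each family member.
        prefix = f"{label}:{index}:".encode()
        def h(data):
            digest = hashlib.sha256(prefix + data).digest()
            return int.from_bytes(digest[:8], "big") % length
        return h
    return [make(i) for i in range(size)]

# Two sets: one used with malware patterns, one with non-malware patterns.
Hm = make_hash_family("malware", size=4, length=1024)
Hn = make_hash_family("non-malware", size=4, length=1024)

pattern = b"open>read>connect"   # hypothetical encoded behavioural pattern
hm_values = [h(pattern) for h in Hm]
hn_values = [h(pattern) for h in Hn]
```

Because the outputs are reduced modulo the filter length, two functions of a set may occasionally map the same input to the same position; this affects only the false positive rate of a filter built on them.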
A first set of hash functions may be associated with malware, and/or a second set of hash functions may be associated with non-malware. Consequently, running the first set of hash functions with the data produces a first set of hash output values and/or running the second set of hash functions on the data produces a second set of hash output values. The first set of hash output values may be associated with malware and the second set of hash output values may be associated with non-malware. These are, respectively, a malware pattern and a non-malware pattern. The malware-associated hash functions may be associated with malware merely due to being used with malware, in other words, the hash functions themselves do not have malware aspects.
A server may store sets of hash output values which are associated with malware and/or with non-malware. The hash output values associated with malware, known as a malware pattern set, may have been obtained from observing behaviour of known malware, by hashing data which characterizes the functioning of the known malware with the set of hash functions associated with malware. The hash output values associated with non-malware, that is, a non-malware pattern set, may likewise be obtained using known non-malware. Thus where a device sends its hash output value sets obtained from the data to such a server, the server may compare the hash output values received from the device to the hash output values it has, to determine if the behaviour of the application in the device matches with known malware and/or non-malware. In other words, the server may determine whether the hash output values received from the device are a malware pattern or a non-malware pattern.
Acting thus using hashes, the technical effect and benefit is obtained wherein behaviour-based malware detection may be performed offloaded partly into a server, such as a cloud server, such that the server does not gain knowledge of what the user does with his device. In other words, the solution provides behaviour-based malware detection which respects user privacy. An authorized party, AP, may collect data characterizing the functioning of a set of known malware and non-malware to generate the malware pattern set and the non-malware pattern set used for malware detection. Where Bloom filters are used, their use saves memory in the server owing to recent advances in implementing Bloom filters.
One approach to malware detection in a privacy-preserving way uses Bloom filters, which may optionally use counting. For each malware pattern, the AP may use a malware Bloom filter, MBF, for a set of malware-associated hash functions Hm to calculate its hash output values and send them to a third party, such as a server. The server may insert these hash output values into the right positions of Bloom filter MBF with counting and correspondingly, optionally, save a weight of the pattern into a table named MalWeight. The malware Bloom filter MBF may thus be constructed using the malware hash output values, the weights of which may further be recorded in MalWeight. Similarly, for a non-malware application pattern, the AP may use another Bloom filter for non-malware apps, NBF, with hash functions Hn to calculate hash output values and send them to the server. The server may insert these hash output values into the right positions of Bloom filter NBF, and correspondingly save the weight of the patterns into a table named NorWeight. In this way, the server may insert all non-malware hash value output patterns into NBF to finish the construction of NBF and, optionally, record their weights in NorWeight.
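The construction of MBF and MalWeight may be sketched as follows. The sketch assumes that the AP sends, for each pattern, the tuple of hash output values together with a weight, and that the server keys MalWeight by that tuple; the filter length, the number of hash functions, the SHA-256 based hashing and the example patterns and weights are all illustrative assumptions. NBF and NorWeight would be built in the same way with Hn.

```python
import hashlib

LENGTH = 64      # filter length, illustrative only
NUM_HASHES = 3   # number of hash functions in Hm, illustrative only

def hm_positions(pattern):
    """AP side: hash a plain pattern with the (assumed) malware hash set Hm."""
    positions = []
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"m{i}:".encode() + pattern).digest()
        positions.append(int.from_bytes(digest[:8], "big") % LENGTH)
    return tuple(positions)

class Server:
    """Server side: sees only hash output values, never the plain pattern."""
    def __init__(self):
        self.mbf = [0] * LENGTH   # counting Bloom filter MBF
        self.mal_weight = {}      # MalWeight, keyed by the hash output tuple (assumption)

    def insert(self, positions, weight):
        for pos in positions:
            self.mbf[pos] += 1    # counting filter: increment instead of setting a bit
        self.mal_weight[positions] = weight

server = Server()
for pattern, weight in [(b"open>write>connect", 0.8), (b"read>send_sms", 0.5)]:
    server.insert(hm_positions(pattern), weight)
```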
When detecting an unknown application in a user device, its data characterizing its runtime behaviour may be collected, such as system calling data including individual calls and/or sequential system calls with different depth. Then the user device may use hash function sets Hm and Hn on the collected runtime data to calculate the corresponding hash output values and send them to the server for checking if the hash output value patterns match the patterns inside MBF and NBF. Based on the hash output value matching, corresponding weights may be added together in terms of non-malware patterns and malware patterns, respectively. Based on the summed weights and predefined thresholds, the server can judge if the tested app is malware or a non-malware app.
When new malware and/or non-malware apps are collected by the AP, the AP may make use of them to regenerate malware pattern sets and non-malware pattern sets. If there are new patterns to be added into MBF and/or NBF, the AP may send their hash output value sets to the server, which may insert them into the MBF and/or the NBF by increasing corresponding counts in the Bloom filter, for example, and at the same time updating MalWeight and/or NorWeight. If there are some patterns' weights which need to be updated, the AP may send their hash output values to the server, which may check their positions in MBF and/or NBF and update MalWeight and/or NorWeight accordingly.
If there are some patterns needed to be removed from MBF or NBF, the AP may send their hash output values to the server, which removes them from MBF and/or NBF by deducting corresponding counts in the Bloom filter and at the same time updating MalWeight and/or NorWeight. In case that any one Bloom filter's length is not sufficient for the purpose of malware detection due to the increase of pattern number, a new Bloom filter may be re-constructed with new filter parameters and hash function sets.
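The update and removal handling described above may be sketched as follows, assuming that the server works on tuples of hash output values only and that the weight table is keyed by such tuples. The class name CountingFilter, the replace-on-update behaviour and the example values are assumptions for illustration.

```python
class CountingFilter:
    """Counting Bloom filter plus weight table, driven by hash output values only."""
    def __init__(self, length):
        self.counts = [0] * length
        self.weights = {}  # assumed layout: hash output tuple -> weight

    def add(self, positions, weight):
        if positions in self.weights:
            # Pattern already present: update its weight, leave the counts alone.
            self.weights[positions] = weight
            return
        for pos in positions:
            self.counts[pos] += 1
        self.weights[positions] = weight

    def remove(self, positions):
        if positions not in self.weights:
            # Never decrement for an unknown pattern; that would corrupt the filter.
            return False
        for pos in positions:
            self.counts[pos] -= 1
        del self.weights[positions]
        return True

mbf = CountingFilter(length=16)
a, b = (1, 5, 9), (5, 9, 12)   # hypothetical hash output values received from the AP
mbf.add(a, 0.7)
mbf.add(b, 0.4)
mbf.add(a, 0.9)                # weight update only
mbf.remove(b)                  # pattern b is withdrawn
```

Since the filter does not store the original patterns, re-constructing it with new parameters requires the AP to hash the patterns again with the new hash function sets.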
In the server, SRV, malware hash value patterns are received into Bloom filter MBF in phase 260 and non-malware hash value patterns are received into Bloom filter NBF in phase 270. MBF weights are generated/adjusted in phase 280, and NBF weights are generated/adjusted in phase 290. In phase 2100 hash value patterns from a user device are compared to hash value patterns received in the server from AP, to determine whether the hash value patterns received from the user device more resemble malware or non-malware patterns received from the AP, weighted by the corresponding weights. A decision phase 2110 is invoked when a threshold is crossed in terms of detection reliability. The threshold may relate to operation of the Bloom filters as well as to the weights.
In the device, phase 2140 comprises executing applications, optionally in a virtual machine instance, and collecting the data which characterizes the functioning of the applications. In phases 2120 and 2130, respectively, the malware hash function set and the non-malware hash function set are used to obtain a malware hash value pattern and a non-malware hash value pattern. These are provided to the server SRV for comparison in phase 2100.
A separate feedback is provided from the user device, which may comprise a mobile device such as mobile 110 in
A security model is now described. Driven by personal profits and considering individual reputation, each type of party involved does not collude with other parties. It is assumed that the communication among the parties is secured by applying appropriate security protocols. The AP and the server cannot be fully trusted. They may operate according to designed protocols and algorithms, but they may be curious concerning device user privacy or other parties' data. Mobile device users worry about disclosure of individual usage information or other personal information to the AP and/or the server. In the disclosed method, the device pre-processes locally collected application execution data and extracts application behavioral patterns. By hashing the extracted data patterns with the hash functions used by the Bloom filters, it hides the plain information of the extracted patterns when sending them to the server for malware detection.
When the AP generates two pattern sets by collecting known malware and normal apps, devices may merely send app installation packages to it, thus no device user information is necessarily disclosed to the AP. During malware detection and pattern generation, the server cannot obtain any device user information, since it only gets hash output values; it cannot learn any plain behavioral data or the app names either.
In the proposed method, a great deal of data searching and matching needs to be done. A Bloom filter, BF, is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970. It is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not; in other words, a query returns either “possibly in set” or “definitely not in set”. Elements can be added to the set. Removing elements is also possible when a “counting” filter is used. The more elements that are added to the set, the larger the probability of false positives.
Suppose set K has n elements, K={k1, k2, . . . , kn}. K is mapped into a bit array V of length l through h hash functions H={h1, h2, . . . , hh}, where the hi (i=1, . . . , h) are independent of each other. For generating a Bloom filter, we need to decide H and l, and the decision depends on the size of K, i.e., n. BF construction is the process of inserting the elements in K, which contains the following steps:
Step 1: BF initialization by setting all bits in V as 0;
Step 2: For any ki (i=1, . . . , n), get a number of h hash codes h1(ki), h2(ki), . . . , hh(ki) in order to decide the positions where ki is mapped into V. We mark the corresponding positions as BF[h1(ki)], BF[h2(ki)], . . . , BF[hh(ki)].
Step 3: Make the value of V in the mapped positions BF[h1(ki)], BF[h2(ki)], . . . , BF[hh(ki)] as 1. Thus, V represents a Bloom filter of set K.
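Steps 1 to 3 can be sketched as below. The hash functions are assumed to be SHA-256 with a per-function prefix, reduced modulo l; this choice, the function names and the example elements are illustrative assumptions only.

```python
import hashlib

def positions(element, h, l):
    """Positions h_1(k), ..., h_h(k) of an element in a bit array of length l."""
    result = []
    for j in range(h):
        digest = hashlib.sha256(f"{j}:".encode() + element).digest()
        result.append(int.from_bytes(digest[:8], "big") % l)
    return result

def build_bloom_filter(K, h, l):
    V = [0] * l                           # Step 1: all bits start at 0
    for k in K:
        for pos in positions(k, h, l):    # Step 2: compute the mapped positions
            V[pos] = 1                    # Step 3: set those bits to 1
    return V

K = [b"k1", b"k2", b"k3"]                 # hypothetical elements
V = build_bloom_filter(K, h=3, l=32)
```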
For querying whether element x is inside K, one direct method is to compare x with each element in K; the accuracy of such a query is 100%. Another method is to use a Bloom filter, BF. First, we calculate the h hash codes of x, which decide x's mapped positions in V, i.e., BF[h1(x)], BF[h2(x)], . . . , BF[hh(x)]. Then we check whether the values at all of these mapped positions are 1. If any one bit is 0, x is definitely not inside K. If the values at all of the mapped positions are 1, x could be inside K. Although a BF-based search or query can cause false positives, it brings advantages regarding storage space and search time, which is beneficial for big data processing. To reduce false positives, a suitable BF may be designed by selecting proper system parameters. In this way, detection errors can be reduced and detection accuracy increased.
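For orientation on how the system parameters interact, the following is the standard textbook estimate of the false positive probability p of a Bloom filter holding n elements in a bit array of length l with h hash functions. It assumes independent, uniformly distributed hash functions and is not taken from the present disclosure.

```latex
p \approx \left(1 - e^{-hn/l}\right)^{h},
\qquad
h_{\mathrm{opt}} = \frac{l}{n}\,\ln 2
```

Under the same assumptions, choosing h close to h_opt gives p of approximately (1/2)^h, so a longer bit array per stored element lowers the false positive rate.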
The original Bloom filter can only support inserting new elements into the filter vector and searching. A countable Bloom filter supports reversibly searching and deleting elements from the vector. Due to the advanced features of Bloom filter in terms of storage space saving and fast search in the context of big data, it can be widely used in many fields. However, the Bloom filter that can support digital number operations should be further studied in order to satisfy the demands of new applications.
Algorithm 1: Countable BF Generation.
When detecting an unknown app a, a dynamic method is used to collect its runtime patterns Pa (Pa={pa,1, pa,2, . . . , pa,na}) (e.g., system calling data including both individual calls and sequential system calls with different depth). Then the mobile device uses Hm and Hn to calculate their hash codes Hm(Pa)={Hm(pa,1), Hm(pa,2), . . . , Hm(pa,na)} and Hn(Pa)={Hn(pa,1), Hn(pa,2), . . . , Hn(pa,na)} and sends them to the server, which checks whether any of the patterns match the patterns inside MBF and NBF.
The server searches Hm(Pa) in the MBF. If the values at all positions of Hm(pa,i) in MBF are greater than 0, the weight of this pattern saved in MalWeight is added to a sum. The server searches the hashes of all patterns in Hm(Pa) in this way and gets the summed weight MWa. In addition, the server searches Hn(Pa) in NBF. If the values at all positions of Hn(pa,i) in NBF are greater than 0, the weight of this pattern saved in NorWeight is added to a sum. The server searches the hashes of all patterns in Hn(Pa) and gets NWa. Refer to Algorithm 2 about countable BF search. Next, the server compares MWa and NWa with thresholds Tm and Tn to decide if app a is normal or malicious.
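The search and decision may be sketched as follows. The text does not spell out how MWa and NWa are compared with Tm and Tn, so the rule in classify is an assumption, as are the keying of the weight tables by hash output tuples, the integer weights and all example values.

```python
def matches(filter_counts, positions):
    """A pattern matches when every mapped counter is greater than 0."""
    return all(filter_counts[pos] > 0 for pos in positions)

def summed_weight(filter_counts, weights, hashed_patterns):
    """Sum the stored weights of all received patterns that match the filter."""
    return sum(
        weights.get(positions, 0)
        for positions in hashed_patterns
        if matches(filter_counts, positions)
    )

def classify(mw, nw, tm, tn):
    # Assumed decision rule; the comparison with Tm and Tn is not specified in the text.
    if mw >= tm and nw < tn:
        return "malware"
    if nw >= tn and mw < tm:
        return "non-malware"
    return "undecided"

# Hypothetical server state: counting filters and weight tables.
mbf = [0, 1, 1, 0, 1, 0, 0, 0]
nbf = [1, 0, 0, 1, 0, 0, 1, 0]
mal_weight = {(1, 2): 6, (2, 4): 5}
nor_weight = {(0, 3): 3}

# Hypothetical hash output values Hm(Pa) and Hn(Pa) received from the device.
hm_pa = [(1, 2), (2, 4), (5, 6)]
hn_pa = [(0, 3), (1, 6), (2, 7)]

mw_a = summed_weight(mbf, mal_weight, hm_pa)
nw_a = summed_weight(nbf, nor_weight, hn_pa)
verdict = classify(mw_a, nw_a, tm=10, tn=10)
```

With these example values two malware patterns and one non-malware pattern match, so MWa exceeds Tm while NWa stays below Tn.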
Algorithm 2: Countable BF Search
Algorithm 3: Countable BF Update
Algorithm 4: Countable BF Delete
A processor may comprise circuitry, or be constituted as circuitry or circuitries, the circuitry or circuitries being configured to perform phases of methods in accordance with embodiments described herein. As used in this application, the term “circuitry” may refer to one or more or all of the following: (a) hardware-only circuit implementations, such as implementations in only analog and/or digital circuitry, and (b) combinations of hardware circuits and software, such as, as applicable: (i) a combination of analog and/or digital hardware circuit(s) with software/firmware and (ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions, and (c) hardware circuit(s) and/or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation.
This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in server, a cellular network device, or other computing or network device.
Device 300 may comprise memory 320. Memory 320 may comprise random-access memory and/or permanent memory. Memory 320 may comprise at least one RAM chip. Memory 320 may comprise solid-state, magnetic, optical and/or holographic memory, for example. Memory 320 may be at least in part accessible to processor 310. Memory 320 may be at least in part comprised in processor 310. Memory 320 may be means for storing information. Memory 320 may comprise computer instructions that processor 310 is configured to execute. When computer instructions configured to cause processor 310 to perform certain actions are stored in memory 320, and device 300 overall is configured to run under the direction of processor 310 using computer instructions from memory 320, processor 310 and/or its at least one processing core may be considered to be configured to perform said certain actions. Memory 320 may be at least in part external to device 300 but accessible to device 300.
Device 300 may comprise a transmitter 330. Device 300 may comprise a receiver 340. Transmitter 330 and receiver 340 may be configured to transmit and receive, respectively, information in accordance with at least one cellular or non-cellular standard. Transmitter 330 may comprise more than one transmitter. Receiver 340 may comprise more than one receiver. Transmitter 330 and/or receiver 340 may be configured to operate in accordance with global system for mobile communication, GSM, wideband code division multiple access, WCDMA, 5G, long term evolution, LTE, IS-95, wireless local area network, WLAN, Ethernet and/or worldwide interoperability for microwave access, WiMAX, standards, for example.
Device 300 may comprise a near-field communication, NFC, transceiver 350. NFC transceiver 350 may support at least one NFC technology, such as NFC, Bluetooth, Wibree or similar technologies.
Device 300 may comprise user interface, UI, 360. UI 360 may comprise at least one of a display, a keyboard, a touchscreen, a vibrator arranged to signal to a user by causing device 300 to vibrate, a speaker and a microphone. A user may be able to operate device 300 via UI 360, for example to configure malware detection functions.
Device 300 may comprise or be arranged to accept a user identity module 370. User identity module 370 may comprise, for example, a subscriber identity module, SIM, card installable in device 300. A user identity module 370 may comprise information identifying a subscription of a user of device 300. A user identity module 370 may comprise cryptographic information usable to verify the identity of a user of device 300 and/or to facilitate encryption of communicated information and billing of the user of device 300 for communication effected via device 300.
Processor 310 may be furnished with a transmitter arranged to output information from processor 310, via electrical leads internal to device 300, to other devices comprised in device 300. Such a transmitter may comprise a serial bus transmitter arranged to, for example, output information via at least one electrical lead to memory 320 for storage therein. Alternatively to a serial bus, the transmitter may comprise a parallel bus transmitter. Likewise processor 310 may comprise a receiver arranged to receive information in processor 310, via electrical leads internal to device 300, from other devices comprised in device 300. Such a receiver may comprise a serial bus receiver arranged to, for example, receive information via at least one electrical lead from receiver 340 for processing in processor 310. Alternatively to a serial bus, the receiver may comprise a parallel bus receiver.
Device 300 may comprise further devices not illustrated in
Processor 310, memory 320, transmitter 330, receiver 340, NFC transceiver 350, UI 360 and/or user identity module 370 may be interconnected by electrical leads internal to device 300 in a multitude of different ways. For example, each of the aforementioned devices may be separately connected to a master bus internal to device 300, to allow for the devices to exchange information. However, as the skilled person will appreciate, this is only one example and depending on the embodiment various ways of interconnecting at least two of the aforementioned devices may be selected without departing from the scope of the present invention.
In phase 410, the AP needs to add or revise a specific hash output value pattern in the malware pattern set in the server. A pattern update request is sent to the server. The server responds by sharing the hash function set Hm with the AP in phase 420. Further, if necessary, the server re-initializes the malware Bloom filter MBF and sets up the MalWeight table. In phase 430, the AP provides to the server the hash output value pattern Hm(x) with weight MWx. The server inserts the hash output value pattern Hm(x) into filter MBF and updates the MalWeight table. If Hm(x) is already in MBF, the server may update its weight in MalWeight. Such updating may be an increase of the weight by MWx.
Phases 440-460 illustrate a similar process for non-malware. Phase 440 comprises a pattern update request concerning the non-malware patterns. Phase 450 comprises the server providing the hash function set Hn to the AP, and phase 460 comprises the AP providing the hash output value set Hn(y) to the server, along with weight NWy. In phase 460, the server inserts the hash output value pattern Hn(y) into filter NBF and updates the NorWeight table. If Hn(y) is already in NBF, the server may update its weight in NorWeight. Such updating may be an increase of the weight by NWy.
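As a minimal sketch of the update handling described for phases 430 and 460, the following Python example models a Bloom filter together with a weight table. The class name WeightedBloomFilter, the filter size and the choice to key the weight table by the tuple of hash output values are assumptions made for illustration only and are not taken from the description. The server receives only hash output values, here treated as bit positions, and not the underlying data x or y.

```python
from typing import Dict, Sequence, Tuple


class WeightedBloomFilter:
    """Hypothetical model of a filter such as MBF or NBF with its weight table."""

    def __init__(self, size: int = 1024) -> None:
        self.size = size
        self.bits = [False] * size
        # Weight table (MalWeight or NorWeight), keyed by the value pattern.
        self.weights: Dict[Tuple[int, ...], int] = {}

    def _key(self, values: Sequence[int]) -> Tuple[int, ...]:
        # Map each hash output value to a bit position in the filter.
        return tuple(v % self.size for v in values)

    def contains(self, values: Sequence[int]) -> bool:
        # A pattern is considered present when all of its bits are set.
        return all(self.bits[p] for p in self._key(values))

    def insert(self, values: Sequence[int], weight: int) -> None:
        key = self._key(values)
        for p in key:
            self.bits[p] = True
        # If the pattern already has a weight, increase it by the new weight.
        self.weights[key] = self.weights.get(key, 0) + weight


mbf = WeightedBloomFilter(size=64)
mbf.insert([3, 17, 42], weight=2)   # first report of pattern Hm(x)
mbf.insert([3, 17, 42], weight=5)   # repeated report increases the weight
```

The same class may serve for the non-malware filter NBF and the NorWeight table by creating a second instance.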
Phase 510 comprises storing a malware pattern set and a non-malware pattern set. The pattern sets may comprise one-way function output values of behavioural data of malware and non-malware applications, respectively. Phase 520 comprises receiving two sets of one-way function output values from a device. Phase 530 comprises checking whether a first one of the two sets of one-way function output values is comprised in the malware pattern set, and whether a second one of the two sets of one-way function output values is comprised in the non-malware pattern set. Finally, phase 540 comprises determining whether the received sets of one-way function output values are more consistent with malware or non-malware based on the checking.
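Phases 530 and 540 may be illustrated by the following Python sketch. The decision rule shown, which compares the weights associated with the two received sets, is an assumption for illustration, as is the function name classify. For brevity the pattern sets are modelled as dictionaries with exact lookup rather than as Bloom filters.

```python
from typing import Dict, Tuple

Pattern = Tuple[int, ...]


def classify(
    mal_values: Pattern,
    nor_values: Pattern,
    mal_weight: Dict[Pattern, int],
    nor_weight: Dict[Pattern, int],
) -> str:
    """Check both received sets and report which pattern set they match better."""
    # Phase 530: look up each received set in its respective pattern set.
    mal_score = mal_weight.get(mal_values, 0)
    nor_score = nor_weight.get(nor_values, 0)
    # Phase 540: assumed rule, the larger weight decides the outcome.
    if mal_score > nor_score:
        return "malware"
    if nor_score > mal_score:
        return "non-malware"
    return "undetermined"


mal_patterns = {(1, 2, 3): 5}
nor_patterns = {(4, 5, 6): 2}
result = classify((1, 2, 3), (4, 5, 6), mal_patterns, nor_patterns)
```

In this example both received sets match a stored pattern, and the malware weight is the larger of the two, so the result is "malware".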
Phase 610 comprises storing, in an apparatus, a first set of one-way functions and a second set of one-way functions, the first set of one-way functions comprising a malware function set and the second set of one-way functions comprising a non-malware function set. Phase 620 comprises compiling data characterizing functioning of an application running in the apparatus. Phase 630 comprises applying the first set of one-way functions to the data to obtain a first set of one-way function output values and applying the second set of one-way functions to the data to obtain a second set of one-way function output values. Finally, phase 640 comprises providing the first set of one-way function output values and the second set of one-way function output values to a server.
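The device-side phases 620 to 640 may be sketched in Python as follows. The use of salted SHA-256 as the one-way functions, the salts, the filter size and the helper names make_hash_set and apply_hash_set are all assumptions for illustration; the description does not prescribe particular one-way functions. Only the output values, and not the behavioural data itself, are placed in the message intended for the server.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

HashFn = Callable[[bytes], int]


def make_hash_set(salts: Sequence[bytes], size: int) -> List[HashFn]:
    """Build a set of one-way functions, one per salt (hypothetical construction)."""

    def make(salt: bytes) -> HashFn:
        def h(data: bytes) -> int:
            digest = hashlib.sha256(salt + data).digest()
            # Reduce the digest to a position within the filter size.
            return int.from_bytes(digest[:8], "big") % size

        return h

    return [make(s) for s in salts]


def apply_hash_set(hash_set: Sequence[HashFn], data: bytes) -> List[int]:
    """Phase 630: apply every function in the set to the compiled data."""
    return [h(data) for h in hash_set]


# Phase 610: store the malware and non-malware function sets.
HM = make_hash_set([b"mal-0", b"mal-1", b"mal-2"], size=1024)
HN = make_hash_set([b"nor-0", b"nor-1", b"nor-2"], size=1024)

# Phase 620: data characterizing the functioning of an application (made up here).
behaviour = b"open;read;send_sms"

# Phase 640: message content to be provided to the server.
payload: Dict[str, List[int]] = {
    "malware_values": apply_hash_set(HM, behaviour),
    "non_malware_values": apply_hash_set(HN, behaviour),
}
```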
It is to be understood that the embodiments of the invention disclosed are not limited to the particular structures, process steps, or materials disclosed herein, but are extended to equivalents thereof as would be recognized by those ordinarily skilled in the relevant arts. It should also be understood that terminology employed herein is used for the purpose of describing particular embodiments only and is not intended to be limiting.
Reference throughout this specification to one embodiment or an embodiment means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Where reference is made to a numerical value using a term such as, for example, about or substantially, the exact numerical value is also disclosed.
As used herein, a plurality of items, structural elements, compositional elements, and/or materials may be presented in a common list for convenience. However, these lists should be construed as though each member of the list is individually identified as a separate and unique member. Thus, no individual member of such list should be construed as a de facto equivalent of any other member of the same list solely based on their presentation in a common group without indications to the contrary. In addition, various embodiments and examples of the present invention may be referred to herein along with alternatives for the various components thereof. It is understood that such embodiments, examples, and alternatives are not to be construed as de facto equivalents of one another, but are to be considered as separate and autonomous representations of the present invention.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the preceding description, numerous specific details are provided, such as examples of lengths, widths, shapes, etc., to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention can be practiced without one or more of the specific details, or with other methods, components, materials, etc. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of the invention.
While the foregoing examples are illustrative of the principles of the present invention in one or more particular applications, it will be apparent to those of ordinary skill in the art that numerous modifications in form, usage and details of implementation can be made without the exercise of inventive faculty, and without departing from the principles and concepts of the invention. Accordingly, it is not intended that the invention be limited, except as by the claims set forth below.
The verbs “to comprise” and “to include” are used in this document as open limitations that neither exclude nor require the existence of unrecited features. The features recited in dependent claims are mutually freely combinable unless otherwise explicitly stated. Furthermore, it is to be understood that the use of “a” or “an”, that is, a singular form, throughout this document does not exclude a plurality.
At least some embodiments of the present invention find industrial application in malware detection and privacy protection.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2018/091666 | 6/15/2018 | WO | 00 |