This application relates in general to computer security, and more particularly though not exclusively to a system and method for providing micro-clustering of objects.
Modern computing ecosystems often include “always on” broadband internet connections. These connections leave computing devices exposed to the internet, and the devices may be vulnerable to attack.
The present disclosure is best understood from the following detailed description when read with the accompanying FIGURES. It is emphasized that, in accordance with the standard practice in the industry, various features are not necessarily drawn to scale, and are used for illustration purposes only. Where a scale is shown, explicitly or implicitly, it provides only one illustrative example. In other embodiments, the dimensions of the various features may be arbitrarily increased or reduced for clarity of discussion. Furthermore, the various block diagrams illustrated herein disclose only one illustrative arrangement of logical elements. Those elements may be rearranged in different configurations, and elements shown in one block may, in appropriate circumstances, be moved to a different block or configuration.
A computer-implemented system and method of clustering a universe of featurized objects into micro-clusters includes selecting a vantage point having a feature vector; computing, for the featurized objects in the universe, respective distances from the vantage point, and sorting the featurized objects into a sorted container based on their distances from the vantage point; clustering adjacent objects into a plurality of micro-clusters based on determining that objects have a distance from a next adjacent object less than a maximum distance; and storing the micro-clusters onto a tangible computer-readable medium to modify operation of a computing apparatus based on objects in the micro-clusters.
The following disclosure provides many different embodiments, or examples, for implementing different features of the present disclosure. Specific examples of components and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. Further, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. Different embodiments may have different advantages, and no particular advantage is necessarily required of any embodiment.
Clustering is a machine learning (ML) technique useful for identifying similarities between sample objects. In general terms, a clustering system may start by building a feature vector for each object, wherein various attributes of the object are quantified. The feature vector thus provides an array of numerical values (commonly real, floating-point values), with each value representing an attribute of the object. The system may then compute the “distance” between two objects by calculating a scalar distance between their feature vectors. Two objects that have a short distance are similar, while two objects that have a long distance are dissimilar. This may provide a more flexible comparison of objects than, for example, hashing, which is best at detecting objects that are identical to one another.
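The feature vector distance described above may be sketched as follows. This is an illustrative fragment only: Euclidean distance is one of many suitable metrics (an LSH difference score may be used instead), and the sample vectors here are hypothetical values invented for demonstration.

```python
import math

def euclidean_distance(a, b):
    """Scalar distance between two feature vectors (arrays of floats)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Similar objects have a short distance; dissimilar objects a long one.
sample_a = [0.9, 0.1, 0.5]
sample_b = [0.8, 0.2, 0.5]  # close to sample_a
sample_c = [0.0, 0.9, 0.1]  # far from sample_a

assert euclidean_distance(sample_a, sample_b) < euclidean_distance(sample_a, sample_c)
```

In practice, the metric need only satisfy the properties required by the chosen comparison scheme; the micro-clustering method itself is agnostic to which distance function is used.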
Once objects are featurized, the clustering system may compute the distances between objects, for example using a locality-sensitive hashing (LSH) algorithm such as MinHash or TLSH. By computing a scalar distance between objects, the system can determine which objects are most similar to one another. Those objects can then be “clustered” based on their proximity to one another. In some cases, clustering is based on selecting all samples within a minimum distance (MIN_DISTANCE, sometimes also ε) of a sample, and then performing the same operation on the selected samples. This can form large data structures of clusters, with hundreds or thousands of objects in a cluster, for example. One limitation of clustering is that, given such a data structure, while objects may have much in common with their immediate neighbors, objects on one extreme end of the cluster may have less in common with objects on another extreme end of the cluster. This can make it difficult to craft a signature that is broad enough to capture all points in the cluster, but narrow enough to avoid large numbers of false positives.
Clustering has many applications in many different branches of the data sciences. In this specification, detection of malware and other cybersecurity operations is used as an illustrative and nonlimiting example of the clustering method taught herein. However, the method is also applicable to other clustering applications, and those applications are not intended to be excluded from the present specification and the appended claims.
Sample clustering has multiple applications in cybersecurity and other industries. One common use case is clustering samples for the purpose of discovering a common pattern. This common pattern can then be used to achieve different objectives. One of these objectives in cybersecurity is to create signatures that can be used to detect all samples within a cluster. For example, a cluster can include several variants of ransomware in a family, even though those variants are not identical and would not be detected by a single hash. With a good cluster, researchers can identify common patterns among samples and author a signature that can capture the variants in the cluster, as well as similar variants that have not yet been encountered.
One issue with existing clustering systems is that they often form very large clusters of objects, containing a large variety of related samples. Yet samples on opposite edges of such a large cluster may be only tangentially related to one another, and it can be difficult to craft a signature that captures all those samples, while avoiding false positives. These large clusters may be considered “impure,” in the sense that they contain a large group of only vaguely related samples. Existing algorithms can also require constant fine-tuning of the parameters (such as how many clusters to derive), and may struggle to scale with big data. These factors create a challenging environment for security researchers trying to derive signatures from these clusters.
The present specification provides a system and method that will ordinarily yield “micro-clusters” that are smaller than those found in existing clustering algorithms. These micro-clusters may maintain higher purity and precision, and may require significantly less parameter tuning. This provides an effective method for deriving signatures to proactively detect samples from the cluster and beyond.
The micro-clustering begins with an arbitrary vantage point called “Moon,” which is expected to be far from all samples in the sample universe (e.g., just as every person on the earth is relatively far from the center of mass of the moon). Samples are sorted into a “sorted container,” based on their distance from the Moon vantage. Samples are then compared only to their adjacent neighbors, and the samples cluster together so long as they do not exceed a distance threshold. Once an adjacent sample exceeds the distance threshold, the micro-cluster is closed out, and other samples in the set may or may not form more micro-clusters.
On a next pass, the Moon vantage is not used (because it would yield the same results). Instead, a new vantage is selected, and a new sorted container is built based on distance from the new vantage. In an illustrative case, the new vantage point is not a distant feature vector like Moon, but rather is selected from the set of samples that did not cluster on the first pass. In a specific example, the median remaining sample may be selected. Micro-clustering passes may continue in this manner until a pass results in no new clusters, or until a MAX_PASSES threshold is reached. After all passes are complete, any samples that remain are considered unclustered samples.
While operating on sorted containers, the system may use a “laser cutting” strategy that obtains micro-clusters based on adjacent distance measurements, and cuts off a cluster once an adjacent sample is found above a distance threshold. Micro-clustered samples are then removed from the sorted container, and the process repeats with a new vantage point.
The system and method disclosed herein may realize advantages over existing clustering algorithms. For example, some algorithms, such as K-Means, have poor precision. Because of this, authoring signatures from unstable clusters yields poor coverage, or may not even be possible where no pattern can be extrapolated from a group of vaguely related samples.
Other algorithms are more precise, but tend to form very large clusters. Creating a signature to capture all samples in such a cluster can yield a signature so broad that it also captures many false positives. The model can be fine-tuned to find “just right” sizes, but this is time consuming, highly data dependent, and may be undesirable for maintenance purposes.
Algorithms like DBSCAN may struggle to form reliable clusters when there is no obvious drop in density between clusters. In other words, if the input data have many clusters that may overlap (for example, malware data), DBSCAN may group multiple groups into a single cluster (thus, obtaining a single big blob cluster), which reduces the reliability of the solution for signature authoring.
Existing clustering algorithms may also be slow, and may face some scalability issues when dealing with big data. For example, a DBSCAN on over 3 million samples can require up to 128 GB of RAM to compute. This can be problematic in modern anti-malware systems that have collected malware samples over the course of decades, and where the number of samples may be in the billions.
Advantageously, the method disclosed herein is highly memory efficient, and may not require significant parameter tuning/maintenance. Furthermore, it yields micro-clusters that are small enough to provide high precision and purity and reduce vague relationships. This enables better signature authoring.
The system and method of the present specification work well where the universe of samples can be represented by a Locality Sensitive Hashing (LSH) scheme that supports a distance metric between two arbitrary samples, in compliance with the triangle inequality. In other words, the LSH scheme should be able to measure the distance between any two samples, regardless of how distant they are. One example of a known LSH scheme that supports this is TLSH (the “T” has no specific meaning), but in general, any comparison algorithm that can calculate a distance between two samples is suitable. Within this specification, a value that can compare two samples, regardless of distance, is referred to as an “LSH-compliant” value.
The method disclosed can be iterated until the produced micro-clusters are satisfactory for the use case. The specific number of times to run the method may depend on the input data. It may be beneficial to define a MAX_PASSES limit, to help the system finish clustering within the desired time. For example, setting MAX_PASSES to 15 would run a maximum of 15 passes, and then stop clustering (possibly leaving some unclustered samples behind). Empirical evaluation has found that for many data sets, after 6 passes, the return on investment diminishes significantly. Thus, even though additional passes will still yield extra micro-clusters, these may be less relevant than the ones obtained during the initial passes.
In an illustrative example, an initial vantage point is selected. A vantage point is defined as any arbitrary (real or virtual) LSH-compliant value that may be used as a reference to measure the distance of the universe samples against. For example, a vantage point may be selected with the intent that it is not expected to be very close to any real samples. This may include, for example, creating a fake feature vector with all characters being the same character, such as hexadecimal ‘F,’ the last hexadecimal character. Any other value could be used, such as ‘7’ (the median hexadecimal character), or ‘0’ or ‘1’ (low characters). The hash could also be generated randomly, or based on an alternating pattern (e.g., “017F017F . . . ”). Any of these are statistically unlikely to be close to actual samples. Thus, example “Moon” vantage hashes may include:
Or any other selected value.
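The vantage hash constructions described above may be sketched as follows. This is an illustrative fragment only: the digest length of 70 hexadecimal characters is an assumption (common for TLSH digests, but the correct length depends on the LSH scheme in use), and the helper names are hypothetical.

```python
import random

HASH_LEN = 70  # assumed digest length; match the LSH scheme in use (e.g., TLSH)

def uniform_vantage(ch):
    """Vantage hash with every character the same, e.g. 'f' or '7' or '0'."""
    return ch * HASH_LEN

def pattern_vantage(pattern="017F"):
    """Vantage hash built from a repeating pattern, e.g. '017F017F...'."""
    return (pattern * (HASH_LEN // len(pattern) + 1))[:HASH_LEN]

def random_vantage(seed=None):
    """Randomly generated vantage hash."""
    rng = random.Random(seed)
    return "".join(rng.choice("0123456789abcdef") for _ in range(HASH_LEN))
```

Any of these constructions is statistically unlikely to lie close to a real sample, which is the only property the initial vantage point needs.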
Using the defined vantage point, the system creates a sorted container, including the full universe of samples. A sorted container may be a simple data structure like a list or a dictionary, which is natively sorted using a distance/comparison criterion. In this case, the system uses the LSH-compliant value distance, measured between each sample and the vantage point. Thus, when this sorted container is created, samples closer to the vantage point are at the beginning of the container, while samples distant from the vantage point will be placed towards the end of the container.
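The sorted container construction described above may be sketched as follows. This is an illustrative fragment only: the 1-D integer “samples” and absolute-difference metric stand in for real LSH digests and their distance function, and the distant vantage value is hypothetical.

```python
def build_sorted_container(samples, vantage, distance):
    """Sort samples by ascending distance from the vantage point.

    `distance` is any LSH-compliant metric supplied by the caller
    (e.g., a TLSH difference score)."""
    return sorted(samples, key=lambda s: distance(s, vantage))

# Toy demo: integers as "samples," absolute difference as the metric,
# and -1000 as a distant "Moon" vantage far from every sample.
container = build_sorted_container([52, 3, 200, 51, 1], vantage=-1000,
                                   distance=lambda a, b: abs(a - b))
# Samples closest to the vantage now sit at the beginning of the container.
```

With a distant vantage point, this sorting places mutually similar samples near one another in the container, which is what makes the adjacent-neighbor comparisons of the next step effective.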
The system then iterates through the sorted container, computing the distance between the current sample (N) and the previous sample (N−1). If this distance does not exceed MAX_DIST (which can be derived empirically depending on the input data), then sample (N) is included in a temporary micro-cluster. If this starts a new micro-cluster, the system also adds sample (N−1) to the micro-cluster, as the first member of the adjacent pair. Subsequently, the system adds sample (N) only if its distance from (N−1) is satisfactory. This identifies “distance valleys,” which are used to form micro-clusters. This effect is depicted in the accompanying FIGURES.
While iterating through a sorted container, if the distance between (N) and (N−1) exceeds MAX_DIST, then the temporary micro-cluster is closed. If the temporary micro-cluster has more than MIN_SAMPLES (which can be defined by the user), then the formed micro-cluster is saved and all the samples belonging to it are marked for removal from the sorted container. If the temporary micro-cluster does not have enough samples, then the temporary micro-cluster is discarded and reset. The iteration of the sorted container continues until the end of the container.
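One pass of the “laser cutting” iteration described above may be sketched as follows. This is an illustrative fragment only: the integer samples and absolute-difference metric stand in for real LSH digests and their distance function, and the function name is hypothetical.

```python
def laser_cut(sorted_container, distance, max_dist, min_samples):
    """One micro-clustering pass over a container already sorted by
    distance from a vantage point.

    Adjacent samples whose pairwise distance stays within max_dist
    accumulate into a temporary micro-cluster; a larger gap closes the
    cluster out. Clusters with fewer than min_samples are discarded,
    and their samples are kept for later passes."""
    clusters, remaining, temp = [], [], []
    for sample in sorted_container:
        if not temp or distance(sample, temp[-1]) <= max_dist:
            temp.append(sample)
        else:
            # Distance gap found: close out the temporary micro-cluster.
            if len(temp) >= min_samples:
                clusters.append(temp)
            else:
                remaining.extend(temp)  # too few samples; discard cluster
            temp = [sample]
    if len(temp) >= min_samples:  # close out the final cluster
        clusters.append(temp)
    else:
        remaining.extend(temp)
    return clusters, remaining

# Toy demo: two "distance valleys" and one isolated sample.
clusters, remaining = laser_cut([1, 2, 3, 50, 51, 200],
                                distance=lambda a, b: abs(a - b),
                                max_dist=5, min_samples=2)
# clusters -> [[1, 2, 3], [50, 51]]; remaining -> [200]
```

Returning the unclustered samples separately corresponds to marking clustered samples for removal: the `remaining` list is the sorted container as it stands after removal, ready for the next pass.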
Once the sorted container has been fully iterated, samples marked for removal (e.g., because they were sorted into micro-clusters) are removed from the sorted container.
After removing clustered samples from the sorted container, the system identifies a new sample to use as the vantage point for the next pass. This may be, for example, the sample in the “middle” or median of the remaining samples in the sorted container. This may be obtained, for example, by dividing the length of the remaining container by 2, and selecting the sample at that index as the new vantage point.
The system then iterates again, sorting the remaining samples based on their LSH-compliant distance from the new vantage point, and then clustering is repeated.
Once the system has executed MAX_PASSES, or an execution pass is unable to form any new micro-clusters, the formed micro-clusters are collected as the output of the method, and the method terminates.
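The full iteration described above — sort against a vantage, laser-cut, remove clustered samples, select the median remaining sample as the next vantage, and stop at MAX_PASSES or when a pass forms no new clusters — may be sketched end to end as follows. This is an illustrative fragment only, again using hypothetical 1-D integer samples in place of real LSH digests.

```python
def micro_cluster(universe, distance, initial_vantage,
                  max_dist, min_samples, max_passes=15):
    """Cluster a universe of samples into micro-clusters."""
    all_clusters = []
    remaining = list(universe)
    vantage = initial_vantage  # distant "Moon" vantage on the first pass
    for _ in range(max_passes):
        # Build the sorted container for this pass.
        container = sorted(remaining, key=lambda s: distance(s, vantage))
        clusters, leftover, temp = [], [], []
        for sample in container:
            if not temp or distance(sample, temp[-1]) <= max_dist:
                temp.append(sample)
            else:
                if len(temp) >= min_samples:
                    clusters.append(temp)
                else:
                    leftover.extend(temp)
                temp = [sample]
        if len(temp) >= min_samples:
            clusters.append(temp)
        else:
            leftover.extend(temp)
        if not clusters:
            break  # a pass forming no new micro-clusters terminates the method
        all_clusters.extend(clusters)
        remaining = leftover
        if not remaining:
            break
        # Next vantage: the median remaining sample in the sorted container.
        vantage = remaining[len(remaining) // 2]
    return all_clusters, remaining  # remaining = unclustered samples

clusters, unclustered = micro_cluster(
    [1, 2, 3, 50, 51, 52, 200],
    distance=lambda a, b: abs(a - b),
    initial_vantage=-1000, max_dist=5, min_samples=2)
```

In this toy run, the first pass forms micro-clusters around the two dense valleys, and the lone distant sample is left unclustered when no further pass can form a new cluster.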
In this manner, the resulting micro-clusters are “opportunistically” discovered and formed. This is a consequence of sorting the universe against a vantage point. This sorting ensures that very similar samples are near one another, and can be discovered by the laser cutting algorithm that forms micro-clusters.
One rationale behind micro-clustering is that the samples within a micro-cluster are similar enough for discovering patterns (e.g., strings or functionality that the samples have in common) so that a signature can be authored with confidence. A larger and less dense cluster may introduce the risk of not being able to find good commonality between the samples of the cluster.
The foregoing can be used to build or embody several example implementations, according to the teachings of the present specification. Some example implementations are included here as nonlimiting illustrations of these teachings.
There is disclosed in one example, one or more tangible, nontransitory computer-readable storage media having stored thereon executable instructions to cluster a universe of featurized objects into micro-clusters, the instructions to: receive a selected vantage point having a feature vector; compute, for the featurized objects in the universe, respective distances from the selected vantage point, and sort the featurized objects into a sorted container based on their distances from the selected vantage point; cluster adjacent objects into a plurality of micro-clusters based on determining that objects have a distance from a next adjacent object less than a maximum distance; and store the micro-clusters onto a tangible computer-readable medium to modify operation of a computing apparatus based on objects in the micro-clusters.
There is disclosed another example, wherein computing respective distances comprises using a locality-sensitive hashing (LSH) algorithm.
There is disclosed another example, wherein the LSH algorithm is TLSH.
There is disclosed another example, wherein the instructions are further to remove, from the sorted container, objects that were clustered into micro-clusters, selecting a new vantage point, building a new sorted container, and repeating clustering adjacent objects.
There is disclosed another example, wherein the new vantage point is a median object in the sorted container after removing the objects that were clustered into micro-clusters.
There is disclosed another example, wherein the instructions are further to iterate removing objects that were clustered into micro-clusters, selecting a new vantage point, building a new sorted container, and repeating clustering adjacent objects, until an iteration forms no new clusters.
There is disclosed another example, wherein the instructions are further to iterate removing objects that were clustered into micro-clusters, selecting a new vantage point, building a new sorted container, and repeating clustering adjacent objects, up to a positive integer value MAX_PASSES.
There is disclosed another example, wherein the instructions are further to compute the maximum distance based on the universe of featurized objects.
There is disclosed another example, wherein the instructions are further to reject a micro-cluster if it has fewer than a positive integer MIN_SAMPLES of samples.
There is disclosed another example, wherein the instructions are further to close out a micro-cluster after determining that a next adjacent object has a distance greater than the maximum distance.
There is disclosed another example, wherein the selected vantage point comprises a feature vector with all characters being a common character.
There is disclosed another example, wherein the selected vantage point comprises a feature vector with all characters being hexadecimal ‘f’.
There is disclosed another example, wherein the selected vantage point comprises a feature vector with all characters being hexadecimal ‘7’.
There is disclosed another example, wherein the selected vantage point comprises a feature vector with all characters being hexadecimal ‘1’ or ‘0’.
There is disclosed another example, wherein the selected vantage point comprises a feature vector with characters comprising a repeating pattern.
There is disclosed another example, wherein the selected vantage point comprises a randomly generated feature vector.
There is disclosed another example, wherein the instructions are further to find, for a micro-cluster, an object signature that reads on all objects in the micro-cluster.
There is disclosed another example, wherein the instructions are further to use the object signature to detect and remediate computer malware.
There is disclosed another example of a computer-implemented method of clustering a universe of featurized objects into micro-clusters, comprising selecting a vantage point having a feature vector; computing, for the featurized objects in the universe, respective distances from the vantage point, and sorting the featurized objects into a sorted container based on their distances from the vantage point; clustering adjacent objects into a plurality of micro-clusters based on determining that objects have a distance from a next adjacent object less than a maximum distance; and storing the micro-clusters onto a tangible computer-readable medium to modify operation of a computing apparatus based on objects in the micro-clusters.
There is disclosed another example, wherein computing respective distances comprises using a locality-sensitive hashing (LSH) algorithm.
There is disclosed another example, wherein the LSH algorithm is TLSH.
There is disclosed another example, further comprising removing, from the sorted container, objects that were clustered into micro-clusters, selecting a new vantage point, building a new sorted container, and repeating clustering adjacent objects.
There is disclosed another example, wherein the new vantage point is a median object in the sorted container after removing the objects that were clustered into micro-clusters.
There is disclosed another example, further comprising iterating removing objects that were clustered into micro-clusters, selecting a new vantage point, building a new sorted container, and repeating clustering adjacent objects, until an iteration forms no new clusters.
There is disclosed another example, further comprising iterating removing objects that were clustered into micro-clusters, selecting a new vantage point, building a new sorted container, and repeating clustering adjacent objects, up to a positive integer value MAX_PASSES.
There is disclosed another example, further comprising computing the maximum distance based on the universe of featurized objects.
There is disclosed another example, further comprising rejecting a micro-cluster if it has fewer than a positive integer MIN_SAMPLES of samples.
There is disclosed another example, further comprising closing out a micro-cluster after determining that the next adjacent object has a distance greater than the maximum distance.
There is disclosed another example, wherein selecting the vantage point comprises selecting a feature vector with all characters being a common character.
There is disclosed another example, wherein selecting the vantage point comprises selecting a feature vector with all characters being hexadecimal ‘f’.
There is disclosed another example, wherein selecting the vantage point comprises selecting a feature vector with all characters being hexadecimal ‘7’.
There is disclosed another example, wherein selecting the vantage point comprises selecting a feature vector with all characters being hexadecimal ‘1’ or ‘0’.
There is disclosed another example, wherein selecting the vantage point comprises selecting a feature vector with characters comprising a repeating pattern.
There is disclosed another example, wherein selecting the vantage point comprises randomly generating a feature vector.
There is disclosed another example, further comprising finding, for a micro-cluster, an object signature that reads on all objects in the micro-cluster.
There is disclosed another example, further comprising using the object signature to detect and remediate computer malware.
There is disclosed another example of an apparatus comprising means for performing the method.
There is disclosed another example, wherein the means for performing the method comprise a processor and a memory.
There is disclosed another example, wherein the memory comprises machine-readable instructions that, when executed, cause the apparatus to perform the method.
There is disclosed another example, wherein the apparatus is a computing system.
There is disclosed another example of at least one computer readable medium comprising instructions that, when executed, implement a method or realize an apparatus as described.
There is disclosed another example of a computing platform, comprising: at least one hardware platform comprising a processor circuit and one or more memories; and instructions encoded with the one or more memories to instruct the processor circuit to cluster a universe of featurized objects into micro-clusters, the instructions to: receive a selected vantage point having a feature vector; compute, for the featurized objects in the universe, respective distances from the selected vantage point, and sort the featurized objects into a sorted container based on their distances from the selected vantage point; cluster adjacent objects into a plurality of micro-clusters based on determining that objects have a distance from a next adjacent object less than a maximum distance; and store the micro-clusters onto a tangible computer-readable medium to modify operation of a computing apparatus based on objects in the micro-clusters.
There is disclosed another example, wherein computing respective distances comprises using a locality-sensitive hashing (LSH) algorithm.
There is disclosed another example, wherein the LSH algorithm is TLSH.
There is disclosed another example, wherein the instructions are further to remove, from the sorted container, objects that were clustered into micro-clusters, selecting a new vantage point, building a new sorted container, and repeating clustering adjacent objects.
There is disclosed another example, wherein the new vantage point is a median object in the sorted container after removing the objects that were clustered into micro-clusters.
There is disclosed another example, wherein the instructions are further to iterate removing objects that were clustered into micro-clusters, selecting a new vantage point, building a new sorted container, and repeating clustering adjacent objects, until an iteration forms no new clusters.
There is disclosed another example, wherein the instructions are further to iterate removing objects that were clustered into micro-clusters, selecting a new vantage point, building a new sorted container, and repeating clustering adjacent objects, up to a positive integer value MAX_PASSES.
There is disclosed another example, wherein the instructions are further to compute the maximum distance based on the universe of featurized objects.
There is disclosed another example, wherein the instructions are further to reject a micro-cluster if it has fewer than a positive integer MIN_SAMPLES of samples.
There is disclosed another example, wherein the instructions are further to close out a micro-cluster after determining that a next adjacent object has a distance greater than the maximum distance.
There is disclosed another example, wherein the selected vantage point comprises a feature vector with all characters being a common character.
There is disclosed another example, wherein the selected vantage point comprises a feature vector with all characters being hexadecimal ‘f’.
There is disclosed another example, wherein the selected vantage point comprises a feature vector with all characters being hexadecimal ‘7’.
There is disclosed another example, wherein the selected vantage point comprises a feature vector with all characters being hexadecimal ‘1’ or ‘0’.
There is disclosed another example, wherein the selected vantage point comprises a feature vector with characters comprising a repeating pattern.
There is disclosed another example, wherein the selected vantage point comprises a randomly generated feature vector.
There is disclosed another example, wherein the instructions are further to find, for a micro-cluster, an object signature that reads on all objects in the micro-cluster.
There is disclosed another example, wherein the instructions are further to use the object signature to detect and remediate computer malware.
A system and method for micro-clustering will now be described with more particular reference to the attached FIGURES. It should be noted that throughout the FIGURES, certain reference numerals may be repeated to indicate that a particular device or block is referenced multiple times across several FIGURES. In other cases, similar elements may be given new numbers in different FIGURES. Neither of these practices is intended to require a particular relationship between the various embodiments disclosed. In certain examples, a genus or class of elements may be referred to by a reference numeral (“widget 10”), while individual species or examples of the element may be referred to by a hyphenated numeral (“first specific widget 10-1” and “second specific widget 10-2”).
Because one concern with security ecosystem 100 is to identify malware objects, the system may benefit from strong signature matching, which may be facilitated by the micro-clustering method of the present specification.
Security ecosystem 100 may include one or more protected enterprises 102. A single protected enterprise 102 is illustrated here for simplicity, and could be a business enterprise, a government entity, a family, a nonprofit organization, a church, or any other organization that may subscribe to security services provided, for example, by security services provider 190.
Within security ecosystem 100, one or more users 120 operate one or more client devices 110. A single user 120 and single client device 110 are illustrated here for simplicity, but a home or enterprise may have multiple users, each of which may have multiple devices, such as desktop computers, laptop computers, smart phones, tablets, hybrids, or similar.
Client devices 110 may be communicatively coupled to one another and to other network resources via local network 170. Local network 170 may be any suitable network or combination of one or more networks operating on one or more suitable networking protocols, including a local area network, a home network, an intranet, a virtual network, a wide area network, a wireless network, a cellular network, or the internet (optionally accessed via a proxy, virtual machine, or other similar security mechanism) by way of nonlimiting example. Local network 170 may also include one or more servers, firewalls, routers, switches, security appliances, antivirus servers, or other network devices, which may be single-purpose appliances, virtual machines, containers, or functions. Some functions may be provided on client devices 110.
In this illustration, local network 170 is shown as a single network for simplicity, but in some embodiments, local network 170 may include any number of networks, such as one or more intranets connected to the internet. Local network 170 may also provide access to an external network, such as the internet, via external network 172. External network 172 may similarly be any suitable type of network.
Local network 170 may connect to the internet via gateway 108, which may be responsible, among other things, for providing a logical boundary between local network 170 and external network 172. Local network 170 may also provide services such as dynamic host configuration protocol (DHCP), gateway services, router services, and switching services, and may act as a security portal across local boundary 104.
In some embodiments, gateway 108 could be a simple home router, or could be a sophisticated enterprise infrastructure including routers, gateways, firewalls, security services, deep packet inspection, web servers, or other services.
In further embodiments, gateway 108 may be a standalone internet appliance. Such embodiments are popular in cases in which ecosystem 100 includes a home or small business. In other cases, gateway 108 may run as a virtual machine or in another virtualized manner. In larger enterprises that feature service function chaining (SFC) or network functions virtualization (NFV), gateway 108 may include one or more service functions and/or virtualized network functions.
Local network 170 may also include a number of discrete IoT devices. For example, local network 170 may include IoT functionality to control lighting 132, thermostats or other environmental controls 134, a security system 136, and any number of other devices 140. Other devices 140 may include, as illustrative and nonlimiting examples, network attached storage (NAS), computers, printers, smart televisions, smart refrigerators, smart vacuum cleaners and other appliances, and network connected vehicles.
Local network 170 may communicate across local boundary 104 with external network 172. Local boundary 104 may represent a physical, logical, or other boundary. External network 172 may include, for example, websites, servers, network protocols, and other network-based services. In one example, an attacker 180 (or other similar malicious or negligent actor) also connects to external network 172. A security services provider 190 may provide services to local network 170, such as security software, security updates, network appliances, or similar. For example, MCAFEE, LLC provides a comprehensive suite of security services that may be used to protect local network 170 and the various devices connected to it.
It may be a goal of users 120 to successfully operate devices on local network 170 without interference from attacker 180. In one example, attacker 180 is a malware author whose goal or purpose is to cause malicious harm or mischief, for example, by injecting malicious object 182 into client device 110. Once malicious object 182 gains access to client device 110, it may try to perform work such as social engineering of user 120, a hardware-based attack on client device 110, modifying storage 150 (or volatile memory), modifying client application 112 (which may be running in memory), or gaining access to local resources. Furthermore, attacks may be directed at IoT objects. IoT objects can introduce new security challenges, as they may be highly heterogeneous, and in some cases may be designed with minimal or no security considerations. To the extent that these devices have security, it may be added on as an afterthought. Thus, IoT devices may in some cases represent new attack vectors for attacker 180 to leverage against local network 170.
Malicious harm or mischief may take the form of installing rootkits or other malware on client devices 110 to tamper with the system, installing spyware or adware to collect personal and commercial data, defacing websites, operating a botnet such as a spam server, or simply annoying and harassing users 120. Thus, one aim of attacker 180 may be to install his malware on one or more client devices 110 or any of the IoT devices described. As used throughout this specification, malicious software (“malware”) includes any object configured to provide unwanted results or do unwanted work. In many cases, malware objects will be executable objects, including, by way of nonlimiting examples, viruses, Trojans, zombies, rootkits, backdoors, worms, spyware, adware, ransomware, dialers, payloads, malicious browser helper objects, tracking cookies, loggers, or similar objects designed to take a potentially-unwanted action, including, by way of nonlimiting example, data destruction, data denial, covert data collection, browser hijacking, network proxy or redirection, covert tracking, data logging, keylogging, excessive or deliberate barriers to removal, contact harvesting, and unauthorized self-propagation. In some cases, malware could also include negligently-developed software that causes such results even without specific intent.
In enterprise contexts, attacker 180 may also want to commit industrial or other espionage, such as stealing classified or proprietary data, stealing identities, or gaining unauthorized access to enterprise resources. Thus, attacker 180's strategy may also include trying to gain physical access to one or more client devices 110 and operating them without authorization, so that an effective security policy may also include provisions for preventing such access.
In another example, a software developer may not explicitly have malicious intent, but may develop software that poses a security risk. For example, a well-known and often-exploited security flaw is the so-called buffer overrun, in which a malicious user is able to enter an overlong string into an input form and thus gain the ability to execute arbitrary instructions or operate with elevated privileges on a computing device. Buffer overruns may be the result, for example, of poor input validation or use of insecure libraries, and in many cases arise in nonobvious contexts. Thus, although not malicious, a developer contributing software to an application repository or programming an IoT device may inadvertently provide attack vectors for attacker 180. Poorly-written applications may also cause inherent problems, such as crashes, data loss, or other undesirable behavior. Because such software may be desirable itself, it may be beneficial for developers to occasionally provide updates or patches that repair vulnerabilities as they become known. However, from a security perspective, these updates and patches are essentially new objects that must themselves be validated.
Protected enterprise 102 may contract with or subscribe to a security services provider 190, which may provide security services, updates, antivirus definitions, patches, products, and services. MCAFEE, LLC is a nonlimiting example of such a security services provider that offers comprehensive security and antivirus solutions. In some cases, security services provider 190 may include a threat intelligence capability such as the global threat intelligence (GTI™) database provided by MCAFEE, LLC, or similar competing products. Security services provider 190 may update its threat intelligence database by analyzing new candidate malicious objects as they appear on client networks and characterizing them as malicious or benign.
Other security considerations within security ecosystem 100 may include parents' or employers' desire to protect children or employees from undesirable content, such as pornography, adware, spyware, age-inappropriate content, advocacy for certain political, religious, or social movements, or forums for discussing illegal or dangerous activities, by way of nonlimiting example.
The present specification teaches a novel clustering algorithm. Illustrative examples of known clustering algorithms include, without limitation, DBSCAN, K-means, Binary Tree, Fuzzy Clustering, Affinity Propagation, Normal Distribution, Mean Shift, Hierarchical Clustering, Spectral Clustering, and Mean Clustering.
When a system or an enterprise encounters a new, unknown object, the object may be featurized and mapped into a cluster space. If the object clusters strongly with other objects with known reputations, then at least as an initial classification, the system may assume that the object has the same classification as other objects in the cluster. For example, if all objects are known safe, the new object may be treated as safe. If all are known malicious, the object may be treated as malicious. If there are different classifications within the cluster, then the classification for a majority or supermajority of the known objects may be used for the new object.
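The majority/supermajority rule described above can be sketched as follows. This is a minimal illustration, not the specification's implementation; the function name and the 0.5 default threshold are illustrative assumptions.

```python
from collections import Counter

def cluster_reputation(known_reputations, threshold=0.5):
    """Assign a provisional reputation from a cluster's known members.

    known_reputations: labels (e.g., "safe", "malicious") of cluster
    members that already have reputations.
    threshold: fraction of known members that must share a label
    (0.5 = majority, 0.67 = supermajority, and so on).
    Returns the consensus label, or None if no label clears the threshold.
    """
    if not known_reputations:
        return None
    label, count = Counter(known_reputations).most_common(1)[0]
    if count / len(known_reputations) > threshold:
        return label
    return None

# A new, unknown object clustering with mostly-safe known objects:
print(cluster_reputation(["safe", "safe", "safe", "malicious"]))  # safe
```

When no label clears the threshold, the caller can fall back to another strategy, such as consulting a small number of nearest neighbors.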
In the example of
In one illustrative example, mapping the objects may include extracting features from the object into a feature vector. DBSCAN is used here as an illustration of the principles of clustering. Other clustering approaches may use different algorithms, although the concept of similarity based on proximity may, at some level, be preserved.
Nonlimiting and illustrative examples of features that may be used in a feature vector include:
The foregoing list is illustrative only and non-exhaustive.
For a particular embodiment, the system designer selects a number n of features for the system, and extracts those n features from each sample. In this case, n may be any integer where n≥1, although as the number of features increases, so does the complexity of the system. Thus, a system designer may trade off between feature granularity and system performance, depending on the needs of an embodiment and the available compute resources.
In an illustrative clustering algorithm, each sample is mapped into an n-dimensional space, and the system computes the scalar distance between each point and one or more nearest neighbors. The designer may select a distance ε, and any objects within distance ε of one another cluster together.
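As a minimal sketch of this proximity test, the following assumes Euclidean distance over feature vectors represented as tuples; the function name and sample coordinates are illustrative only.

```python
import math

def within_epsilon(points, eps):
    """Return index pairs of points that lie within distance eps of one
    another in n-dimensional feature space (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                pairs.append((i, j))
    return pairs

samples = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
print(within_epsilon(samples, eps=0.5))  # [(0, 1)]
```

In a real embodiment the pairwise loop would be replaced by an indexed neighbor search, since the naive version is quadratic in the number of samples.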
In this example, the samples have clustered into a plurality of clusters, namely cluster 204, cluster 208, cluster 212, cluster 216, cluster 220, and cluster 224. A small number of points are illustrated here to simplify the illustration, although in a real-world use case, the number of points may be in the hundreds, thousands, millions, or billions. The clusters and distances are not necessarily shown to scale, each point may represent some greater number of points, and each connection/proximity line may represent one or more proximity connections (e.g., each proximity line represents a connection to a point within distance ε).
One issue with large clusters is that it can be difficult to craft a meaningful signature of features that is broad enough to capture all points in the cluster, and narrow enough to be meaningful. Thus, one advantage of the present specification is that the system and method disclosed can form micro-clusters, which are generally expected to be smaller and more focused than the large clusters from an algorithm such as DBSCAN.
The clusters illustrated here may include a number of “core points,” which are points proximate to at least minPTS points. For example, if minPTS=4, then a sample must be proximate to at least three other points (counting itself as the fourth proximate point) to be considered a core point. Core points are important to DBSCAN and some other clustering methods because core points consistently map to the same cluster across different runs, even if the data are presented in a different order. A noncore point 236 is also illustrated. This point is within distance ε of at least one other point, but not of enough points to be considered a core point. Thus, depending on the ordering of the data, the point may cluster with either cluster 204 or cluster 208. Noncore point 236 appears as a “bridge” between clusters 204 and 208.
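A core-point check of this kind might be sketched as follows, assuming Euclidean distance; the function name and the example values for eps and min_pts are illustrative, not prescribed by the specification.

```python
def core_points(points, eps, min_pts):
    """Identify DBSCAN-style core points: points whose eps-neighborhood
    (counting the point itself) holds at least min_pts points."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    cores = []
    for i, p in enumerate(points):
        # Neighbor count includes the point itself, per the minPTS convention.
        neighbors = sum(1 for q in points if dist(p, q) <= eps)
        if neighbors >= min_pts:
            cores.append(i)
    return cores

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (9, 9)]
print(core_points(pts, eps=1.5, min_pts=4))  # [0, 1, 2, 3]
```

The isolated point at (9, 9) has only itself in its neighborhood and so is never a core point, mirroring the outlier behavior discussed below.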
Clusters 212 and 216 may be two separate clusters, or one single cluster, depending on the designer's selection of minPTS. A chain of points with few connections forms a bridge between the two clusters. This illustrates the principle that the selection of minPTS may influence the number of clusters that form. A higher value for minPTS may form smaller clusters with greater similarity. A smaller value may form larger clusters with rougher similarity. The selection will depend on the needs of a particular embodiment.
Clusters 208 and 212 are joined by a bridge 232 of noncore points. This bridge illustrates that some points may share some similarity, but as the cluster drifts, the overall similarity of the cluster may decrease, and at some point its predictive value may be compromised. Indeed, clusters 204, 208, 212, and 216 could form one large supercluster if minPTS is selected to be sufficiently small. Whether this supercluster would be sufficiently predictive of the properties of members of the cluster may depend on the specific use case.
In contrast, clusters 220 and 224 have no bridge to any other clusters, so those clusters may remain the same, regardless of the value of minPTS. However, the addition of new data to the dataset may influence later runs of the algorithm and may form bridges.
One advantage of the present clustering method is that, in at least some examples, it is unnecessary to differentiate core points from noncore points. Because micro-clustering does not rely on such definitions, all points in a cluster can be considered as effectively equal.
Also illustrated here is an outlier point 228. Outlier 228 is not similar enough to any other sample to cluster with it, regardless of the value of minPTS. Thus, clusters are not predictive of the properties of outlier 228.
In an illustrative use case, a security services vendor receives new samples that have been found in the wild, such as a batch of new PEs. These PEs have not yet been characterized, and so they have not yet been assigned reputations. Detailed static analysis, dynamic analysis, and/or sandbox analysis may yield a high-confidence prediction of whether the new objects are safe (green), suspicious (yellow), or malicious (red). However, performing analysis of all the new objects takes time and compute resources, sometimes significant amounts (e.g., the analysis may take hours or days, depending on the nature and number of the samples). To provide a medium-confidence intermediate reputation for the new objects, the security services vendor may cluster them in an object space as illustrated. If a plurality, majority, supermajority, or other proportion of samples in the clusters have a common reputation, then the new samples may be assigned that reputation, at least preliminarily (or permanently, as required). If a particular cluster does not have a satisfactory proportion of reputations (e.g., if the results are too mixed to be useful), then as an alternative, the sample may get its reputation from a selected number of nearest neighbors.
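The nearest-neighbor fallback mentioned at the end of this passage might look like the following sketch; the function name, the choice of Euclidean distance, and the default k=5 are all illustrative assumptions.

```python
from collections import Counter

def nearest_neighbor_reputation(sample, labeled, k=5):
    """Fallback when a cluster's reputations are too mixed: take the
    reputation held by most of the sample's k nearest labeled neighbors.

    sample: feature vector (tuple of floats)
    labeled: list of (feature_vector, label) pairs with known reputations
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(labeled, key=lambda pair: dist(sample, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

known = [((0, 0), "safe"), ((0, 1), "safe"), ((5, 5), "malicious")]
print(nearest_neighbor_reputation((0, 0.5), known, k=2))  # safe
```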
Starting in
In
Turning to
To provide this Moon vantage, a fictional or arbitrary feature vector may be constructed, with the expectation that the feature vector for the Moon vantage will not be similar to any of the samples in the sample universe. In one illustrative example, the feature vector hash for the Moon vantage sample is selected as a string of identical characters, such as hexadecimal F (the last hexadecimal digit), hexadecimal 7 (the median hexadecimal digit), or hexadecimal 0 or 1 (low hexadecimal digits). In practice, it is extraordinarily unlikely for any feature vector hash to be a string of identical digits. Other methods of selecting a Moon vantage are also available, such as using an alternating series of digits, or generating a feature vector hash using a random or pseudorandom number generator. If the feature vector hash is sufficiently large, then even selecting a random hash is likely to yield a feature vector that is not close to any of the samples in unsorted sample universe 302.
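A Moon vantage hash of the kinds described might be constructed as in this sketch; the hash length, the style names, and the function name are illustrative assumptions, and the random variant simply draws hex digits from a cryptographic generator.

```python
import secrets

def moon_vantage(hash_len=64, style="identical"):
    """Construct a 'Moon' vantage feature-vector hash that is very
    unlikely to resemble any real sample's hash.

    hash_len: number of hexadecimal digits in the feature vector hash
    style: 'identical' (all 'f'), 'alternating', or 'random'
    (these names are illustrative choices, not prescribed terms).
    """
    if style == "identical":
        return "f" * hash_len            # e.g., 'ffff...f'
    if style == "alternating":
        return ("a5" * hash_len)[:hash_len]  # alternating digit pattern
    return secrets.token_hex(hash_len // 2)  # random hex string

print(moon_vantage(8))                   # ffffffff
print(moon_vantage(8, "alternating"))    # a5a5a5a5
```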
To begin the method, the system may compute the distance from vantage sample 304 to each individual sample. This yields a scalar value that can be easily compared. The samples may then be sorted according to their distance to the Moon vantage sample 304, forming first pass sorted container 308-1. For example, sample S1 is the closest sample to vantage sample 304, while sample S12 is the furthest from vantage sample 304. The other samples are ordered according to their distance from the Moon vantage sample 304.
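The first-pass sort can be sketched as follows, assuming (purely for illustration) that each sample's feature vector is summarized as a hex hash and that bitwise Hamming distance stands in for an LSH-compliant distance; the sample hashes shown are hypothetical.

```python
def hamming(a, b):
    """Bitwise Hamming distance between two equal-length hex hashes
    (a simple stand-in for an LSH-compliant distance)."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")

def sorted_container(samples, vantage):
    """Sort samples (name -> hash) by distance to the vantage hash,
    nearest first, forming the first-pass sorted container."""
    return sorted(samples, key=lambda name: hamming(samples[name], vantage))

samples = {"S1": "0f", "S2": "ff", "S3": "00"}
print(sorted_container(samples, "ff"))  # ['S2', 'S1', 'S3']
```

Because the distance to the vantage is a scalar, adjacent entries in the sorted container are natural candidates for the pairwise comparisons described below.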
Turning to
The system then continues scanning the sorted container to identify additional micro-clusters such as micro-cluster 316-2.
Once micro-clusters have been found, their member samples may be excluded from the sorted container, and a new sorted container may be formed that includes only those samples that did not cluster in the first pass. However, performing a second pass with the same vantage sample 304 would yield the same results because the distances would not change. Thus, a new vantage sample is selected. In an illustrative case, the new vantage sample is a median sample 320-1, or in other words, the sample in the middle of the new sorted container once the micro-clustered samples have been removed. In this case, sample 8 will be the median sample once the samples of micro-clusters 316-1 and 316-2 are removed.
Turning to
Turning to
Turning to
While more complex algorithms such as DBSCAN attempt to cluster every sample into a cluster, the method of the present specification need not be concerned with unclustered samples. One purpose of the method is to define smaller micro-clusters that can be used to create meaningful signatures for identifying objects that would cluster with these micro-clusters. By way of illustration, a DBSCAN cluster that has 100 objects may be reduced to a set of three micro-clusters that have 30 objects each. In this case, some outliers will be left unclustered. But from the three micro-clusters that form, it may be more practical to craft a signature that reads on every sample in that micro-cluster. This new signature may be broad enough to capture other similar objects, while being narrow enough to avoid too many false positives. Thus, the micro-clustering algorithm may yield more clusters, and thus provide more signatures, while the signatures that are provided may be more useful and practical. Furthermore, the exclusion of certain points that do not cluster may not be problematic, because the signature of clusters that do form is still a useful tool for identifying malicious objects in the wild.
This method may not form perfect clusters, and may not be intended to, in the sense that it may form clusters that are smaller than strictly necessary. For example, micro-cluster 316-1 might theoretically have incorporated sample 12 under a different clustering algorithm. However, sample 12 never had a chance to be sorted into cluster 316-1, because cluster 316-1 closed out immediately in the first pass, which removed the theoretical possibility of including sample 12 in a later pass. However, the intent of this method is not necessarily to produce the largest possible cluster, but to produce micro-clusters that are precise enough to be used for signature authoring, or for another purpose for which two smaller clusters may serve just as well as (or better than) one large cluster.
In block 504, the system selects a “Moon” vantage point. The Moon vantage point is intended to be a point that is dissimilar from all points in the unsorted set of objects in a sample universe.
Meta-block 510 represents operations that are repeated until MAX_PASSES has been reached, or until micro-clusters no longer form in a pass. In one illustrative example, MAX_PASSES may be on the order of 6 to 10 passes. In some cases, beyond that, diminishing returns are experienced.
In block 508, the system creates a sorted container including all objects in the sample universe. To create this sorted container, the system sorts each sample by an LSH-compliant distance from the vantage point. In the case of the first pass, the vantage point is the previously selected “Moon” vantage point. In subsequent passes, the vantage point may be selected from the objects remaining in the sorted container after a pass.
Meta-block 515 represents a process that is iterated for each sorted container. Starting in block 516, the system selects a sample (N) and a sample (N−1), representing an adjacent sample. The system measures the distance between (N) and (N−1), such as according to an LSH-compliant distance measurement.
In decision block 520, the system determines whether the distance between the two samples is less than the threshold or maximum distance.
If the distance is less than the threshold, the two points are to be clustered together. In that case, in block 532, sample (N−1) is added to the micro-cluster that (N) belongs to. The system then increments N (N++) and control returns to block 516, where the next sample is checked.
Returning to decision block 520, if the distance between the two samples is greater than MAX_DIST, then in block 524 the system determines whether the current micro-cluster is greater than a threshold of minimum samples for a micro-cluster. This is to ensure that micro-clusters are not formed with too few samples (e.g., with two or only a few samples, depending on the design constraints). If the micro-cluster is too small, then in block 512 the system resets the micro-cluster, meaning that the objects that were clustered into that micro-cluster remain unclustered for this pass. N is then incremented (N++), and control returns to block 516.
Returning to decision block 524, if the number of samples in the micro-cluster is greater than the minimum micro-cluster size, then in block 528, the system saves the formed micro-cluster and marks the clustered samples for removal.
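Blocks 516 through 532 can be sketched as a single linear scan over the sorted container; the function name and parameters are illustrative, and dist stands in for whatever LSH-compliant measurement an embodiment uses.

```python
def scan_pass(ordered, max_dist, min_size, dist):
    """One pass over a sorted container: walk adjacent samples and
    gather runs whose neighbor-to-neighbor distance stays below
    max_dist; keep runs of at least min_size as micro-clusters.

    ordered: samples sorted by distance to the current vantage point
    dist: distance function between two samples (e.g., LSH-based)
    """
    if not ordered:
        return []
    clusters, current = [], [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        if dist(prev, cur) < max_dist:
            current.append(cur)              # block 532: join current cluster
        else:
            if len(current) >= min_size:
                clusters.append(current)     # block 528: save micro-cluster
            current = [cur]                  # block 512: reset, start anew
    if len(current) >= min_size:
        clusters.append(current)             # close out the final run
    return clusters

# Integers with absolute difference as a toy distance:
print(scan_pass([1, 2, 3, 10, 11, 12, 30], 2, 3, lambda a, b: abs(a - b)))
# [[1, 2, 3], [10, 11, 12]]
```

The lone sample 30 is left unclustered, matching the method's tolerance for outliers discussed below.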
After clusters have been formed in meta-block 515, in block 536 the system removes from the sorted container all samples that were marked for removal (i.e., samples that were clustered in that pass).
In block 540, the system selects a new vantage point for the next pass through meta-block 510. In an illustrative example, this may be the median sample in the remaining sample set after clustered samples have been removed.
If this is not the last pass through the method, then control returns to block 508, and another pass is performed on the newly sorted container.
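Putting meta-block 510 together, a hedged sketch of the multi-pass method (sort by vantage distance, scan adjacent samples, remove clustered samples, re-vantage on the median) might read as follows; all names and default values are illustrative, and samples are assumed hashable.

```python
def micro_cluster(samples, vantage, dist, max_dist=2, min_size=3, max_passes=6):
    """Multi-pass micro-clustering sketch (blocks 504-540).

    dist is any suitable (e.g., LSH-compliant) distance function; the
    defaults for max_dist, min_size, and max_passes are illustrative.
    Returns (micro_clusters, unclustered_samples).
    """
    remaining = list(samples)
    clusters = []
    for _ in range(max_passes):
        if len(remaining) < min_size:
            break
        # Block 508: sorted container, nearest to the vantage first.
        ordered = sorted(remaining, key=lambda s: dist(s, vantage))
        found, current = [], [ordered[0]]
        for prev, cur in zip(ordered, ordered[1:]):
            if dist(prev, cur) < max_dist:
                current.append(cur)
            else:
                if len(current) >= min_size:
                    found.append(current)
                current = [cur]
        if len(current) >= min_size:
            found.append(current)
        if not found:
            break                      # no new micro-clusters: stop early
        clusters.extend(found)
        # Block 536: remove clustered samples from the container.
        clustered = {s for c in found for s in c}
        remaining = [s for s in remaining if s not in clustered]
        if not remaining:
            break
        # Block 540: median of the remainder becomes the next vantage.
        vantage = remaining[len(remaining) // 2]
    return clusters, remaining

cl, rem = micro_cluster([1, 2, 3, 10, 11, 12, 50], 1000, lambda a, b: abs(a - b))
print(cl, rem)  # two micro-clusters form; 50 stays unclustered
```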
After all passes are complete, in block 544 the system may act on the micro-clusters. For example, an automated system or a human user may examine the samples and craft a signature that reads on all samples in a micro-cluster. One signature may be crafted per micro-cluster, and these signatures can then be provided to antivirus or other detection software on client machines. These signatures can be used to identify malicious software that may be encountered in the wild. Advantageously, the signatures may be broader than a simple hash of an object, which will catch only identical objects. Signatures are useful because, without signatures, malware authors can simply change a few bytes within a file to pass malware scans. But with strong signature matching, it is much more difficult for malware authors to change their products without losing functionality. However, if the signatures are too broad, then they will catch many false positives. Thus, by crafting signatures that read on micro-clusters (instead of attempting to craft signatures that read on large clusters from algorithms such as DBSCAN), the system may provide more beneficial and targeted malware detection.
Hardware platform 600 is configured to provide a computing device. In various embodiments, a “computing device” may be or comprise, by way of nonlimiting example, a computer, workstation, server, mainframe, virtual machine (whether emulated or on a “bare metal” hypervisor), network appliance, container, IoT device, high performance computing (HPC) environment, a data center, a communications service provider infrastructure (e.g., one or more portions of an Evolved Packet Core), an in-memory computing environment, a computing system of a vehicle (e.g., an automobile or airplane), an industrial control system, embedded computer, embedded controller, embedded sensor, personal digital assistant, laptop computer, cellular telephone, internet protocol (IP) telephone, smart phone, tablet computer, convertible tablet computer, computing appliance, receiver, wearable computer, handheld calculator, or any other electronic, microelectronic, or microelectromechanical device for processing and communicating data. At least some of the methods and systems disclosed in this specification may be embodied by or carried out on a computing device.
In the illustrated example, hardware platform 600 is arranged in a point-to-point (PtP) configuration. This PtP configuration is popular for personal computer (PC) and server-type devices, although it is not so limited, and any other bus type may be used.
Hardware platform 600 is an example of a platform that may be used to implement embodiments of the teachings of this specification. For example, instructions could be stored in storage 650. Instructions could also be transmitted to the hardware platform in an ethereal form, such as via a network interface, or retrieved from another source via any suitable interconnect. Once received (from any source), the instructions may be loaded into memory 604, and may then be executed by one or more processors 602 to provide elements such as an operating system 606, operational agents 608, or data 612.
Hardware platform 600 may include several processors 602. For simplicity and clarity, only processors PROC0 602-1 and PROC1 602-2 are shown. Additional processors (such as 2, 4, 8, 16, 24, 32, 64, or 128 processors) may be provided as necessary, while in other embodiments, only one processor may be provided. Processors may have any number of cores, such as 1, 2, 4, 8, 16, 24, 32, 64, or 128 cores.
Processors 602 may be any type of processor and may communicatively couple to chipset 616 via, for example, PtP interfaces. Chipset 616 may also exchange data with other elements, such as a high performance graphics adapter 622. In alternative embodiments, any or all of the PtP links illustrated in
Two memories, 604-1 and 604-2, are shown, connected to PROC0 602-1 and PROC1 602-2, respectively. As an example, each processor is shown connected to its memory in a direct memory access (DMA) configuration, though other memory architectures are possible, including ones in which memory 604 communicates with a processor 602 via a bus. For example, some memories may be connected via a system bus, or in a data center, memory may be accessible in a remote DMA (RDMA) configuration.
Memory 604 may include any form of volatile or nonvolatile memory including, without limitation, magnetic media (e.g., one or more tape drives), optical media, flash, random access memory (RAM), double data rate RAM (DDR RAM), nonvolatile RAM (NVRAM), static RAM (SRAM), dynamic RAM (DRAM), persistent RAM (PRAM), data-centric (DC) persistent memory (e.g., Intel Optane/3D-crosspoint), cache, Layer 1 (L1) or Layer 2 (L2) memory, on-chip memory, registers, virtual memory region, read-only memory (ROM), flash memory, removable media, tape drive, cloud storage, or any other suitable local or remote memory component or components. Memory 604 may be used for short, medium, and/or long-term storage. Memory 604 may store any suitable data or information utilized by platform logic. In some embodiments, memory 604 may also comprise storage for instructions that may be executed by the cores of processors 602 or other processing elements (e.g., logic resident on chipsets 616) to provide functionality.
In certain embodiments, memory 604 may comprise a relatively low-latency volatile main memory, while storage 650 may comprise a relatively higher-latency nonvolatile memory. However, memory 604 and storage 650 need not be physically separate devices, and in some examples may represent simply a logical separation of function (if there is any separation at all). It should also be noted that although DMA is disclosed by way of nonlimiting example, DMA is not the only protocol consistent with this specification, and that other memory architectures are available.
Certain computing devices provide main memory 604 and storage 650, for example, in a single physical memory device, and in other cases, memory 604 and/or storage 650 are functionally distributed across many physical devices. In the case of virtual machines or hypervisors, all or part of a function may be provided in the form of software or firmware running over a virtualization layer to provide the logical function, and resources such as memory, storage, and accelerators may be disaggregated (i.e., located in different physical locations across a data center). In other examples, a device such as a network interface may provide only the minimum hardware interfaces necessary to perform its logical operation, and may rely on a software driver to provide additional necessary logic. Thus, each logical block disclosed herein is broadly intended to include one or more logic elements configured and operable for providing the disclosed logical operation of that block. As used throughout this specification, “logic elements” may include hardware, external hardware (digital, analog, or mixed-signal), software, reciprocating software, services, drivers, interfaces, components, modules, algorithms, sensors, components, firmware, hardware instructions, microcode, programmable logic, or objects that can coordinate to achieve a logical operation.
Graphics adapter 622 may be configured to provide a human-readable visual output, such as a command-line interface (CLI) or graphical desktop such as Microsoft Windows, Apple OSX desktop, or a Unix/Linux X Window System-based desktop. Graphics adapter 622 may provide output in any suitable format, such as a coaxial output, composite video, component video, video graphics array (VGA), or digital outputs such as digital visual interface (DVI), FPDLink, DisplayPort, or high definition multimedia interface (HDMI), by way of nonlimiting example. In some examples, graphics adapter 622 may include a hardware graphics card, which may have its own memory and its own graphics processing unit (GPU).
Chipset 616 may be in communication with a bus 628 via an interface circuit. Bus 628 may have one or more devices that communicate over it, such as a bus bridge 632, I/O devices 635, accelerators 646, communication devices 640, and a keyboard and/or mouse 638, by way of nonlimiting example. In general terms, the elements of hardware platform 600 may be coupled together in any suitable manner. For example, a bus may couple any of the components together. A bus may include any known interconnect, such as a multi-drop bus, a mesh interconnect, a fabric, a ring interconnect, a round-robin protocol, a PtP interconnect, a serial interconnect, a parallel bus, a coherent (e.g., cache coherent) bus, a layered protocol architecture, a differential bus, or a Gunning transceiver logic (GTL) bus, by way of illustrative and nonlimiting example.
Communication devices 640 can broadly include any communication not covered by a network interface and the various I/O devices described herein. This may include, for example, various universal serial bus (USB), FireWire, Lightning, or other serial or parallel devices that provide communications.
I/O Devices 635 may be configured to interface with any auxiliary device that connects to hardware platform 600 but that is not necessarily a part of the core architecture of hardware platform 600. A peripheral may be operable to provide extended functionality to hardware platform 600, and may or may not be wholly dependent on hardware platform 600. In some cases, a peripheral may be a computing device in its own right. Peripherals may include input and output devices such as displays, terminals, printers, keyboards, mice, modems, data ports (e.g., serial, parallel, USB, Firewire, or similar), network controllers, optical media, external storage, sensors, transducers, actuators, controllers, data acquisition buses, cameras, microphones, speakers, or external storage, by way of nonlimiting example.
In one example, audio I/O 642 may provide an interface for audible sounds, and may include in some examples a hardware sound card. Sound output may be provided in analog (such as a 3.5 mm stereo jack), component (“RCA”) stereo, or in a digital audio format such as S/PDIF, AES3, AES47, HDMI, USB, Bluetooth, or Wi-Fi audio, by way of nonlimiting example. Audio input may also be provided via similar interfaces, in an analog or digital form.
Bus bridge 632 may be in communication with other devices such as a keyboard/mouse 638 (or other input devices such as a touch screen, trackball, etc.), communication devices 640 (such as modems, network interface devices, peripheral interfaces such as PCI or PCIe, or other types of communication devices that may communicate through a network), audio I/O 642, a data storage device 644, and/or accelerators 646. In alternative embodiments, any portions of the bus architectures could be implemented with one or more PtP links.
Operating system 606 may be, for example, Microsoft Windows, Linux, UNIX, Mac OS X, iOS, MS-DOS, or an embedded or real-time operating system (including embedded or real-time flavors of the foregoing). In some embodiments, a hardware platform 600 may function as a host platform for one or more guest systems that invoke applications (e.g., operational agents 608).
Operational agents 608 may include one or more computing engines that may include one or more nontransitory computer-readable mediums having stored thereon executable instructions operable to instruct a processor to provide operational functions. At an appropriate time, such as upon booting hardware platform 600 or upon a command from operating system 606 or a user or security administrator, a processor 602 may retrieve a copy of the operational agent (or software portions thereof) from storage 650 and load it into memory 604. Processor 602 may then iteratively execute the instructions of operational agents 608 to provide the desired methods or functions.
As used throughout this specification, an “engine” includes any combination of one or more logic elements, of similar or dissimilar species, operable for and configured to perform one or more methods provided by the engine. In some cases, the engine may be or include a special integrated circuit designed to carry out a method or a part thereof, a field-programmable gate array (FPGA) programmed to provide a function, a special hardware or microcode instruction, other programmable logic, and/or software instructions operable to instruct a processor to perform the method. In some cases, the engine may run as a “daemon” process, background process, terminate-and-stay-resident program, a service, system extension, control panel, bootup procedure, basic input/output system (BIOS) subroutine, or any similar program that operates with or without direct user interaction. In certain embodiments, some engines may run with elevated privileges in a “driver space” associated with ring 0, 1, or 2 in a protection ring architecture. The engine may also include other hardware, software, and/or data, including configuration files, registry entries, application programming interfaces (APIs), and interactive or user-mode software by way of nonlimiting example.
In some cases, the function of an engine is described in terms of a “circuit” or “circuitry to” perform a particular function. The terms “circuit” and “circuitry” should be understood to include both the physical circuit, and in the case of a programmable circuit, any instructions or data used to program or configure the circuit.
Where elements of an engine are embodied in software, computer program instructions may be implemented in programming languages, such as an object code, an assembly language, or a high-level language such as OpenCL, FORTRAN, C, C++, JAVA, or HTML. These may be used with any compatible operating systems or operating environments. Hardware elements may be designed manually, or with a hardware description language such as Spice, Verilog, and VHDL. The source code may define and use various data structures and communication messages. The source code may be in a computer executable form (e.g., via an interpreter), or the source code may be converted (e.g., via a translator, assembler, or compiler) into a computer executable form, or converted to an intermediate form such as byte code. Where appropriate, any of the foregoing may be used to build or describe appropriate discrete or integrated circuits, whether sequential, combinatorial, state machines, or otherwise.
A network interface may be provided to communicatively couple hardware platform 600 to a wired or wireless network or fabric. A “network,” as used throughout this specification, may include any communicative platform operable to exchange data or information within or between computing devices, including, by way of nonlimiting example, a local network, a switching fabric, an ad-hoc local network, Ethernet (e.g., as defined by the IEEE 802.3 standard), Fibre Channel, InfiniBand, Wi-Fi, or another suitable standard. Other nonlimiting examples include Intel Omni-Path Architecture (OPA), TrueScale, Ultra Path Interconnect (UPI) (formerly called QuickPath Interconnect, QPI, or KTI), Fibre Channel over Ethernet (FCoE), PCI, PCIe, fiber optics, millimeter wave guide, an internet architecture, a packet data network (PDN) offering a communications interface or exchange between any two nodes in a system, a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a wireless local area network (WLAN), a virtual private network (VPN), an intranet, a plain old telephone system (POTS), or any other appropriate architecture or system that facilitates communications in a network or telephonic environment, either with or without human interaction or intervention. A network interface may include one or more physical ports that may couple to a cable (e.g., an Ethernet cable, other cable, or waveguide).
In some cases, some or all of the components of hardware platform 600 may be virtualized, in particular the processor(s) and memory. For example, a virtualized environment may run on OS 606, or OS 606 could be replaced with a hypervisor or virtual machine manager. In this configuration, a virtual machine running on hardware platform 600 may virtualize workloads. A virtual machine in this configuration may perform essentially all of the functions of a physical hardware platform.
In a general sense, any suitably-configured processor can execute any type of instructions associated with the data to achieve the operations illustrated in this specification. Any of the processors or cores disclosed herein could transform an element or an article (for example, data) from one state or thing to another state or thing. In another example, some activities outlined herein may be implemented with fixed logic or programmable logic (for example, software and/or computer instructions executed by a processor).
Various components of the system depicted in
Network functions virtualization (NFV) is generally considered distinct from software defined networking (SDN), but they can interoperate together, and the teachings of this specification should also be understood to apply to SDN in appropriate circumstances. For example, virtual network functions (VNFs) may operate within the data plane of an SDN deployment. NFV was originally envisioned as a method for providing reduced capital expenditure (Capex) and operating expenses (Opex) for telecommunication services. One feature of NFV is replacing proprietary, special-purpose hardware appliances with virtual appliances running on commercial off-the-shelf (COTS) hardware within a virtualized environment. In addition to Capex and Opex savings, NFV provides a more agile and adaptable network. As network loads change, VNFs can be provisioned (“spun up”) or removed (“spun down”) to meet network demands. For example, in times of high load, more load balancing VNFs may be spun up to distribute traffic to more workload servers (which may themselves be VMs). In times when more suspicious traffic is experienced, additional firewalls or deep packet inspection (DPI) appliances may be spun up as needed.
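The demand-driven spin-up and spin-down behavior described above can be sketched as a simple control loop. The following is an illustrative sketch only; the function name, thresholds, and scaling policy are assumptions for exposition and are not part of any NFV standard or orchestrator API.

```python
# Illustrative sketch of demand-driven VNF scaling. All names and
# thresholds here are hypothetical; a real NFV orchestrator would
# apply its own policies through its own management interfaces.

def scale_vnfs(active_vnfs, load_per_vnf, high_water=0.8, low_water=0.3):
    """Return the number of VNF instances to run for the observed load.

    active_vnfs  -- number of currently provisioned ("spun up") instances
    load_per_vnf -- average utilization of each instance, in [0, 1]
    """
    total_load = active_vnfs * load_per_vnf
    if load_per_vnf > high_water:
        # Spin up enough instances to bring per-instance load under target.
        return max(active_vnfs + 1, int(total_load / high_water) + 1)
    if load_per_vnf < low_water and active_vnfs > 1:
        # Spin down one instance, but always keep at least one running.
        return active_vnfs - 1
    return active_vnfs
```

For instance, four instances each running at 90% utilization would trigger a spin-up, while four instances at 20% would trigger a spin-down to three.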
Because NFV started out as a telecommunications feature, many NFV instances are focused on telecommunications. However, NFV is not limited to telecommunication services. In a broad sense, NFV includes one or more VNFs running within a network function virtualization infrastructure (NFVI), such as NFVI 700. Often, the VNFs are inline service functions that are separate from workload servers or other nodes. These VNFs can be chained together into a service chain, which may be defined by a virtual subnetwork, and which may include a serial string of network services that provide behind-the-scenes work, such as security, logging, billing, and similar.
In the example of
Note that NFV orchestrator 701 itself may be virtualized (rather than a special-purpose hardware appliance). NFV orchestrator 701 may be integrated within an existing SDN system, wherein an operations support system (OSS) manages the SDN. This may interact with cloud resource management systems (e.g., OpenStack) to provide NFV orchestration. An NFVI 700 may include the hardware, software, and other infrastructure to enable VNFs to run. This may include a hardware platform 702 on which one or more VMs 704 may run. For example, hardware platform 702-1 in this example runs VMs 704-1 and 704-2. Hardware platform 702-2 runs VMs 704-3 and 704-4. Each hardware platform 702 may include a respective hypervisor 720, virtual machine manager (VMM), or similar function, which may include and run on a native (bare metal) operating system, which may be minimal so as to consume very few resources. For example, hardware platform 702-1 has hypervisor 720-1, and hardware platform 702-2 has hypervisor 720-2.
Hardware platforms 702 may be or comprise a rack or several racks of blade or slot servers (including, e.g., processors, memory, and storage), one or more data centers, other hardware resources distributed across one or more geographic locations, hardware switches, or network interfaces. An NFVI 700 may also include the software architecture that enables hypervisors to run and be managed by NFV orchestrator 701.
Running on NFVI 700 are VMs 704, each of which in this example is a VNF providing a virtual service appliance. Each VM 704 in this example includes an instance of the Data Plane Development Kit (DPDK) 716, a virtual operating system 708, and an application providing the VNF 712. For example, VM 704-1 has virtual OS 708-1, DPDK 716-1, and VNF 712-1. VM 704-2 has virtual OS 708-2, DPDK 716-2, and VNF 712-2. VM 704-3 has virtual OS 708-3, DPDK 716-3, and VNF 712-3. VM 704-4 has virtual OS 708-4, DPDK 716-4, and VNF 712-4.
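The NFVI topology enumerated above (hardware platforms hosting hypervisors, which host VMs, each packaging a guest OS, a DPDK instance, and a VNF) can be modeled with plain data classes. This is a minimal, hypothetical sketch for illustration; the class names mirror the reference numerals above and do not reflect any real orchestration software.

```python
# Hypothetical model of the NFVI topology described above. Each
# hardware platform runs a hypervisor and hosts VMs; each VM packages
# a guest OS, a DPDK instance, and the application providing the VNF.
from dataclasses import dataclass, field

@dataclass
class Vm:
    name: str
    guest_os: str
    dpdk: str
    vnf: str

@dataclass
class HardwarePlatform:
    name: str
    hypervisor: str
    vms: list = field(default_factory=list)

@dataclass
class NfvOrchestrator:
    platforms: list = field(default_factory=list)

    def all_vnfs(self):
        """Enumerate every VNF running anywhere in the infrastructure."""
        return [vm.vnf for p in self.platforms for vm in p.vms]

# The example topology from the text: two platforms, two VMs each.
nfvi = NfvOrchestrator(platforms=[
    HardwarePlatform("702-1", "720-1",
                     vms=[Vm("704-1", "708-1", "716-1", "712-1"),
                          Vm("704-2", "708-2", "716-2", "712-2")]),
    HardwarePlatform("702-2", "720-2",
                     vms=[Vm("704-3", "708-3", "716-3", "712-3"),
                          Vm("704-4", "708-4", "716-4", "712-4")]),
])
```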
Virtualized network functions could include, as nonlimiting and illustrative examples, firewalls, intrusion detection systems, load balancers, routers, session border controllers, DPI services, network address translation (NAT) modules, or call security association.
The illustration of
The illustrated DPDK instances 716 provide a set of highly-optimized libraries for communicating across a virtual switch (vSwitch) 722. Like VMs 704, vSwitch 722 is provisioned and allocated by a hypervisor 720. The hypervisor uses a network interface to connect the hardware platform to the data center fabric (e.g., a host fabric interface (HFI)). This HFI may be shared by all VMs 704 running on a hardware platform 702. Thus, a vSwitch may be allocated to switch traffic between VMs 704. The vSwitch may be a pure software vSwitch (e.g., a shared memory vSwitch), which may be optimized so that data are not moved between memory locations, but rather, the data may stay in one place, and pointers may be passed between VMs 704 to simulate data moving between ingress and egress ports of the vSwitch. The vSwitch may also include a hardware driver (e.g., a hardware network interface IP block that switches traffic, but that connects to virtual ports rather than physical ports). In this illustration, a distributed vSwitch 722 is illustrated, wherein vSwitch 722 is shared between two or more physical hardware platforms 702.
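The zero-copy behavior of a shared memory vSwitch can be sketched as follows. This is a simplified illustration of the pointer-passing idea only, not DPDK's actual API: packet buffers stay in one place, and only references move between the per-VM queues that stand in for ingress and egress ports.

```python
# Simplified sketch of a shared-memory vSwitch: packet data is stored
# once and never copied; "switching" a packet means passing its buffer
# index between per-VM queues.
from collections import deque

class SharedMemoryVSwitch:
    def __init__(self):
        self.buffers = []   # packet data, never copied or moved
        self.queues = {}    # VM name -> deque of buffer indices

    def attach(self, vm):
        """Give a VM a virtual port (a queue of buffer references)."""
        self.queues[vm] = deque()

    def send(self, src_vm, dst_vm, payload):
        """'Transmit' a packet by storing it once and queuing its index."""
        self.buffers.append(payload)
        self.queues[dst_vm].append(len(self.buffers) - 1)

    def receive(self, vm):
        """Dequeue a buffer index and resolve it to the packet data."""
        idx = self.queues[vm].popleft()
        return self.buffers[idx]
```

A receiving VM gets back the very same buffer object the sender stored, which is what "data stays in one place" means in practice.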
Containerization infrastructure 800 runs on a hardware platform such as containerized server 804. Containerized server 804 may provide processors, memory, one or more network interfaces, accelerators, and/or other hardware resources.
Running on containerized server 804 is a shared kernel 808. One distinction between containerization and virtualization is that containers run on a common kernel with the main operating system and with each other. In contrast, in virtualization, the processor and other hardware resources are abstracted or virtualized, and each virtual machine provides its own kernel on the virtualized hardware.
Running on shared kernel 808 is main operating system 812. Commonly, main operating system 812 is a Unix or Linux-based operating system, although containerization infrastructure is also available for other types of systems, including Microsoft Windows systems and Macintosh systems. Running on top of main operating system 812 is a containerization layer 816. For example, Docker is a popular containerization layer that runs on a number of operating systems, and relies on the Docker daemon. Newer operating systems (including Fedora Linux 32 and later) that use version 2 of the kernel control groups service (cgroups v2) may be incompatible with the Docker daemon. Thus, these systems may instead run an alternative known as Podman, which provides a containerization layer without a daemon.
Various factions debate the advantages and/or disadvantages of using a daemon-based containerization layer (e.g., Docker) versus one without a daemon (e.g., Podman). Such debates are outside the scope of the present specification, and when the present specification speaks of containerization, it is intended to include any containerization layer, whether it requires the use of a daemon or not.
Main operating system 812 may also provide services 818, which provide services and interprocess communication to userspace applications 820.
Services 818 and userspace applications 820 in this illustration are independent of any container.
As discussed above, a difference between containerization and virtualization is that containerization relies on a shared kernel. However, to maintain virtualization-like segregation, containers do not share interprocess communications, services, or many other resources. Some sharing of resources between containers can be approximated by permitting containers to map their internal file systems to a common mount point on the external file system. Because containers share a kernel with main operating system 812, they inherit the same file and resource access permissions as those provided by shared kernel 808. For example, one popular application for containers is to run a plurality of web servers on the same physical hardware. The Docker daemon provides a shared socket, docker.sock, that is accessible by containers running under the same Docker daemon. Thus, one container can be configured to provide only a reverse proxy for mapping hypertext transfer protocol (HTTP) and hypertext transfer protocol secure (HTTPS) requests to various containers. This reverse proxy container can listen on docker.sock for newly spun up containers. When a container spins up that meets certain criteria, such as by specifying a listening port and/or virtual host, the reverse proxy can map HTTP or HTTPS requests for the specified virtual host to the designated virtual port. Thus, only the reverse proxy host may listen on ports 80 and 443, and any request to subdomain1.example.com may be directed to a virtual port on a first container, while requests to subdomain2.example.com may be directed to a virtual port on a second container.
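The host-based routing described above reduces to a small routing table. The following is a hypothetical sketch of that table only; real deployments would populate it by watching container events on docker.sock (via a tool such as a reverse-proxy container) rather than by calling a `register()` method directly, and the class, hostnames, and ports here are illustrative assumptions.

```python
# Hypothetical sketch of the routing table a reverse-proxy container
# might maintain: each container that "spins up" claims a virtual host
# and a virtual port, and incoming requests are dispatched by their
# HTTP Host header. Registration here is manual for illustration; a
# real proxy would learn these mappings from docker.sock events.

class ReverseProxy:
    def __init__(self):
        self.routes = {}  # virtual host -> (container, virtual port)

    def register(self, virtual_host, container, port):
        """Record a newly spun-up container that meets the criteria."""
        self.routes[virtual_host] = (container, port)

    def dispatch(self, host_header):
        """Map a request's Host header to its backend container/port."""
        if host_header not in self.routes:
            return None  # no container claims this virtual host
        return self.routes[host_header]

proxy = ReverseProxy()
proxy.register("subdomain1.example.com", "container-1", 8080)
proxy.register("subdomain2.example.com", "container-2", 8081)
```

Only the proxy listens on ports 80 and 443; every backend is reached through its registered virtual port.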
Other than this limited sharing of files or resources, which generally is explicitly configured by an administrator of containerized server 804, the containers themselves are completely isolated from one another. However, because they share the same kernel, it is comparatively easy to dynamically allocate compute resources such as CPU time and memory to the various containers. Furthermore, it is common practice to provide only a minimum set of services on a specific container, and the container does not need to include a full bootstrap loader because it shares the kernel with a containerization host (i.e., containerized server 804).
Thus, “spinning up” a container is often relatively faster than spinning up a new virtual machine that provides a similar service. Furthermore, a containerization host does not need to virtualize hardware resources, so containers access those resources natively and directly. While this provides some theoretical advantages over virtualization, modern hypervisors (especially type 1, or “bare metal,” hypervisors) provide such near-native performance that this advantage may not always be realized.
In this example, containerized server 804 hosts two containers, namely container 830 and container 840.
Container 830 may include a minimal operating system 832 that runs on top of shared kernel 808. Note that a minimal operating system is provided as an illustrative example, and is not mandatory. In fact, container 830 may provide as full an operating system as is necessary or desirable. Minimal operating system 832 is used here as an example simply to illustrate that in common practice, the minimal operating system necessary to support the function of the container (which in common practice, is a single or monolithic function) is provided.
On top of minimal operating system 832, container 830 may provide one or more services 834. Finally, on top of services 834, container 830 may also provide userspace applications 836, as necessary.
Container 840 may include a minimal operating system 842 that runs on top of shared kernel 808. Note that a minimal operating system is provided as an illustrative example, and is not mandatory. In fact, container 840 may provide as full an operating system as is necessary or desirable. Minimal operating system 842 is used here as an example simply to illustrate that in common practice, the minimal operating system necessary to support the function of the container (which in common practice, is a single or monolithic function) is provided.
On top of minimal operating system 842, container 840 may provide one or more services 844. Finally, on top of services 844, container 840 may also provide userspace applications 846, as necessary.
Using containerization layer 816, containerized server 804 may run discrete containers, each one providing the minimal operating system and/or services necessary to provide a particular function. For example, containerized server 804 could include a mail server, a web server, a secure shell server, a file server, a weblog, cron services, a database server, and many other types of services. In theory, these could all be provided in a single container, but security and modularity advantages are realized by providing each of these discrete functions in a discrete container with its own minimal operating system necessary to provide those services.
The foregoing outlines features of several embodiments so that those skilled in the art may better understand various aspects of the present disclosure. The foregoing detailed description sets forth examples of apparatuses, methods, and systems relating to micro-clustering according to one or more embodiments of the present disclosure. Features such as structure(s), function(s), and/or characteristic(s), for example, are described with reference to one embodiment as a matter of convenience; various embodiments may be implemented with any suitable one or more of the described features.
As used throughout this specification, the phrase “an embodiment” is intended to refer to one or more embodiments. Furthermore, different uses of the phrase “an embodiment” may refer to different embodiments. The phrases “in another embodiment” or “in a different embodiment” refer to an embodiment different from the one previously described, or the same embodiment with additional features. For example, “in an embodiment, features may be present. In another embodiment, additional features may be present.” The foregoing example could first refer to an embodiment with features A, B, and C, while the second could refer to an embodiment with features A, B, C, and D; with features A, B, and D; with features D, E, and F; or any other variation.
In the foregoing description, various aspects of the illustrative implementations may be described using terms commonly employed by those skilled in the art to convey the substance of their work to others skilled in the art. It will be apparent to those skilled in the art that the embodiments disclosed herein may be practiced with only some of the described aspects. For purposes of explanation, specific numbers, materials, and configurations are set forth to provide a thorough understanding of the illustrative implementations. In some cases, the embodiments disclosed may be practiced without specific details. In other instances, well-known features are omitted or simplified so as not to obscure the illustrated embodiments.
For the purposes of the present disclosure and the appended claims, the article “a” refers to one or more of an item. The phrase “A or B” is intended to encompass the “inclusive or,” e.g., A, B, or (A and B). “A and/or B” means A, B, or (A and B). For the purposes of the present disclosure, the phrase “A, B, and/or C” means A, B, C, (A and B), (A and C), (B and C), or (A, B, and C).
The embodiments disclosed can readily be used as the basis for designing or modifying other processes and structures to carry out the teachings of the present specification. Any equivalent constructions to those disclosed do not depart from the spirit and scope of the present disclosure. Design considerations may result in substitute arrangements, design choices, device possibilities, hardware configurations, software implementations, and equipment options.
As used throughout this specification, a “memory” is expressly intended to include both a volatile memory and a nonvolatile memory. Thus, for example, an “engine” as described above could include instructions encoded within a volatile or nonvolatile memory that, when executed, instruct a processor to perform the operations of any of the methods or procedures disclosed herein. It is expressly intended that this configuration reads on a computing apparatus “sitting on a shelf” in a non-operational state. For example, in this example, the “memory” could include one or more tangible, nontransitory computer-readable storage media that contain stored instructions. These instructions, in conjunction with the hardware platform (including a processor) on which they are stored may constitute a computing apparatus.
In other embodiments, a computing apparatus may also read on an operating device. For example, in this configuration, the “memory” could include a volatile or run-time memory (e.g., RAM), where instructions have already been loaded. These instructions, when fetched by the processor and executed, may provide methods or procedures as described herein.
In yet another embodiment, there may be one or more tangible, nontransitory computer-readable storage media having stored thereon executable instructions that, when executed, cause a hardware platform or other computing system, to carry out a method or procedure. For example, the instructions could be executable object code, including software instructions executable by a processor. The one or more tangible, nontransitory computer-readable storage media could include, by way of illustrative and nonlimiting example, a magnetic media (e.g., hard drive), a flash memory, a ROM, optical media (e.g., CD, DVD, Blu-Ray), nonvolatile random-access memory (NVRAM), nonvolatile memory (NVM) (e.g., Intel 3D Xpoint), or other nontransitory memory.
There are also provided herein certain methods, illustrated for example in flow charts and/or signal flow diagrams. The order of operations disclosed in these methods is one illustrative ordering that may be used in some embodiments, but this ordering is not intended to be restrictive, unless expressly stated otherwise. In other embodiments, the operations may be carried out in other logical orders. In general, one operation should be deemed to necessarily precede another only if the first operation provides a result required for the second operation to execute. Furthermore, the sequence of operations itself should be understood to be a nonlimiting example. In appropriate embodiments, some operations may be omitted as unnecessary or undesirable. In the same or in different embodiments, other operations not shown may be included in the method to provide additional results.
In certain embodiments, some of the components illustrated herein may be omitted or consolidated. In a general sense, the arrangements depicted in the FIGURES may be more logical in their representations, whereas a physical architecture may include various permutations, combinations, and/or hybrids of these elements.
With the numerous examples provided herein, interaction may be described in terms of two, three, four, or more electrical components. These descriptions are provided for purposes of clarity and example only. Any of the illustrated components, modules, and elements of the FIGURES may be combined in various configurations, all of which fall within the scope of this specification.
In certain cases, it may be easier to describe one or more functionalities by disclosing only selected elements. Such elements are selected to illustrate specific information to facilitate the description. The inclusion of an element in the FIGURES is not intended to imply that the element must appear in the disclosure, as claimed, and the exclusion of certain elements from the FIGURES is not intended to imply that the element is to be excluded from the disclosure as claimed. Similarly, any methods or flows illustrated herein are provided by way of illustration only. Inclusion or exclusion of operations in such methods or flows should be understood in the same way as the inclusion or exclusion of other elements as described in this paragraph. Where operations are illustrated in a particular order, the order is a nonlimiting example only. Unless expressly specified, the order of operations may be altered to suit a particular embodiment.
Other changes, substitutions, variations, alterations, and modifications will be apparent to those skilled in the art. All such changes, substitutions, variations, alterations, and modifications fall within the scope of this specification.
To aid the United States Patent and Trademark Office (USPTO) and any readers of any patent or publication flowing from this specification, the Applicant: (a) does not intend any of the appended claims to invoke paragraph (f) of 35 U.S.C. section 112, or its equivalent, as it exists on the date of the filing hereof unless the words “means for” or “steps for” are specifically used in the particular claims; and (b) does not intend, by any statement in the specification, to limit this disclosure in any way that is not otherwise expressly reflected in the appended claims, as originally presented or as amended.
This application claims priority to U.S. Provisional Application No. 63/452,738, titled “Targeted Real-Time Clusters,” filed Mar. 17, 2023, which is incorporated herein by reference.
Number | Date | Country
---|---|---
63452738 | Mar 2023 | US