This disclosure relates in general to the field of computer network administration and support and, more particularly, to identifying similar software inventories on selected hosts.
The field of computer network administration and support has become increasingly important and complicated in today's society. Computer network environments are configured for virtually every organization and usually have multiple interconnected computers (e.g., end user computers, laptops, servers, printing devices, etc.). Typically, each computer has its own set of executable software, each of which can be represented by an executable software inventory. For Information Technology (IT) administrators, congruency among executable software inventories of similar computers (e.g., desktops and laptops) simplifies maintenance and control of the network environment. Differences between executable software inventories, however, can arise in even the most tightly controlled network environments. In addition, each organization may develop its own approach to computer network administration and, consequently, some organizations may have very little congruency and may experience undesirable diversity of executable software on their computers. Particularly in very large organizations, executable software inventories may vary greatly among computers across departmental groups. Varied executable software inventories on computers within organizations present numerous difficulties to IT administrators to maintain, to troubleshoot, to service, and to provide uninterrupted access for business or other necessary activities. Innovative tools are needed to assist IT administrators to successfully support computer network environments with computers having incongruities between executable software inventories.
To provide a more complete understanding of the present disclosure and features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying figures, wherein like reference numerals represent like parts, in which:
Overview
A method in one example implementation includes obtaining a plurality of host file inventories corresponding respectively to a plurality of hosts, calculating input data using the plurality of host file inventories, and providing the input data to a clustering procedure to group the plurality of hosts into one or more clusters of hosts. The method further includes each cluster of hosts being grouped using a predetermined similarity criteria. More specific embodiments include each of the plurality of host file inventories including a set of one or more file identifiers with each file identifier representing a different executable software file on a corresponding one of the plurality of hosts. In another more specific embodiment, the method includes each of the one or more file identifiers including a token sequence of one or more tokens. In other more specific embodiments, the calculating the input data includes transforming the plurality of host file inventories into a similarity matrix. In another more specific embodiment, the calculating the input data includes transforming the plurality of host file inventories into a matrix of keyword vectors in Euclidean space, where each keyword vector corresponds to one of the plurality of hosts.
In example embodiments, the system for clustering host inventories may be utilized to provide valuable information to users (e.g., IT administrators, network operators, etc.) identifying computers having similar operating systems and installed executable software files. In one example, when the system for clustering host inventories is applied to a computer network environment such as network environment 100 of
For purposes of illustrating the techniques of the system for clustering host inventories, it is important to understand the activities occurring within a given network. The following foundational information may be viewed as a basis from which the present disclosure may be properly explained. Such information is offered earnestly for purposes of explanation only and, accordingly, should not be construed in any way to limit the broad scope of the present disclosure and its potential applications.
Typical network environments used in organizations and by individuals often include a plurality of computers such as end user desktops, laptops, servers, network appliances, and the like, and each may have an installed set of executable software. In large organizations, network environments may include hundreds or thousands of computers, which may span different buildings, cities, and/or geographical areas around the world. IT administrators may be tasked with the extraordinary responsibility of maintaining these computers in a way that minimizes or eliminates disruption to business activities.
One difficulty IT administrators face includes maintaining multiple computers in a chaotic or heterogeneous network environment. In such an environment, congruency between executable software of the computers may be minimal. For example, executable files may be stored in different memory locations on different computers, different versions of executable files may be installed in different computers, executable files may be stored on some computers but not on others, and the like. Such networks may require additional time and resources to be adequately supported as IT administrators may need to individualize policies, maintenance, upgrades, repairs, and/or any other type of support to suit particular computers having nonstandard executable software and/or operating systems.
Homogeneous network environments, in which executable software of computers is congruent or at least similar, may also benefit from a system and method for clustering host inventories. In homogeneous environments or substantially homogeneous environments, particular computers may occasionally deviate from standard computers within the network environment. For example, malicious software may break through the various firewalls and other network barriers, creating one or more deviant computers. In addition, end users of computers may install various executable software files from transportable disks or download such software, creating deviant computers. In accordance with the present disclosure, a system for clustering host inventories could readily identify any outliers having nonstandard and possibly malicious executable software.
A system and method for clustering host inventories, as outlined in
Note that in this Specification, references to various features (e.g., elements, structures, modules, components, steps, etc.) included in “one embodiment”, “example embodiment”, “an embodiment”, “another embodiment”, “some embodiments”, “various embodiments”, “one example”, “other embodiments”, and the like are intended to mean that any such features may be included in one or more embodiments of the present disclosure, but may or may not necessarily be included in the same embodiments.
Turning to the infrastructure of
In an example embodiment, hosts 110 may represent end user computers that could be operated by end users. The end user computers may include desktops, laptops, and mobile or handheld computers (e.g., personal digital assistants (PDAs) or mobile phones). Hosts 110 can also represent other computers (e.g., servers, appliances, etc.) having executable software, which could be similarly evaluated and clustered by the system, using executable file inventories derived from sets of executable files 112 on such hosts 110. It should be noted that the network configurations and interconnections shown and described herein are for illustrative purposes only.
Sets of executable files 112 on hosts 110 can include all executable files on respective hosts 110. In this Specification, references to “executable file”, “program file”, “executable software file”, and “executable software” are meant to encompass any software file comprising instructions that can be understood and processed by a computer such as executable files, library modules, object files, other executable modules, script files, interpreter files, and the like. In one embodiment, the system could be configured to allow the IT administrator to select a particular type of executable file to be clustered. For example, an IT Administrator may choose only dynamic-link library (DLL) modules for clustering. Thus, sets of executable files 112 would include only DLL modules on the respective hosts 110. In addition, the IT administrator may also be permitted to select particular hosts to which clustering is applied. For example, all end user computers in a network or within a particular part of the network may be selected. In another example, all servers within a network or within a particular part of the network may be selected.
Central server 130 of network environment 100 represents an exemplary server or other computer linked to hosts 110, which may provide services to hosts 110. The system of clustering host inventories may be implemented in central server 130 using various embodiments of host inventory preparation module 150 and clustering module 160. For example, keyword techniques may be used with vector based clustering in one example embodiment. In this example embodiment, host inventory preparation module 150 creates an (n×m) vector matrix where columns of the matrix may correspond to a determined number (i.e., “m”) of unique keywords, each of which is associated with one or more executable files in a selected number (i.e., “n”) of hosts. The rows of the vector matrix may correspond to the n selected hosts. Clustering module 160 can then apply a clustering algorithm to the vector matrix to create logical groupings of the n selected hosts. In another example embodiment, compression techniques may be used with similarity based clustering. In this example embodiment, host inventory preparation module 150 may create an (n×n) similarity matrix using compression techniques for a selected number (i.e., “n”) of hosts. Clustering module 160 may then apply a clustering algorithm to the similarity matrix to create logical groupings of the n selected hosts. In one embodiment, selected hosts may include all of the hosts 110 in a particular network environment such as network environment 100. In other embodiments, selected hosts may include particular hosts selected by a user or predefined by policy, with such hosts existing in one or more network environments.
Management console 170 linked to central server 130 may provide viewable cluster data for the IT administrators or other authorized users. Administrative module 140 may also be incorporated to allow IT administrators or other authorized users to add the logical groupings from a cluster analysis to an enterprise management system and to apply common policies to selected groupings. In addition, deviant or exceptional groupings or outliers can trigger various remedial actions (e.g., emails, vulnerability scans, etc.). In addition, management console 170 may also provide a user interface for the IT Administrator to select particular hosts and/or particular types of executable files to be included in the clustering procedure, in addition to other user provided configuration data for the system. One exemplary enterprise management system that could be used includes McAfee® electronic Policy Orchestrator (ePO) software manufactured by McAfee, Inc. of Santa Clara, Calif.
Turning to
Processor 220, which may also be referred to as a central processing unit (CPU), can include any general or special-purpose processor capable of executing machine readable instructions and performing operations on data as instructed by the machine readable instructions. Main memory 230 may be directly accessible to processor 220 for accessing machine instructions and can be in the form of random access memory (RAM) or any type of dynamic storage (e.g., dynamic random access memory (DRAM)). Secondary storage 240 can be any non-volatile memory such as a hard disk, which is capable of storing electronic data including executable software files. Externally stored electronic data may be provided to computer 200 through removable memory interface 270. Removable memory interface 270 may provide connection to any type of external memory such as compact discs (CDs), digital video discs (DVDs), flash drives, external hard drives, or any other external media.
Network interface 250 can be any network interface controller (NIC) that provides a suitable network connection between computer 200 and any networks to which computer 200 connects for sending and receiving electronic data. For example, network interface 250 could be an Ethernet adapter, a token ring adapter, or a wireless adapter. A user interface 260 may be provided to allow a user to interact with the computer 200 via any suitable means, including a graphical user interface display. In addition, any appropriate input mechanism may also be included such as a keyboard, mouse, voice recognition, touch pad, input screen, etc.
Not shown in
Turning to
In other examples, more complex file identifiers could be selected to provide a higher level of distinctiveness of an executable file. In one such example, a file identifier could include a sequence of first and second tokens having a checksum configuration and file path configuration, respectively, where the file path indicates where the executable file is stored on disk in the particular host in which it is installed. Thus, if identical executable files X and Y are installed on host 110a and host 110b, respectively, but are stored in different locations of memory, then the first token of the file identifier generated for executable file X on host 110a could be the same as the first token of the file identifier generated for executable file Y on host 110b. However, the second token of the file identifier generated for executable file X on host 110a could be different than the second token of the file identifier generated for executable file Y on host 110b.
Numerous other file identifiers may be configured by using any number of tokens and configuring the tokens to include any combination of available program file attributes, checksums, and/or file paths. Program file attributes may include, for example, creation date, modification date, security settings, vendor name, and the like. Although file identifiers may be configured with any number of such tokens, an executable file without a particular program file attribute, which is selected as one of the tokens, may have a file identifier with only the tokens available to that executable file. For example, if the file identifier is configured to include a first token (e.g., a checksum) and a second token (e.g., a vendor name), then an executable file without an embedded vendor name would have a file identifier with only a first token corresponding to the file checksum. In contrast, an executable file having an embedded vendor name would have a file identifier with both first and second tokens corresponding to the file checksum and vendor name, respectively.
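By way of illustration only, the checksum and vendor name example above may be sketched as follows. The function name file_identifier, the use of SHA-256 as the checksum, and the sample byte string are assumptions made for this sketch and are not taken from the disclosure.

```python
import hashlib
from typing import Optional, Tuple


def file_identifier(content: bytes, vendor_name: Optional[str] = None) -> Tuple[str, ...]:
    """Return a token sequence for one executable file (hypothetical helper)."""
    # First token: a checksum of the file contents (SHA-256 is an assumption).
    tokens = [hashlib.sha256(content).hexdigest()]
    # Second token: the vendor name, included only when the file carries one.
    if vendor_name:
        tokens.append(vendor_name)
    return tuple(tokens)


# The same file contents, with and without an embedded vendor name.
with_vendor = file_identifier(b"example executable bytes", vendor_name="XYZ")
without_vendor = file_identifier(b"example executable bytes")

print(len(with_vendor), len(without_vendor))
```

In this sketch, a file lacking the selected attribute simply yields a shorter token sequence, while the checksum token remains identical for identical file contents.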
The file identifiers and resulting host file inventories I1 through In may be provided by various implementations. In one embodiment, the file identifiers and resulting host file inventories may be generated by host inventory feeds 114 for each host 110 and pushed to central server 130. For embodiments in which a user configures the file identifier by selecting a number of tokens for the token sequence and by selecting individual token configurations, central server 130 may provide the user selected configuration criteria to each host 110. Host inventory feeds 114 may then generate file identifiers with token sequences having the particular user-selected configuration. In another embodiment, checksums for each executable file may be generated on hosts 110 by host inventory feeds 114 and then pushed to central server 130 along with other file attributes and file paths such that host inventory preparation module 150 of central server 130 can generate the file identifiers and resulting host file inventories for each of the selected hosts 110. In one embodiment, enumeration of executable files from the sets of executable files 112 of selected hosts 110 can be achieved by existing security technology such as, for example, Policy Auditor software or Application Control software, both manufactured by McAfee, Inc. of Santa Clara, Calif.
Referring again to
After vector-based clustering has been performed on the vector matrix in step 330, flow moves to step 340 where one or more reports can be generated indicating the clustered groupings determined during the clustering analysis and can be provided to authorized users by various methods (e.g., screen displays, electronic files, hard copies, emails, etc.). Exemplary reports may include a textual report and/or a visual representation (e.g., a proximity plot, a dendrogram, heat maps of a permuted keyword matrix, heat maps of a reduced keyword matrix where rows and columns have been merged to illustrate clusters, other cluster plots, etc.) enabling the user to view logical groupings of the selected hosts. For example, after the clustering analysis has been performed, a graphical user interface of management console 170 may display a proximity plot having physical representations of each host, with identifiable logical groupings (e.g., uniquely colored groupings, circled or otherwise enclosed groupings, representations of groupings with connected lines, etc.). Once the similar groupings and outlier hosts have been identified, an IT Administrator or other authorized user can apply common policies to hosts within the logical groupings and remedial action may be taken on any identified outlier hosts. For example, outlier hosts may be remediated to a standard software configuration as defined by the IT Administrators.
Turning to
The number of keywords associated with an executable file equals the number of tokens in the token sequence of the file identifier representing the executable file. Therefore, one or more keywords can be associated with each executable file in sets of executable files 112 of selected hosts 110. In addition, each keyword could be associated with multiple executable files in the same or different hosts. Thus, a keyword sequence km may be defined as a sequence of unique keywords K1 through Km, where each keyword is associated with one or more executable files in sets of executable files 112 of all selected hosts 110.
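One possible way to derive such a keyword sequence is sketched below, assuming that each token of a file identifier is treated as one keyword and that keywords are ordered by first appearance. The function name keyword_sequence, the host labels, and the shortened checksum strings ("c1", "c2", "c3") are hypothetical.

```python
from typing import Dict, List, Tuple

# One token sequence per executable file; one list of token sequences per host.
Inventory = List[Tuple[str, ...]]


def keyword_sequence(inventories: Dict[str, Inventory]) -> List[str]:
    """Collect the unique keywords found across all selected hosts (sketch)."""
    seen = set()
    keywords: List[str] = []
    for host in sorted(inventories):  # fixed host order keeps the result reproducible
        for file_id in inventories[host]:
            for token in file_id:  # assumption: every token is one keyword
                if token not in seen:
                    seen.add(token)
                    keywords.append(token)
    return keywords


inventories = {
    "H1": [("c1", "XYZ"), ("c2", "XYZ")],
    "H2": [("c1", "XYZ"), ("c3",)],
}
km = keyword_sequence(inventories)
print(km)  # ['c1', 'XYZ', 'c2', 'c3']
```

Here the keyword "XYZ" is associated with several executable files on both hosts, yet appears only once in the keyword sequence.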
Vector matrix format 400 includes n rows 460 and m columns 470, with n and m defining the dimensions of the resulting n-by-m (i.e., n×m) vector matrix. Each row of vector matrix format 400 is denoted by a unique host Hi (i=1 to n), and each column is denoted by a unique keyword Kj (j=1 to m) of keyword sequence km. Each entry 480 is denoted by a variable with subscripts i and j (i.e., ai,j) where i and j correspond to the respective row and column where the entry is located. For example, entry a2,1 is found in row 2, column 1 of vector matrix format 400. Each row of entries represents a row vector 410, 420, and 430 for its corresponding host H1, H2 and Hn. For example, a1,1, a1,2, through a1,m define row vector 410 for host H1. Once each of the entries 480 has been filled with a determined value, row vectors 410, 420, through 430 can be provided as input data to a vector-based clustering algorithm to create a cluster graph or plot showing logical groupings of hosts H1 through Hn, having similar inventories of executable files and any host outliers having dissimilar host inventories.
Turning to
Once keyword sequence km has been determined, the algorithm of flow 500 computes a list of position vectors for an n×m vector matrix. Variables ‘i’ and ‘j’ are used to construct the vector matrix having n×m vector matrix format 400, in m-dimensional keyword space, for each host Hi by iterating over j through km and producing appropriate values for the position vectors indicating whether each host file inventory Ii contains each keyword Kj.
The iterative flow to find keywords of keyword sequence km in file identifiers of host inventories is illustrated in steps 520 through 575 of
The values of entries ai,j in the vector matrix, which indicate whether keyword Kj is found in a host file inventory Ii, may vary depending upon the particular implementation of the system. In one embodiment, an entry ai,j is assigned a ‘1’ value in step 555, indicating keyword Kj was found in host file inventory Ii, or a ‘0’ value in step 550, indicating keyword Kj was not found in host file inventory Ii. Thus, in this embodiment, the vector matrix contains only ‘0’ and/or ‘1’ values. In another embodiment, entry ai,j is assigned a value in step 555 or 550 corresponding to a frequency of occurrence of keyword Kj in host file inventory Ii. For example, assume file identifiers in a host file inventory I1 include a first token configured as a checksum and a second token configured as a vendor name, with three executable files on host H1 having the same embedded vendor name, XYZ, resulting in keyword K2 of keyword sequence km being assigned the embedded vendor name XYZ. In this embodiment, when host file inventory I1 of host H1 is searched for keyword K2, entry a1,2 could be updated with a value of 3 because of the three occurrences of vendor name XYZ in file identifiers of host file inventory I1. Thus, in this embodiment, the vector matrix may contain ‘0’ values and/or positive integer values.
After row i, column j is filled with an appropriate value in step 555 or 550, flow moves to decision box 560 where a query is made as to whether j<m. If j<m, then host file inventory Ii of host Hi has not been checked for all of the keywords in km. Therefore, flow moves to step 565 where j is set to j+1, and flow loops back to step 540 to get the next keyword Kj (with j=j+1) in km and search for Kj in host file inventory Ii. If, however, in decision box 560 it is determined that j is not less than m (i.e., j≧m), then host file inventory Ii has been searched for all of the keywords K1 through Km in keyword sequence km, so flow moves to decision box 570, which is part of the outer loop of flow 500. A query is made in decision box 570 to determine whether i<n, and if i<n, then not all of the hosts have been evaluated to generate corresponding keyword vectors. Therefore, flow moves to step 575 where i is set to i+1, and flow loops back to step 530. In step 530, j is set to 1 again, so that a vector for the next host Hi (with i=i+1) can be generated by inner loop steps 540 through 565. With reference again to decision box 570, if i is not less than n (i.e., i≧n), then all of the hosts H1 through Hn have been evaluated such that all of the vectors have been created in n×m vector matrix and, therefore, the flow ends.
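The nested loops described above may be sketched as follows, with an outer loop over hosts and an inner loop over keywords. The sketch covers both the ‘0’/‘1’ embodiment and the frequency embodiment. The function name vector_matrix, the use_frequency flag, and the sample inventories are assumptions for illustration, and Python's zero-based indexing replaces the one-based i and j of the flow.

```python
from typing import List, Sequence, Tuple


def vector_matrix(
    inventories: Sequence[Sequence[Tuple[str, ...]]],
    keywords: Sequence[str],
    use_frequency: bool = False,
) -> List[List[int]]:
    """Build an n-by-m matrix: one row per host, one column per keyword (sketch)."""
    matrix: List[List[int]] = []
    for inventory in inventories:  # outer loop: one host file inventory per row
        tokens = [token for file_id in inventory for token in file_id]
        row: List[int] = []
        for keyword in keywords:  # inner loop: one keyword per column
            count = tokens.count(keyword)
            # Frequency embodiment stores the count; binary embodiment stores 0 or 1.
            row.append(count if use_frequency else int(count > 0))
        matrix.append(row)
    return matrix


inventories = [
    [("c1", "XYZ"), ("c2", "XYZ"), ("c3", "XYZ")],  # host H1: three files from vendor XYZ
    [("c1",), ("c4",)],                             # host H2: no embedded vendor names
]
keywords = ["c1", "XYZ", "c2", "c3", "c4"]

binary = vector_matrix(inventories, keywords)
frequency = vector_matrix(inventories, keywords, use_frequency=True)
print(binary)     # [[1, 1, 1, 1, 0], [1, 0, 0, 0, 1]]
print(frequency)  # [[1, 3, 1, 1, 0], [1, 0, 0, 0, 1]]
```

With the frequency option, the entry for host H1 and keyword XYZ holds 3, mirroring the vendor name example given above.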
The embodiment of the flow 500 shown in
The clustering analysis performed on the resulting vector matrix may include commonly available clustering techniques such as agglomerative hierarchical clustering or partitional clustering. In agglomerative hierarchical clustering, each element begins as a separate cluster and elements are merged into successively larger clusters, which may be represented in a tree structure called a dendrogram. A root of the tree represents a single cluster of all of the elements and the leaves of the tree represent separated clusters of the elements. Generally, merging schemes in agglomerative hierarchical clustering used to achieve logical groupings may include schemes well-known in the art such as single-link (i.e., the distance between clusters is equal to the shortest distance from any member of one cluster to any member of another cluster), complete-link (i.e., the distance between clusters is equal to the greatest distance from any member of one cluster to any member of another cluster), group-average (i.e., the distance between clusters is equal to the average distance from any member of one cluster to any member of another cluster), and centroid (i.e., the distance between clusters is equal to the distance from the center of any one cluster to the center of another cluster).
Known techniques may be implemented in which predetermined similarity criteria set the point at which clustering is halted (e.g., cut point determination). Cut point determination may be made, for example, at a specified level of similarity or when consecutive similarities are the greatest, as is known in the art. As an example, a tree structure representing clusters could be cut at a predetermined height, resulting in more or fewer clusters depending on the selected height at which the cut is made. Cut points may be determined based on a particular network environment or the particular hosts being clustered. In one example embodiment, an IT administrator or other authorized user could define the cut point determination used by the clustering procedure by determining a desired threshold for similarity based on the particular network environment.
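A minimal sketch of agglomerative clustering with a cut point is given below. It assumes single-link merging, Euclidean distance between keyword vectors, and a distance threshold as the predetermined similarity criterion. The function names, the parameter cut_distance, and the sample vectors are hypothetical, and a production system would more likely rely on an existing clustering package.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_link_clusters(vectors: Sequence[Sequence[float]], cut_distance: float) -> List[List[int]]:
    """Merge the two closest clusters until the closest pair exceeds cut_distance."""
    clusters = [[i] for i in range(len(vectors))]  # every host starts as its own cluster
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single-link: shortest distance between any two members.
                d = min(euclidean(vectors[i], vectors[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > cut_distance:  # cut point reached: stop merging
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


vectors = [
    [1, 1, 1, 0, 0, 0],  # host 0
    [1, 1, 0, 0, 0, 0],  # host 1, close to host 0
    [0, 0, 0, 1, 1, 0],  # host 2
    [0, 0, 0, 1, 1, 1],  # host 3, close to host 2
    [1, 0, 0, 1, 0, 1],  # host 4, resembles neither group closely
]
groups = single_link_clusters(vectors, cut_distance=1.2)
print(groups)  # [[0, 1], [2, 3], [4]]
```

Raising cut_distance yields fewer, larger clusters, and lowering it yields more, smaller clusters, which corresponds to cutting the tree structure at a different height.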
In other embodiments, partitional clustering may be used. Partitional clustering typically involves an algorithm that determines all clusters at one time. In partitional clustering, predetermined similarity criteria may provide, for example, a selected number of clusters to be generated or a maximum diameter for the clusters. One exemplary software package that implements these various clustering techniques is CLUTO Software for Clustering High-Dimensional Datasets developed by George Karypis, Professor at the Department of Computer Science & Engineering, University of Minnesota, Minneapolis and Saint Paul, Minn., which may be found on the World Wide Web at http://glaros.dtc.umn.edu/gkhome/view/cluto.
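As one illustration of partitional clustering with a selected number of clusters, a basic k-means procedure is sketched below. K-means is chosen here only as a familiar example of a partitional technique and is not named in this disclosure. The function names, the initial_indices parameter used to seed the centroids, and the sample vectors are assumptions.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors: Sequence[Sequence[float]], initial_indices: Sequence[int],
            max_iterations: int = 20) -> List[int]:
    """Assign each host vector to one of len(initial_indices) clusters (sketch)."""
    centroids = [[float(x) for x in vectors[i]] for i in initial_indices]
    labels = [-1] * len(vectors)
    for _ in range(max_iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


vectors = [
    [1, 1, 0, 0],  # host 0
    [1, 1, 1, 0],  # host 1
    [0, 0, 1, 1],  # host 2
    [0, 0, 0, 1],  # host 3
]
labels = k_means(vectors, initial_indices=[0, 3])
print(labels)  # [0, 0, 1, 1]
```

The number of seed indices plays the role of the selected number of clusters, and the result of a procedure of this kind generally depends on how the initial centroids are chosen.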
Turning to
In the example scenario of applying the keyword method of flow 500 to the sets of executable files 601 through 605 of selected plurality of hosts 600 in order to create vector matrix 700, the following variables may be identified:
Once keyword sequence km is determined, flow moves to step 520 where variable i is set to 1 and then the iterative flow begins to create n×m (5×8) vector matrix 700 shown in
After the last entry 780 of keyword vector 710 has been added to vector matrix 700, flow moves to decision box 560 where a query is made as to whether j<m (i.e., Is 8<8?). Because j is not less than 8, flow moves to decision box 570 where a query is made as to whether i<n (i.e., Is 1<5?). Because 1 is less than 5, flow moves to step 575 where i is set to 2 (i.e., i=i+1) and flow loops back to step 530 where j is set to 1. The inner iterative loop then begins in step 540 to search for all keywords in host file inventory Ii (I2) of host Hi (H2) beginning with keyword Kj (K1). Thus, in the embodiment used in this example scenario, rows 760 are successively filled with a ‘1’ or a ‘0’ value for each entry ai,j until each vector row 710 through 750 has been completed. As previously discussed herein, however, another embodiment provides that each entry ai,j in rows 760 could be filled with a value corresponding to the frequency of occurrence of keyword Kj found in host file inventory Ii.
Vector matrix 700 can be provided as input data to a vector based clustering procedure, as previously described herein. Information generated from the clustering procedure could be provided in numerous ways such as, for example, reports, screen displays, files, emails, etc. In one example, the information could be provided in a proximity plot such as example proximity plot 800 illustrated in
Turning to
After file identifiers and host file inventories have been determined for each of the selected hosts 110, flow then moves to step 920 where a compression technique may be used to transform host file inventories into a similarity matrix, which will be further described herein with reference to
After similarity-based clustering has been performed on the similarity matrix in step 930, flow moves to step 940 where one or more reports can be generated indicating the clustered groupings determined during the clustering analysis, as previously described herein with reference to
Turning to
Similarity matrix format 1000 includes n rows 1060 and n columns 1070, with ‘n’ defining the number of dimensions of the resulting n-by-n (i.e., n×n) similarity matrix. Each row of similarity matrix format 1000 is denoted by host Hi (i=1 to n), and each column is denoted by host Hj (j=1 to n). Each entry 1080 is denoted by a variable with subscripts i and j (i.e., ai,j) where i and j correspond to the respective row and column where the entry is located. For example, entry a2,1 is found in row 2, column 1 of similarity matrix format 1000.
When a similarity matrix is created in accordance with one embodiment of this disclosure, each entry ai,j has a numerical value representing the similarity distance between host Hi and host Hj with 1 representing the highest degree of similarity. In one embodiment, the similarity distances represented by entries a1,1 through an,n can include any numerical value from 0 to 1, inclusively (i.e., 0≦ai,j≦1). In this embodiment, the closer ai,j is to 1, the greater the similarity is between host file inventories Ii and Ij of hosts Hi and Hj, and the closer ai,j is to zero, the greater the difference is between host file inventories Ii and Ij of hosts Hi and Hj. Thus, a value of 1 in ai,j may indicate hosts Hi and Hj have identical host file inventories and therefore, identical sets of executable files, whereas a value of zero in ai,j may indicate hosts Hi and Hj have no common file identifiers in their respective host file inventories and therefore, no common executable files in their respective sets of executable files. Once each of the entries 1080 has been filled with a calculated value, the resulting similarity matrix can be provided as input data into a similarity-based clustering algorithm to create a cluster graph or plot showing logical groupings of hosts H1 through Hn having similar sets of executable files and outlier hosts having dissimilar sets of executable files. The clustering analysis performed on the resulting similarity matrix may include commonly available clustering techniques such as agglomerative hierarchical clustering or partitional clustering, as previously described herein with reference to clustering analysis of a vector matrix.
Turning to
In step 1120, a list of file identifiers (e.g., checksums, checksums combined with a file path, checksums combined with one or more file attributes, etc.) representing a set of executable files on host Hi are extracted from host file inventory Ii and put in a file Fi. In step 1125, a list of file identifiers representing a set of executable files on host Hj are extracted from host file inventory Ij and put in a file Fj. In step 1130, files Fi and Fj are concatenated and put in file Fij. It will be apparent that the use of files Fi, Fj, and Fij to store file identifiers is an example implementation of the system, and that memory buffers or any other suitable representation allowing concatenation, compression, and length determination of data may also be used.
After files Fi, Fj, and Fij are prepared, compression is applied to each of the files. A compression utility such as, for example, gzip, bzip, bzip2, zlib, or zip compression utilities may be used to compress files Fi, Fj, and Fij. Also, in some embodiments, the list of file identifiers in files Fi, Fj, and Fij may be sorted to enable more accurate compression by the compression utility. In step 1140, file Fi is compressed and the length of the result is represented as Ci. In step 1145, file Fj is compressed and the length of the result is represented as Cj. In step 1150, file Fij is compressed and the length of the result is represented as Cij. After compressing each of the files, normalized compression distance (NCDi,j) between Hi and Hj is computed in step 1155.
Normalized compression distance (NCD) is used for clustering and is based on normalized information distance (NID), a measure defined in terms of Kolmogorov complexity. NCD can be used to compute the distance between similar data and is discussed in detail in Rudi Cilibrasi's 2007 thesis entitled “Statistical Inference Through Data Compression,” which may be found at http://www.illc.uva.nl/Publications/Dissertations/DS-2007-01.text.pdf. NCD may be computed using the following equation:
NCDi,j=[Cij−min{Ci,Cj}]/max{Ci,Cj}
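By way of non-limiting illustration, the equation above may be evaluated directly from the three compressed lengths. The function name ncd and the byte lengths used below are hypothetical and were chosen only so that the arithmetic is easy to follow.

```python
def ncd(c_i, c_j, c_ij):
    """Normalized compression distance from three compressed lengths.

    c_i and c_j are the compressed lengths of the two identifier lists;
    c_ij is the compressed length of their concatenation.
    """
    return (c_ij - min(c_i, c_j)) / max(c_i, c_j)


# Hypothetical lengths in bytes (not taken from this disclosure).
print(ncd(100, 120, 190))  # (190 - 100) / 120 = 0.75
print(ncd(100, 100, 100))  # (100 - 100) / 100 = 0.0
```

With a practical compressor, two identical inputs rarely yield a distance of exactly zero, because compressing the concatenation typically costs a few bytes more than compressing a single copy.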
Once NCDi,j has been computed, flow moves to step 1160 where ai,j is computed by the following equation: ai,j=1−NCDi,j. The value ai,j is then used to construct the similarity matrix by adding ai,j to row i, column j. After the similarity matrix has been updated in step 1160, flow moves to decision box 1165 and a query is made as to whether j<n. If j<n, then additional entries in row i of the similarity matrix need to be computed (i.e., similarity distance has not been computed between host Hi and all of the hosts Hj (j=1 to n)). In this case, flow moves to step 1170 where j is set to j+1. Flow then loops back to step 1120 where the inner loop of flow 1100 repeats and the similarity distance is computed between host Hi and the next host Hj with j=j+1.
With reference again to decision box 1165, if j is not less than n (i.e., j≧n), then all of the entries in row i have been computed and flow moves to decision box 1175 where a query is made as to whether i<n. If i<n, then not all rows of similarity matrix 1000 have been computed, and therefore, flow moves to step 1180 where i is set to i+1. Flow then loops back to step 1115 where j is set to 1 so that entries ai,j for the next row i (Hi, with i=i+1) can be generated by inner loop steps 1120 through 1170. With reference again to decision box 1175, if i is not less than n (i.e., i≧n) then entries for all of the rows 1 through n have been computed and, therefore, the similarity matrix has been completed and flow ends.
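By way of non-limiting illustration, the control flow of the outer and inner loops described above may be sketched in Python as follows. The sketch covers the looping only: the function name build_similarity_matrix is an assumption, and the callable entry is a caller-supplied placeholder for the extraction, compression, and distance computation of steps 1120 through 1160.

```python
def build_similarity_matrix(n, entry):
    """Fill an n-by-n matrix row by row, mirroring the loop structure of flow 1100.

    entry(i, j) must return a_ij for 1-based host indices i and j.
    """
    matrix = [[None] * n for _ in range(n)]
    i = 1                                        # step 1110
    while True:
        j = 1                                    # step 1115
        while True:
            matrix[i - 1][j - 1] = entry(i, j)   # steps 1120 through 1160
            if j < n:                            # decision box 1165
                j += 1                           # step 1170
            else:
                break
        if i < n:                                # decision box 1175
            i += 1                               # step 1180
        else:
            break
    return matrix


# Placeholder entry function that records the order in which pairs are visited.
visited = []

def record(i, j):
    visited.append((i, j))
    return 1.0 if i == j else 0.0

result = build_similarity_matrix(3, record)
print(visited[:4])  # [(1, 1), (1, 2), (1, 3), (2, 1)]
```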
It will be apparent that flow 1100 could be optimized in numerous ways. One optimization technique includes caching the lengths of compressed files Ci and Cj, which are used multiple times during flow 1100 to calculate entries 1080 in the similarity matrix. In addition, the extracted lists of file identifiers Fi and Fj may also be cached for use during flow 1100. It will also be noted that the matrix should be symmetric along the diagonal a1,1 through an,n. This symmetry could be used in the implementation of the system to compute only one-half of the matrix and then reflect the results over the diagonal.
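By way of non-limiting illustration, the caching and symmetry optimizations described above may be sketched in Python as follows. The function and helper names, the use of zlib, the SHA-1 stand-in checksums, and the clamping of each entry to the range 0 to 1 are assumptions made for this example only.

```python
import hashlib
import zlib


def similarity_matrix(inventories):
    """Build a symmetric similarity matrix from lists of file identifiers.

    Each serialized list and its compressed length are computed once and
    reused; only entries on or above the diagonal are computed, and each
    result is reflected across the diagonal.
    """
    blobs = ["".join(s + "\n" for s in sorted(inv)).encode("utf-8")
             for inv in inventories]                        # cached Fi
    lengths = [len(zlib.compress(b, 9)) for b in blobs]     # cached Ci
    n = len(blobs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            c_ij = len(zlib.compress(blobs[i] + blobs[j], 9))
            ncd = (c_ij - min(lengths[i], lengths[j])) / max(lengths[i], lengths[j])
            a_ij = min(1.0, max(0.0, 1.0 - ncd))            # clamp (assumption)
            matrix[i][j] = a_ij
            matrix[j][i] = a_ij                             # reflect over diagonal
    return matrix


def fake_checksum(name):
    """Hypothetical stand-in for a real executable-file checksum."""
    return hashlib.sha1(name.encode("ascii")).hexdigest()


# Invented inventories: the first two overlap heavily, the third is disjoint.
inv_a = [fake_checksum(f"file-{k}") for k in range(0, 50)]
inv_b = [fake_checksum(f"file-{k}") for k in range(5, 55)]
inv_c = [fake_checksum(f"file-{k}") for k in range(100, 150)]

matrix = similarity_matrix([inv_a, inv_b, inv_c])
for row in matrix:
    print([round(value, 2) for value in row])
```

Reflecting the upper half forces exact symmetry, whereas a practical compressor may give slightly different lengths for the two concatenation orders; likewise, diagonal entries computed this way are typically close to, but not exactly, 1.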
Turning to 12,
Applying the compression method flow 1100 of
Each of the host file inventories I1 through I5 includes a set of file identifiers representing one of the sets of executable files 601, 602, 603, 604, and 605. Each executable file in a set of executable files is represented by a separate file identifier in the particular host file inventory. In this example scenario in which each file identifier includes a single token having a checksum configuration, the following host file inventories of hosts H1 through H5 could include file identifiers D1 through D8, which represent executable files 610 through 680, respectively:
In step 1110, i is set to 1 and then the iterative looping begins to create an n×n (5×5) similarity matrix 1200 shown in
After the NCD1,1 value is computed in step 1155, flow moves to step 1160 and ai,j is computed:
The ‘1’ value is added to row i, column j (row 1, column 1) of similarity matrix 1200. After similarity matrix 1200 has been updated, flow moves to decision box 1165 where a query is made as to whether j<n. Since 1 is less than 5, flow moves to step 1170 where j is set to 2 (i.e., j=j+1). Flow then loops back to step 1120 to determine the similarity distance between Hi (H1) and the next host Hj (H2). In this case, after extraction and compression are performed, NCDi,j (NCD1,2) is computed as 0.75, because H1 and H2 have only one common file identifier D1 and, therefore, only one common executable file 610. In step 1160, NCD1,2 is used to compute a1,2 as 0.25, which is added to row i, column j (row 1, column 2) of similarity matrix 1200. The variable j is still less than 5 (i.e., 2<5) as determined in decision box 1165, so flow moves to step 1170 and j is set to 3 (i.e., j=j+1). This iterative processing continues for each value of j until j=5, thereby filling in each entry for H1 in row i (row 1) of similarity matrix 1200.
After the last entry of row i (row 1) has been added to similarity matrix 1200, flow moves to decision box 1165 where a query is made as to whether j<n (i.e., Is 5<5?). Because j is not less than 5, flow moves to decision box 1175 where a query is made as to whether i<n (i.e., Is 1<5?). Because 1 is less than 5, flow moves to step 1180 where i is set to 2 (i.e., i=i+1) and flow loops back to step 1115 where j is set to 1. The inner iterative loop then begins in step 1120 to determine the similarity distance between host file inventory Ii (I2) of host Hi (H2) and each host file inventory Ij (I1 through I5). Thus, rows 1160 are successively filled with similarity distance values ai,j until each row has been completed.
After the compression method of flow 1100 has finished processing, similarity matrix 1200 can be provided as input to a similarity-based clustering procedure, as previously described herein with reference to clustering techniques used with a vector matrix. Information generated from the clustering procedure could be provided in numerous ways, as previously described herein with reference to
Software for achieving the operations outlined herein can be provided at various locations (e.g., the corporate IT headquarters, end user computers, distributed servers in the cloud, etc.). In other embodiments, this software could be received or downloaded from a web server (e.g., in the context of purchasing individual end-user licenses for separate networks, devices, servers, etc.) in order to provide this system for clustering host inventories. In one example implementation, this software is resident in one or more computers sought to be protected from a security attack (or protected from unwanted or unauthorized manipulations of data).
In other examples, the software of the system for clustering host inventories in a computer network environment could involve a proprietary element (e.g., as part of a network security solution with McAfee® EPO software, McAfee® Application Control software, etc.), which could be provided in (or be proximate to) these identified elements, or be provided in any other device, server, network appliance, console, firewall, switch, information technology (IT) device, distributed server, etc., or be provided as a complementary solution (e.g., in conjunction with a firewall), or provisioned somewhere in the network.
In certain example implementations, the clustering activities outlined herein may be implemented in software. This could be inclusive of software provided in central server 130 (e.g., via administrative module 140, host inventory preparation module 150 and clustering module 160) and hosts 110 (e.g., via host inventory feed 114). These elements and/or modules can cooperate with each other in order to perform clustering activities as discussed herein. In other embodiments, these features may be provided external to these elements, included in other devices to achieve these intended functionalities, or consolidated in any appropriate manner. For example, some of the processors associated with the various elements may be removed, or otherwise consolidated such that a single processor and a single memory location are responsible for certain activities. In a general sense, the arrangement depicted in
In various embodiments, all of these elements (e.g., hosts 110, central server 130) include software (or reciprocating software) that can coordinate, manage, or otherwise cooperate in order to achieve the clustering operations, as outlined herein. One or all of these elements may include any suitable algorithms, hardware, software, components, modules, interfaces, or objects that facilitate the operations thereof. In the implementation involving software, such a configuration may be inclusive of logic encoded in one or more tangible media (e.g., embedded logic provided in an application specific integrated circuit (ASIC), digital signal processor (DSP) instructions, software (potentially inclusive of object code and source code) to be executed by a processor, or other similar machine, etc.), which may be inclusive of non-transitory media. In some of these instances, one or more memory elements (as shown in
Any of the memory items discussed herein should be construed as being encompassed within the broad term ‘memory element.’ Similarly, any of the potential processing elements, modules, and machines described in this Specification should be construed as being encompassed within the broad term ‘processor.’ Each of the computers, servers, and other devices may also include suitable interfaces for receiving, transmitting, and/or otherwise communicating data or information in a network environment.
Note that with the examples provided herein, interaction may be described in terms of two, three, four, or more network components. However, this has been done for purposes of clarity and example only. It should be appreciated that the system can be consolidated in any suitable manner. Along similar design alternatives, any of the illustrated computers, modules, components, and elements of
It is also important to note that the operations described with reference to the preceding FIGURES illustrate only some of the possible scenarios that may be executed by, or within, the system. Some of these operations may be deleted or removed where appropriate, or these operations may be modified or changed considerably without departing from the scope of the discussed concepts. In addition, the timing of these operations may be altered considerably and still achieve the results taught in this disclosure. The preceding operational flows have been offered for purposes of example and discussion. Substantial flexibility is provided by the clustering system in that any suitable arrangements, chronologies, configurations, and timing mechanisms may be provided without departing from the teachings of the discussed concepts.
This application is a continuation (and claims the benefit of priority under 35 U.S.C. §120) of U.S. patent application Ser. No. 12/880,125, filed Sep. 12, 2010, entitled, “SYSTEM AND METHOD FOR CLUSTERING HOST INVENTORIES,” by inventors Rishi Bhargava et al. The disclosure of the prior application is considered part of (and is incorporated by reference in) the disclosure of this application.
Entry |
---|
Bjornar Larsen et al., Fast and effective text mining using linear-time document clustering , 1999, ACM, 16-22. |
Notification of International Preliminary Report on Patentability and Written Opinion mailed May 24, 2012 for International Application No. PCT/US2010/055520, 5 pages. |
Sailer et al., sHype: Secure Hypervisor Approach to Trusted Virtualized Systems, IBM research Report, Feb. 2, 2005, 13 pages. |
Kurt Gutzmann, “Access Control and Session Management in the HTTP Environment,” Jan./Feb. 2001, pp. 26-35, IEEE Internet Computing. |
Eli M. Dow, et al., “The Xen Hypervisor,” INFORMIT, dated Apr. 10, 2008, http://www.informit.com/articles/printerfriendly.aspx?p=1187966, printed Aug. 11, 2009 (13 pages). |
“Xen Architecture Overview,” Xen, dated Feb. 13, 2008, Version 1.2, http://wiki.xensource.com/xenwiki/XenArchitecture?action=AttachFile&do=get&target=Xen+architecture_Q1+2008.pdf, printed Aug. 18, 2009 (9 pages). |
Desktop Management and Control, Website: http://www.vmware.com/solutions/desktop/, printed Oct. 12, 2009, 1 page. |
Secure Mobile Computing, Website: http://www.vmware.com/solutions/desktop/mobile.html, printed Oct. 12, 2009, 2 pages. |
Cilibrasi, Rudi Langston, “Statistical Inference Through Data Compression,” Institute for Logic, Language and Computation, ISBN: 90-6196-540-3, Copyright 2007, retrieved Sep. 10, 2010 from http://www.illc.uva.nl/Publications/Dissertations/DS-2007-01.text.pdf, 225 pages. |
Karypis, George, Contact/METIS/CLUTO/MONSTER/YASSPP/Forums, Internal Lab Website, copyright 2006-2010, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome, 1 page. |
Tagarelli, et al., “A Segment-based Approach to Clustering Multi-Topic Documents,” copyright 2005-2010, George Karypis, Internal Lab Website, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome/cluto/cluto/publications, 1 page. |
Ying Zhao and George Karypis, “Hierarchical Clustering Algorithms for Document Datasets,” copyright 2005-2010, George Karypis, Internal Lab Website, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome/cluto/cluto/publications, 1 page. |
Ying Zhao and George Karypis, “Topic-Driven Clustering for Document Datasets,” copyright 2005-2010, George Karypis, Internal Lab Website, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome/cluto/cluto/publications, 1 page. |
Ying Zhao and George Karypis, “Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering,” copyright 2005-2010, George Karypis, Internal Lab Website, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome/cluto/cluto/publications, 1 page. |
Ying Zhao and George Karypis, “Clustering in Life Sciences,” copyright 2005-2010, George Karypis, Internal Lab Website, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome/cluto/cluto/publications, 1 page. |
Ying Zhao and George Karypis, “Evaluation of Hierarchical Clustering Algorithms for Document Datasets,” copyright 2005-2010, George Karypis, Internal Lab Website, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome/cluto/cluto/publications, 1 page. |
Ying Zhao and George Karypis, “Criterion Functions for Document Clustering: Experiments and Analysis,” copyright 2005-2010, George Karypis, Internal Lab Website, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome/cluto/cluto/publications, 1 page. |
Steinbach, et al., “A Comparison of Document Clustering Techniques,” copyright 2005-2010, George Karypis, Internal Lab Website, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome/cluto/cluto/publications, 1 page. |
Karypis, et al., “CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modelings,” copyright 2005-2010, George Karypis, Internal Lab Website, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome/cluto/cluto/publications, 1 page. |
Matt Rasmussen and George Karypis, “gCLUTO: An Interactive Clustering, Visualization, and Analysis System,” copyright 2005-2010, George Karypis, Internal Lab Website, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome/cluto/gcluto/publications, 1 page. |
Matthew Rasmussen, et al., “wCLUTO: A Web-enabled Clustering Toolkit,” copyright 2005-2010, George Karypis, Internal Lab Website, retrieved Sep. 10, 2010 from http://glaros.dtc.umn.edu/gkhome/cluto/wcluto/publications, 1 page. |
Dommers, Calculating the normalized compression distance between two strings, Jan. 20, 2009, retrieved Sep. 10, 2010 from http://www.c-sharpcorner.com/UploadFile/acinonyx72/NCD01202009071004AM/NCD.aspx, 5 pages. |
A Tutorial on Clustering Algorithms, retrieved Sep. 10, 2010 from http://home.dei.polimi.it/matteucc/lustering/tutorial.html, 6 pages. |
Barrantes et al., “Randomized Instruction Set Emulation to Disrupt Binary Code Injection Attacks,” Oct. 27-31, 2003, ACM, pp. 281-289. |
Gaurav et al., “Countering Code-Injection Attacks with Instruction-Set Randomization,” Oct. 27-31, 2003, ACM, pp. 272-280. |
Check Point Software Technologies Ltd.: “ZoneAlarm Security Software User Guide Version 9”, Aug. 24, 2009, XP002634548, 259 pages, retrieved from Internet: URL:http://download.zonealarm.com/bin/media/pdf/zaclient91_user_manual.pdf. |
Notification of Transmittal of the International Search Report and the Written Opinion of the International Searching Authority (1 page), International Search Report (4 pages), and Written Opinion (3 pages), mailed Mar. 2, 2011, International Application No. PCT/US2010/055520. |
Notification of Transmittal of the International Search Report and the Written Opinion of the International Searching Authority, or the Declaration (1 page), International Search Report (6 pages), and Written Opinion of the International Searching Authority (10 pages) for International Application No. PCT/US2011/020677 mailed Jul. 22, 2011. |
Notification of Transmittal of the International Search Report and Written Opinion of the International Searching Authority, or the Declaration (1 page), International Search Report (3 pages), and Written Opinion of the International Search Authority (6 pages) for International Application No. PCT/US2011/024869 mailed Ju. 14, 2011. |
Tal Garfinkel, et al., “Terra: A Virtual Machine-Based Platform for Trusted Computing,” XP-002340992, SOSP'03, Oct. 19-22, 2003, 14 pages. |
IA-32 Intel® Architecture Software Developer's Manual, vol. 3B; Jun. 2006; pp. 13, 15, 22 and 145-146. |
Mung-Sup Kim et al., “A load cluster management system using SNMP and web”, [Online], May 2002, pp. 367-378, [Retrieved from Internet on Oct. 24, 2012], <http://onlinelibrary.wiley.com/doi/10.1002/nem.453/pdf>. |
G. Pruett et al., “BladeCenter systems management software”, [Online], Nov. 2005, pp. 963-975, [Retrieved from Internet on Oct. 24, 2012], <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.5091&rep=rep1&type=pdf>. |
Philip M. Papadopoulos et al., “NPACI Rocks: tools and techniques for easily deploying manageable Linux clusters” [Online], Aug. 2002, pp. 707-725, [Retrieved from internet on Oct. 24, 2012], <http://onlinelibrary.wiley.com/doi/10.1002/cpe.722/pdf>. |
Thomas Staub et al., “Secure Remote Management and Software Distribution for Wireless Mesh Networks”, [Online], Sep. 2007, pp. 1-8, [Retrieved from Internet on Oct. 24, 2012], <http://cds.unibe.ch/research/pub_files/B07.pdf>. |
Taskar et al., Probabilistic Classification and Clustering in Relational Data, 2001, Google, 7 pages. |
USPTO May 24, 2013 Notice of Allowance from U.S. Appl. No. 12/880,125. |
International Preliminary Report on Patentability received from the PCT Application No. PCT/US2011/020677, mailed on Feb. 7, 2013, 9 pages. |
International Preliminary Report on Patentability received for the PCT Application No. PCT/US2011/024869, mailed on Feb. 7, 2013, 6 pages. |
Office Action received for the U.S. Appl. No. 12/880,125, mailed on Jul. 5, 2012, 12 pages. |
Ex Parte Quayle Action received for the U.S. Appl. No. 12/880,125, mailed on Dec. 21, 2012, 4 pages. |
USPTO Mar. 28, 2014 Nonfinal Rejection in U.S. Appl. No. 13/012,138, 21 pages. |
Number | Date | Country | |
---|---|---|---|
20140006405 A1 | Jan 2014 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 12880125 | Sep 2010 | US |
Child | 14016497 | | US |