The disclosure generally relates to CPC class G06F and subclass 21/50 and/or 21/56.
Categorizations and classifications of code flaws are used to efficiently triage and handle flaws across codebases. Such categorizations include the Common Weakness Enumeration (CWE) and the Common Vulnerability Scoring System (CVSS), which associate flaws with descriptions that codify the exposure of the flaws and their potential triage. These categorizations help organizations determine the relative severity of code flaws by providing context such as the likelihood of exploitation and the impact of a breach. CWE and other flaw categorizations facilitate automation of flaw detection and triage.
Embodiments of the disclosure may be better understood by referencing the accompanying drawings.
The description that follows includes example systems, methods, techniques, and program flows to aid in understanding the disclosure and not to limit claim scope. Well-known instruction instances, protocols, structures, and techniques have not been shown in detail for conciseness.
Proliferation of flaws in software code leads to thousands or millions of potential security vulnerabilities that are not fixed or triaged due to a lack of resources for investigating each flaw. Triage is a key tool in dealing with a high volume of flaws: a significant percentage of flaws can be triaged as low priority without requiring additional inspection. However, common categorizations such as CWE and CVSS inflate severity scores for flaws that are potentially low risk or false positives, and they do not differentiate between contexts for flaws of the same category that can reduce risk and lead to differing triage decisions. Automating recommendations for flaw triage decisions increases efficiency in determining which methods of flaw triage an organization may pursue, such as changing code or documenting mitigating factors. Moreover, flaws often resemble other flaws, which can suggest triage decisions similar or equivalent to those already performed on the similar flaws, such as when duplicate code is used multiple times across an organizational codebase. A naïve Bayes classifier disclosed herein is trained to determine high likelihood triage decisions for flaws. The naïve Bayes classifier is trained on flaw data across organizations and, when specified by an organizational preference, can be trained only on flaw data for the organization or on a specified proportion of flaw data for the organization versus flaw data for other organizations. Inputs to the naïve Bayes classifier are feature vectors comprising count vectorizations of tokens corresponding to features representative of flaws, including flaw identifiers, CWE numbers, method names, line numbers, file extensions, etc.
To supplement the high likelihood triage decisions determined by the naïve Bayes classifier, a flaw similarity model identifies similar flaws from the same organization and generates a list of recommended similar flaws. The recommended flaws are presented to a user at the organization in a dashboard along with associated data paths, flaw status, and any prior triage decisions.
Use of the naïve Bayes classifier leads to interpretable results such as frequencies of each triage decision for particular CWEs, file types, etc. This interpretability allows users to triage flaws according to their metadata which further increases efficiency of flaw triage. The proposed models and user presentation decrease manual resources needed to investigate flaws and automate triage and priority-classification of flaws under appropriate conditions.
Use of the phrase “at least one of” preceding a list with the conjunction “and” should not be treated as an exclusive list and should not be construed as a list of categories with one item from each category, unless specifically stated otherwise. A clause that recites “at least one of A, B, and C” can be infringed with only one of the listed items, multiple of the listed items, and one or more of the items in the list and another item not listed.
The phrase “triage decision” refers to an action applied to a codebase that reduces or otherwise triages risk associated with a flaw in the codebase. Triage decisions, in addition to actions that triage risk by altering code to fix flaws, also include actions that ignore flaws or reduce flaw risk through configuration changes. Triage decisions comprise actions that modify code design, propose mitigating configuration changes to the environment (e.g., a network or an operating system), document existing mitigating factors, designate the flaw risk as acceptable, identify the flaws as false positives, fix flaws, report flaws to appropriate entities, etc.
At stage A, a flaw feature generator 101 receives flaw data 102 and generates feature vector 104 from the flaw data 102. The flaw data 102 comprise data for a flaw detected in the codebase of an organization. For instance, the flaw can be detected using static application security testing (SAST) or dynamic application security testing (DAST). Example flaw data 100 comprise the following:
Metadata fields included in the example flaw data 100 include flaw identifiers, application identifiers, CWE identifiers, filenames, source code, and indicators of whether each flaw was triaged. Additional metadata fields such as line numbers of flaw occurrence in source code can be used.
The flaw feature generator 101 extracts tokens from the flaw data 102 to generate the feature vector 104. For the example flaw data 100, the flaw feature generator 101 generates the following example feature vector 106:
At stage B, a natural language processor 103 receives the feature vector 104 and converts the feature vector 104 to a numerical feature vector 108. For instance, the natural language processor 103 can generate count vectorizations of the feature vector 104 for each flaw as (typically sparse) vectors of 0/1 entries indicating whether the feature value for that entry is present in the feature vector 104. The count vectorizations are generated during training to include entries for feature values that occur in feature vectors of the training data. Alternatively, the natural language processor 103 can use other preprocessing techniques that preserve semantic similarity such as the word2vec algorithm or term frequency-inverse document frequency (tf-idf) statistics. The natural language processor 103 communicates the numerical feature vector 108 to the naïve Bayes classifier 105 and the flaw similarity model 107.
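The 0/1 count vectorization described for stage B can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names `build_vocabulary` and `vectorize` and the `field=value` token format are hypothetical. The vocabulary is fixed from training feature vectors, so a feature value never seen during training simply has no column.

```python
from typing import Dict, List, Sequence


def build_vocabulary(training_features: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign one column index to each distinct feature value in the training data."""
    vocab: Dict[str, int] = {}
    for features in training_features:
        for value in features:
            if value not in vocab:
                vocab[value] = len(vocab)
    return vocab


def vectorize(features: Sequence[str], vocab: Dict[str, int]) -> List[int]:
    """Return a 0/1 vector marking which known feature values a flaw has.

    Feature values that never occurred in the training data have no column
    and are therefore dropped.
    """
    vector = [0] * len(vocab)
    for value in features:
        index = vocab.get(value)
        if index is not None:
            vector[index] = 1
    return vector


# Hypothetical token lists for two training flaws.
training = [
    ["app=App1", "cwe=89", "method=authSession", "ext=java"],
    ["app=App2", "cwe=79", "ext=js"],
]
vocab = build_vocabulary(training)
print(vectorize(["app=App1", "cwe=79", "ext=py"], vocab))  # [1, 0, 0, 0, 0, 1, 0]
```

Because most flaws share only a handful of the many feature values in the vocabulary, these vectors are typically sparse; a production version would likely store only the indices of the "1" entries.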
At stage C, the naïve Bayes classifier 105 generates suggested triage decisions 118 based on the numerical feature vector 108 for the detected flaw. The suggested triage decisions 118 comprise likelihoods of performing each of a list of actions such as accepting flaw risk, mitigating by design, identifying a potential false positive, mitigating by OS environment, mitigating by network environment, and fixing the flaw. The suggested triage decisions 118 can further indicate frequencies of previous triage decisions per feature value in the feature vector 104 (as indicated by “1” entries in the numerical feature vector 108). Example frequencies of triage decisions 124 comprise the following:
The example frequencies of triage decisions 124 indicate that for application App1, accepting risk was performed for 3800 flaws, mitigating by design was performed for 4100 flaws, and remediating/fixing was performed for 34000 flaws; for CWE 978, accepting risk was performed for 48000 flaws, mitigating by design was performed for 1200 flaws, and remediating/fixing was performed for 200 flaws; and for method authSession, accepting risk was performed for 16700 flaws, mitigating by design was performed for 12900 flaws, and remediating/fixing was performed for 12300 flaws. Consequently, CWE 978 is generally benign and strongly suggests accepting risk for a flaw, App1 strongly suggests remediating/fixing a flaw, and method authSession is a less certain indicator of whether to accept risk, mitigate by design, or remediate/fix a flaw. The naïve Bayes classifier 105 communicates the suggested triage decisions 118 to the user interface 120.
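The reasoning about which feature values are strong indicators can be made concrete by turning the counts above into per-feature decision shares. The sketch below uses only the counts stated in the preceding paragraph; the decision labels and the helper names `decision_shares` and `dominant_decision` are illustrative assumptions.

```python
from typing import Dict, Tuple

# Counts of prior triage decisions per feature value, taken from the example
# frequencies of triage decisions 124 described above.
FREQUENCIES: Dict[str, Dict[str, int]] = {
    "App1": {"accept risk": 3800, "mitigate by design": 4100, "remediate/fix": 34000},
    "CWE 978": {"accept risk": 48000, "mitigate by design": 1200, "remediate/fix": 200},
    "authSession": {"accept risk": 16700, "mitigate by design": 12900, "remediate/fix": 12300},
}


def decision_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw decision counts for one feature value into proportions."""
    total = sum(counts.values())
    return {decision: count / total for decision, count in counts.items()}


def dominant_decision(counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the most frequent decision and its share of all decisions."""
    shares = decision_shares(counts)
    decision = max(shares, key=shares.get)
    return decision, shares[decision]


for feature_value, counts in FREQUENCIES.items():
    decision, share = dominant_decision(counts)
    print(f"{feature_value}: {decision} ({share:.0%})")
# App1: remediate/fix (81%)
# CWE 978: accept risk (97%)
# authSession: accept risk (40%)
```

The roughly 40% dominant share for authSession, against 81% and 97% for the other two values, is what makes it the weakest of the three indicators.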
In some embodiments, when the likelihood of a triage decision in the suggested triage decisions 118, as computed from frequencies such as the example frequencies of triage decisions 124, is sufficiently high (e.g., above a threshold selected by the security team), or when the detected flaw satisfies other automation criteria, the naïve Bayes classifier 105 or another cybersecurity component performs the action corresponding to the highest likelihood triage decision. The automation criteria can comprise criteria that the CWE identifier is in a list of certain CWE identifiers (e.g., based on predetermined risk assessments of CWE identifiers), that all but one likelihood value for triage decisions are sufficiently low, that the method is in a list of certain methods, etc. The automation criteria may also be exclusionary, preventing automation for certain flaw types and forcing manual review. A user at the user interface 120 can determine the automation criteria for the organization based on the likelihoods of performing triage actions for certain feature values indicated in the suggested triage decisions 118 as flaws are detected.
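One way the automation criteria could be combined is sketched below, assuming a hypothetical `AutomationPolicy` with an inclusion list of CWE identifiers, an exclusionary list, a top-likelihood threshold, and an "all others sufficiently low" bound. The specific thresholds (0.95 and 0.05) are placeholder values, not values from the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class AutomationPolicy:
    """Hypothetical organization-defined automation criteria."""
    min_top_likelihood: float = 0.95    # top decision must exceed this
    max_other_likelihood: float = 0.05  # every other decision must be at most this
    allowed_cwes: Set[str] = field(default_factory=set)   # empty set means any CWE
    excluded_cwes: Set[str] = field(default_factory=set)  # always forces manual review


def decision_to_automate(cwe: str, likelihoods: Dict[str, float],
                         policy: AutomationPolicy) -> Optional[str]:
    """Return the triage decision to perform automatically, or None for manual review."""
    if cwe in policy.excluded_cwes:
        return None  # exclusionary criterion
    if policy.allowed_cwes and cwe not in policy.allowed_cwes:
        return None  # CWE not on the organization's automation list
    ranked = sorted(likelihoods.items(), key=lambda item: item[1], reverse=True)
    top_decision, top_likelihood = ranked[0]
    if top_likelihood <= policy.min_top_likelihood:
        return None  # highest likelihood is not sufficiently high
    if any(p > policy.max_other_likelihood for _, p in ranked[1:]):
        return None  # some other decision is not sufficiently low
    return top_decision
```

Returning `None` corresponds to falling back to presenting recommendations to a user rather than acting automatically.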
At stage D, the flaw similarity model 107 receives the numerical feature vector 108 and generates similar flaw recommendations 128. The flaw similarity model 107 communicates the numerical feature vector 108 to a flaw feature vector database (database) 110, and the database 110 returns candidate feature vectors 114. The database 110 applies filtering criteria to identify candidate flaws whose feature vectors are included in the candidate feature vectors 114. The criteria can include that flaws belong to the same organization, that flaws were triaged within a designated time period before the present, that flaws have the same CWE identifier, that flaws have the same flaw type such as “credential management” or “cross-site scripting”, that flaws are well documented, that flaws did not correspond to particular triage decisions (e.g., potential false positives), etc. In some embodiments, when sufficient computational resources are available or the total number of flaws is sufficiently low, the database 110 returns feature vectors for every stored flaw.
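The candidate filtering step can be sketched as a single predicate over stored flaw records. The `StoredFlaw` record layout and the `select_candidates` helper are hypothetical, and only a subset of the listed criteria (same organization, same CWE, recent triage, documented, not an excluded decision) is shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Iterable, List


@dataclass
class StoredFlaw:
    """Hypothetical record for a previously triaged flaw."""
    flaw_id: str
    organization: str
    cwe: str
    triaged_at: datetime
    triage_decision: str
    documented: bool


def select_candidates(flaws: Iterable[StoredFlaw], organization: str, cwe: str,
                      now: datetime, max_age: timedelta,
                      excluded_decisions: FrozenSet[str] = frozenset({"potential false positive"}),
                      ) -> List[StoredFlaw]:
    """Keep only stored flaws that satisfy the example filtering criteria."""
    cutoff = now - max_age
    return [
        f for f in flaws
        if f.organization == organization
        and f.cwe == cwe
        and f.triaged_at >= cutoff          # triaged within the designated period
        and f.documented                    # well-documented flaws only
        and f.triage_decision not in excluded_decisions
    ]
```

In a real deployment these conditions would more likely be pushed into the database query itself so that only matching feature vectors are returned.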
The flaw similarity model 107 determines the distance between the numerical feature vector 108 and each of the candidate feature vectors 114. For instance, when the numerical feature vector 108 and candidate feature vectors 114 are count vectorizations, the flaw similarity model 107 determines the Manhattan distance between count vectorizations. Other distance or similarity measures such as Euclidean distance, cosine similarity, dot products, etc. can be used. The flaw similarity model 107 includes in the similar flaw recommendations 128 the top N (e.g., N=3) closest candidate flaws whose distances to the detected flaw corresponding to the numerical feature vector 108 are below a threshold distance. If no candidate flaws are below the threshold distance, the flaw similarity model 107 adds an indication that there are no recommended flaws to the similar flaw recommendations 128.
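The alternative measures named above can be compared side by side with the standard definitions below. Note that Manhattan and Euclidean are distances (smaller means more similar), while the dot product and cosine similarity grow with similarity, so a threshold on them would be a lower bound rather than an upper bound. For 0/1 count vectorizations, the Manhattan distance equals the number of feature values present in one flaw but not the other.

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences."""
    _check(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Unnormalized similarity: for 0/1 vectors, the count of shared feature values."""
    _check(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product normalized by vector lengths; 0.0 if either vector is all zeros."""
    norm_a, norm_b = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot(a, b) / (norm_a * norm_b)
```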
Example similar flaw recommendations 126 comprise the following:
The “ID” field comprises an identifier of each similar flaw. The “data path” and “status” fields comprise hyperlinks to pages that describe each data path and prior triage decisions, respectively. The data paths comprise stack traces of function calls that led to each corresponding flaw, for instance as determined by SAST, and indicate lines of code for potential fixing/remediation. The “status” field indicates a length of time since the flaw was first detected. The “severity” field indicates a severity of the flaw. The “documentation” field comprises a hyperlink to triage documentation of the flaw such as prior flaw triage decisions, testing methods to verify environmental mitigation, supervising engineers, time periods for revisiting flaw triage decisions, compensating controls in effect for when flaws are mitigated, etc. The user interface 120 displays results including suggested triage decisions 118 and similar flaw recommendations 128 in a dashboard that allows for sorting by various metadata fields such as CWE identifiers, methods, etc.
At stage A, a flaw triage model trainer (trainer) 201 queries the database 110 with a training data query 200 for flaws in the database 110. The training data query 200 can specify filtering parameters for flaws from which to return training data, such as a prior time period during which flaws were detected (e.g., the past month), types of triage decisions performed, etc. The database 110 then applies any filters (if provided) to flaws according to its database structure and returns training data 202 for a base naïve Bayes classifier 203. The training data 202 comprise data for each flaw, which may include flaw identifiers, application identifiers, CWE identifiers, filenames, source code of the flaws, data paths of the flaws, triage decisions for the flaws, etc. The trainer 201 parses the training data 202 to extract feature vectors and generates count vectorizations of tokens in the feature vectors for each flaw. This allows for frequency analysis for fitting the base naïve Bayes classifier 203.
At stage B, the trainer 201 determines Bayesian model parameters 204 as frequencies of triage decisions for each value of each feature indicated in the count vectorizations. For instance, for a value “798” of the CWE identifier feature, the frequencies could be that the triage decision of “accept by risk” occurred 48000/(48000+1200) ≈ 98% of the time and the triage decision of “mitigated by design” occurred 1200/(48000+1200) ≈ 2% of the time. The count vectorizations can be modified for multiple (>2) classes (i.e., multiple triage decisions) by including an entry indicating the class to which the corresponding flaw belongs, in which case the frequencies comprise frequencies for multiple types of triage decisions on the same feature value. The base naïve Bayes classifier 203 determines frequencies of each feature value in the training data 202 across the triage decisions. To determine a likelihood of performing a triage decision for a flaw with a count vectorization of feature values, the base naïve Bayes classifier 203 uses the multinomial naïve Bayes formula:
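In its standard textbook form, and using the symbols defined in the following paragraph, the multinomial naïve Bayes rule can be written as shown below. This is offered as a reconstruction under the assumption that p(k) plays the role of the prior likelihood of the kth triage decision; the normalization over all triage decisions j is made explicit here and the multinomial coefficient cancels in that ratio.

$$
p(\mathbf{x} \mid k) = \frac{\left(\sum_{i=1}^{n} x_i\right)!}{\prod_{i=1}^{n} x_i!} \prod_{i=1}^{n} P_{ki}^{\,x_i},
\qquad
p(k \mid \mathbf{x}) = \frac{p(k)\, p(\mathbf{x} \mid k)}{\sum_{j} p(j)\, p(\mathbf{x} \mid j)}
$$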
In the above formula, x = (x1, ..., xn) is a count vectorization for the flaw, (Pk1, ..., Pkn) is a vector of frequencies of each entry in count vectorizations for the kth triage decision, and p(k) is the likelihood of performing the kth triage decision. Once these frequencies are determined for each feature value across the types of triage decisions, the trainer 201 communicates the base naïve Bayes classifier 203 to a flaw triage model database 212.
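Fitting and applying a multinomial naïve Bayes classifier on count vectorizations might look like the minimal sketch below. The helper names are hypothetical, and the additive (Laplace) smoothing parameter `alpha` is an assumption not stated in the disclosure; it keeps a feature value never seen with a given triage decision from driving that decision's likelihood to zero.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Priors = Dict[str, float]
Frequencies = Dict[str, List[float]]


def fit_multinomial_nb(vectors: Sequence[Sequence[int]], labels: Sequence[str],
                       alpha: float = 1.0) -> Tuple[Priors, Frequencies]:
    """Estimate p(k) and (P_k1, ..., P_kn) for every triage decision k.

    alpha is additive (Laplace) smoothing, an assumption of this sketch.
    """
    n_features = len(vectors[0])
    label_counts: Dict[str, int] = defaultdict(int)
    feature_counts: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_features)
    for vector, label in zip(vectors, labels):
        label_counts[label] += 1
        counts = feature_counts[label]
        for i, x in enumerate(vector):
            counts[i] += x
    priors = {k: c / len(labels) for k, c in label_counts.items()}
    freqs = {}
    for k, counts in feature_counts.items():
        total = sum(counts) + alpha * n_features
        freqs[k] = [(c + alpha) / total for c in counts]
    return priors, freqs


def predict_proba(x: Sequence[int], priors: Priors, freqs: Frequencies) -> Dict[str, float]:
    """Return p(k | x) for each triage decision k.

    Scores are summed in log space to avoid underflow. The multinomial
    coefficient is identical for every k, so it cancels and is omitted.
    """
    log_scores = {
        k: math.log(prior) + sum(xi * math.log(p) for xi, p in zip(x, freqs[k]))
        for k, prior in priors.items()
    }
    best = max(log_scores.values())
    weights = {k: math.exp(s - best) for k, s in log_scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}
```

The per-decision `freqs` lists are directly inspectable, which is the interpretability property the description attributes to the naïve Bayes approach.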
At stage C, an entity or individual within the organization 220 communicates organization training data 208 to the database 110. The organization training data 208 comprise data aggregated from flaws detected in a codebase for software of the organization, for instance using SAST or DAST. The organization training data 208 can comprise source code of flaws, identifiers of flaws, filenames for the source code, metadata such as CWE identifiers for the flaws, triage decisions performed for the flaws, data paths for the flaws, etc. Upon receipt of the organization training data 208, or when prompted by a request from an entity or individual of the organization 220 to generate a classifier for triage decision recommendation, the database 110 communicates the organization training data 208 to the trainer 201. The organization 220 can filter the organization training data 208 according to a preference for how much of its own data versus other organizations' data to use when training an organization-specific model. Alternatively, the organization 220 can communicate all of its data, and the database 110 can filter the organization training data 208 according to preferences selected by the organization 220.
At stage D, the trainer 201 generates updated Bayesian model parameters 210 for an organization-specific naïve Bayes classifier 207. The trainer 201 extracts count vectorizations from the organization training data 208 with natural language processing. The trainer 201 generates the updated Bayesian model parameters 210 according to preferences selected by the organization 220 with respect to the percentage of training data from the organization versus other organizations. For instance, the preferences can specify using the base naïve Bayes classifier 203 as the organization-specific naïve Bayes classifier 207 or can specify retraining the organization-specific naïve Bayes classifier 207 with only the organization training data 208. In some embodiments, the base naïve Bayes classifier 203 can be updated by duplicating data in the organization training data 208 or by using a percentage of the organization training data 208 specified by the organization 220. Training can occur online as data from the organization 220 and other organizations is received or can occur offline on a fixed schedule (e.g., weekly, daily, etc.). Moreover, updated versions of the model can be deployed at the organization 220 (or in the cloud) online as additional training data is received, or offline according to a fixed schedule or when prompted by the organization 220. The trainer 201 communicates the organization-specific naïve Bayes classifier 207 with the updated Bayesian model parameters 210 to the flaw triage model database 212.
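One possible way to honor an organization's preferred proportion of its own data versus other organizations' data is to keep all of the organization's flaws and subsample the rest. The sketch below assumes that interpretation; the function `mix_training_rows` and its `org_share` parameter are hypothetical, and duplication-based weighting (also mentioned above) is not shown.

```python
import random
from typing import List, Sequence, TypeVar

Row = TypeVar("Row")


def mix_training_rows(org_rows: Sequence[Row], other_rows: Sequence[Row],
                      org_share: float, seed: int = 0) -> List[Row]:
    """Return a training set in which about `org_share` of rows come from the organization.

    org_share = 1.0 trains only on the organization's flaws. All organization rows
    are kept and other organizations' rows are randomly subsampled to reach the
    ratio; if too few other rows exist, all of them are used and the organization's
    share ends up higher than requested.
    """
    if not 0.0 < org_share <= 1.0:
        raise ValueError("org_share must be in (0, 1]")
    wanted_other = round(len(org_rows) * (1.0 - org_share) / org_share)
    wanted_other = min(wanted_other, len(other_rows))
    rng = random.Random(seed)  # fixed seed keeps retraining reproducible
    return list(org_rows) + rng.sample(list(other_rows), wanted_other)
```

For example, 100 organization flaws with `org_share=0.25` would be combined with 300 randomly chosen flaws from other organizations.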
In some embodiments, the trainer 201 can filter flaws in the training data 202 and organization training data 208 prior to generating parameters of the corresponding naïve Bayes classifiers. The trainer 201 can filter out flaws with data that the organization 220 deems low quality for determining triage decisions, for instance flaws that are not well documented, that occurred more than a specified time period ago (e.g., a year), etc.
At block 402, the system generates a feature vector from the flaw metadata. The feature vector comprises a count vectorization of feature values for the flaw metadata. The vocabulary for the count vectorization is generated beforehand from feature vectors for the training data, so that each count vectorization is a vector of length equal to the number of unique feature values from flaws in the training data. In some embodiments, the vocabulary can be capped at the N most common feature values to shorten the feature vectors for flaws. Feature values may comprise a CWE identifier, a file name for the source code of the flaw, a file extension, a method, a file line for the flaw, etc. For instance, the system can use natural language processing to identify relevant tokens in the flaw data specific to each feature based on metadata of the features. The flaw data can indicate line numbers within the source code file where the flaw occurs to facilitate feature extraction.
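Extracting per-feature tokens from flaw metadata and capping the vocabulary at the N most common feature values could be sketched as below. The metadata keys (`cwe`, `file`, `method`, `line`) and the `field=value` token scheme are hypothetical; each token is counted at most once per flaw, so "most common" here means appearing in the most flaws.

```python
from collections import Counter
from typing import Dict, List, Mapping, Sequence


def flaw_tokens(metadata: Mapping[str, str]) -> List[str]:
    """Turn selected metadata fields into 'field=value' tokens (hypothetical scheme).

    The file extension is derived from the file name as an extra feature.
    """
    tokens = [f"{name}={metadata[name]}" for name in ("cwe", "file", "method", "line")
              if name in metadata]
    if "file" in metadata and "." in metadata["file"]:
        tokens.append("ext=" + metadata["file"].rsplit(".", 1)[1])
    return tokens


def capped_vocabulary(training_tokens: Sequence[Sequence[str]], max_size: int) -> Dict[str, int]:
    """Keep only the max_size most common feature values as vector columns."""
    # dict.fromkeys removes duplicates within one flaw while preserving order.
    counts = Counter(token for tokens in training_tokens for token in dict.fromkeys(tokens))
    return {token: i for i, (token, _) in enumerate(counts.most_common(max_size))}
```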
At block 406, the system determines likelihoods of performing triage decisions for the detected flaw with a machine learning model based on the feature vector. For instance, when the machine learning model is a naïve Bayes classifier, the likelihoods are determined from the feature vector using the multinomial naïve Bayes formula on frequencies of feature values for previously detected flaws.
At block 408, the system determines whether the flaw satisfies automation criteria. The automation criteria comprise criteria that determine whether to perform the highest likelihood triage decision and bypass presenting triage decision recommendations to a user. For instance, the automation criteria can comprise that the feature vector for the flaw comprises feature values such as low severity CWE identifiers, low severity methods, combinations thereof, etc. The automation criteria can comprise that a highest likelihood triage decision determined by the organization-specific naïve Bayes classifier is above a threshold likelihood. If the automation criteria are satisfied, flow proceeds to block 410. Otherwise, flow proceeds to block 412.
At block 410, the system performs the action for the highest likelihood triage decision for the detected flaw and notifies the organization that the action was performed. Implementation details for performing each triage decision can vary per organization: different organizations can implement different triage decisions and can implement triage decisions in different ways. The system can further communicate flaw data for the triaged flaw to a user of the organization (e.g., via a dashboard).
At block 412, the system indicates the highest likelihood triage decision(s) for the detected flaw to a user of the organization. The system can indicate the highest likelihood triage decision(s) in a dashboard of a user interface. The dashboard can further allow the user to sort flaws by metadata, triage decisions performed, etc. Sorted flaws can indicate frequencies of triage decisions performed for previously detected flaws.
At block 414, the system identifies similar flaws to the detected flaw and corresponding recommended triage decisions. The system determines the similar flaws according to Manhattan distance between count vectorizations of previously detected flaws and the count vectorization of the detected flaw. The operations at block 414 are depicted in greater detail in reference to
At block 502, the system begins iterating through candidate flaws. For each iteration, the system retrieves a count vectorization of the candidate flaw. For instance, the system can store count vectorizations in local memory and can, at each iteration, retrieve an additional count vectorization from local memory.
At block 504, the system determines the Manhattan distance from the count vectorization of the candidate flaw to the count vectorization of the detected flaw. Alternatively, different distances can be determined. For instance, for different model inputs than count vectorizations, distances such as cosine similarity, Euclidean distance, and dot products can be determined.
At block 506, the system determines whether the Manhattan distance is below a threshold distance. The threshold distance can be determined based on inspection of similar flaw results for varying thresholds (e.g., by a domain-level expert) using previously detected flaws and corresponding triage decisions. If the Manhattan distance is below the threshold, flow proceeds to block 510 and the system keeps the candidate flaw. Otherwise, flow proceeds to block 510 and the system discards the candidate flaw.
At block 510, the system continues iterating through candidate flaws. If there is an additional candidate flaw, flow returns to block 502. Otherwise, flow proceeds to block 512.
At block 512, the system determines whether there are any remaining candidate flaws within the threshold Manhattan distance. If there are remaining candidate flaws, flow proceeds to block 514. Otherwise, the system indicates to the organization that there are no recommended similar flaws to the detected flaw and the flow in
At block 514, the system recommends to the organization the top N candidate flaws with the smallest Manhattan distances between count vectorizations, along with the triage decisions previously performed for those flaws. The system can further communicate descriptions of the recommended flaws/triage decisions and hyperlinks to data referenced in the descriptions such as data paths, source code, etc.
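Blocks 502 through 514 can be condensed into a single loop, sketched below under the assumption that candidates are held in memory as a mapping from a flaw ID to its count vectorization and prior triage decision. The function name `recommend_similar` is hypothetical, and returning `None` stands in for indicating that there are no recommended similar flaws.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def recommend_similar(detected: Sequence[int],
                      candidates: Dict[str, Tuple[Sequence[int], str]],
                      threshold: int, top_n: int = 3) -> Optional[List[Tuple[str, int, str]]]:
    """Keep candidates within the threshold and return the closest top_n.

    candidates maps a flaw ID to (count vectorization, prior triage decision);
    all vectors are assumed to share the detected flaw's vocabulary and length.
    """
    kept = []
    for flaw_id, (vector, decision) in candidates.items():            # blocks 502 and 510
        distance = sum(abs(a - b) for a, b in zip(detected, vector))  # block 504
        if distance < threshold:                                      # block 506
            kept.append((flaw_id, distance, decision))
    if not kept:                                                      # block 512
        return None
    kept.sort(key=lambda item: item[1])  # stable sort: ties keep candidate order
    return kept[:top_n]                                               # block 514
```

Sorting only the kept candidates is adequate for small candidate sets; for very large sets, `heapq.nsmallest` would avoid sorting everything.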
At block 601, the trainer determines whether training criteria are satisfied. The training criteria can be whether a trained machine learning model has been previously deployed for an organization, whether a sufficient amount of training data has been collected, whether a time period has elapsed since prior model training, etc. If the training criteria are satisfied, flow proceeds to block 602. Otherwise, flow returns to block 600 to continue collecting training data.
At block 602, the trainer updates and/or trains a base machine learning model to predict flaw triage decisions with the training data. The trainer generates feature vectors for each flaw in the training data based on architecture of the base machine learning model. For instance, when the base machine learning model is a naïve Bayes classifier, the feature vectors comprise count vectorizations of tokens in feature values for each flaw. The count vectorizations can be truncated to include a specified number of highest frequency feature values (e.g., 1000) to improve efficiency/storage. Other types of feature vectors can be generated with other natural language processing methods depending on the type/architecture of the base machine learning model such as a random forest classifier, a neural network, a support vector machine, etc. Training also varies depending on the type of base machine learning model and, for instance for neural networks, can occur across multiple training epochs and batches. For the naïve Bayes classifier, during training the trainer computes frequencies of each feature value in the count vectorizations and uses the frequencies in the multinomial naïve Bayes formula for determining likelihoods of triage decisions.
At block 604, the trainer determines whether there is a request received from an organization to train a model for flaw triage decision recommendation. The request can specify a quantitative degree to which the base machine learning model is used in training an organization-specific machine learning model. For instance, the request can specify to use all of the training data or to use training data specific to the organization (i.e., a subset of data that correspond to the organization). The request can further specify which triage decisions to use during training, for instance, when an organization does not use certain triage decisions. If a request was received, operational flow proceeds to block 606. Otherwise, operational flow returns to block 600.
At block 606, the trainer determines whether the request indicates using the base machine learning model. The request can further indicate a percentage of training data from the organization and other organizations to use during training and/or any filters to apply to the training data according to preferences of the organization. The preferences can indicate certain metrics of quality for training data to use such as sufficient documentation of the corresponding flaws. If the request indicates using the base machine learning model, operational flow proceeds to block 608. Otherwise, operational flow proceeds to block 610.
At block 608, the trainer indicates the base machine learning model as the trained machine learning model for flaw triage recommendation at the organization. The trainer can further indicate to the organization that the base machine learning model was trained on data outside the organization as a reminder of whether the organization desires a model that can incur bias from such training.
At block 610, the trainer collects organizational training data from flaws for the organization with labels comprising corresponding triage decisions. The organizational training data can be communicated by the organization with the request or can be requested by the trainer in response to receiving the request. The trainer can filter the organizational training data according to a time period when the flaws occurred, actions corresponding to triage decisions that were performed, etc. The trainer further generates count vectorizations of feature vectors for each flaw in the organizational training data.
At block 612, the trainer initializes and trains a new machine learning model with the organizational training data. For instance, when the machine learning model is a naïve Bayes classifier, the trainer can determine frequencies of triage decisions specified by the request for feature values in the organizational training data.
At block 614, the trainer deploys the trained machine learning model for flaw triage decision recommendation for the organization. The trained machine learning model can be deployed in the cloud, and the organization can communicate detected flaws to the cloud for determining which triage decisions to perform. Alternatively, the trained machine learning model can be deployed on endpoint devices at the organization to avoid communicating detected flaws over the Internet. Once deployed, the trained machine learning model can be configured to ignore and/or audit flaws that correspond to certain flaw triage decisions undesired by the organization, such as “accept the risk”.
Training of machine learning models for triage decision recommendation of flaws in the foregoing operation of
The foregoing disclosure refers variously to using a naïve Bayes classifier to determine likelihoods of performing triage decisions for detected flaws and to using Manhattan distance between feature vectors of flaws to determine flaw similarity. Any predictive machine learning model, such as random forest classifiers, representative centroids from k-nearest neighbors clustering, neural networks, semi-supervised learning models, self-supervised learning models, etc., can be implemented. The output can correspond to multiple triage decisions and/or can correspond to a binary variable indicating whether the triage decision was to ignore the flaw or to fix the flaw. Preprocessing methods other than count vectorization, and different features, can be implemented for varying models/model architectures.
The flowcharts are provided to aid in understanding the illustrations and are not to be used to limit scope of the claims. The flowcharts depict example operations that can vary within the scope of the claims. Additional operations may be performed; fewer operations may be performed; the operations may be performed in parallel; and the operations may be performed in a different order. For example, the operations depicted in blocks 504 and 506 can be performed in parallel or concurrently. With respect to
As will be appreciated, aspects of the disclosure may be embodied as a system, method or program code/instructions stored in one or more machine-readable media. Accordingly, aspects may take the form of hardware, software (including firmware, resident software, micro-code, etc.), or a combination of software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” The functionality presented as individual modules/units in the example illustrations can be organized differently in accordance with any one of platform (operating system and/or hardware), application ecosystem, interfaces, programmer preferences, programming language, administrator preferences, etc.
Any combination of one or more machine-readable medium(s) may be utilized. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable storage medium may be, for example, but not limited to, a system, apparatus, or device, that employs any one of or combination of electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology to store program code. More specific examples (a non-exhaustive list) of the machine-readable storage medium would include the following: a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a machine-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine-readable storage medium is not a machine-readable signal medium.
A machine-readable signal medium may include a propagated data signal with machine-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A machine-readable signal medium may be any machine-readable medium that is not a machine-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a machine-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The program code/instructions may also be stored in a machine-readable medium that can direct a machine to function in a particular manner, such that the instructions stored in the machine-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.