The present disclosure generally relates to automated impact detection. More specifically, the present disclosure generally relates to applying automated impact detection techniques to identify changes to a third-party library after an upgrade occurs or after a vulnerability patch occurs.
Maintaining up-to-date third-party libraries is tedious. Organizations often stick with old versions of dependencies to avoid rework, conflicts, and code breakage. For example, there may be limited visibility into whether package updates will break existing workflows. It takes time and effort to rerun deployment test cases to ensure new packages have not broken a workflow. Also, test cases may not cover every scenario, leading to unseen code breakage.
In a normal third-party library upgrade migration process, there may be four stages: Preparation, Compatibility, Field Test, and Trial Operation. Such stages must be performed manually by an analyst. In Preparation, the analyst runs an upgrade evaluation. In Compatibility, the analyst performs conflict resolution between different packages. In Field Test, the analyst performs function testing. In Trial Operation, the analyst performs pre-go-live testing on a test branch. The analyst must perform a number of tasks manually, such as finding an updated library, importing the updated library, rebuilding and running an application for testing, resolving rebuild failures, resolving unit test failures, managing runtime errors, understanding the updated patch notes, manually fixing simple bugs, and issuing support tickets for complex errors to a development team.
Thus, it would be of great value to map dependencies in code bases to provide security and identify other issues. Nested dependencies in Software Bills of Materials (SBOMs) create a real problem. Software Composition Analysis (SCA) is an automated process that identifies the open-source software in a codebase. SCA is typically performed to evaluate security, license compliance, and code quality. An SCA tool is usually found in a Static Application Security Testing (SAST) scanning package.
SCA may be performed as part of a SAST platform update of new vulnerabilities at specified periods. Thus, organizations must wait for the next scheduled vulnerability assessment cycle to detect new vulnerabilities. This can lead to delays and cost in the vulnerability detection lifecycle. SCA tools do not reflect a newly posted vulnerability until the next scan occurs, and scans take place only according to periodic scanning plans. Also, SCA vendors often bill per scan, so reducing the time to identify vulnerabilities results in higher costs.
There is a need in the art for a system and method that addresses the shortcomings discussed above.
The proposed systems and methods describe an automated service for monitoring open-source packages in order to determine whether a software application may be impacted by changes to third-party code. If a vulnerability within an open-source package is detected, the system can also locate the affected functions within the current code, including functions that are nested or dependent on the primary affected functions. The system provides users with a tool that actively and continuously monitors vulnerability feeds in real-time for each function and dependency to detect whether a vulnerability is released that is linked to those functions. If a vulnerability is released, the tool can, in response, automatically fetch information about that vulnerability and perform an analysis of the package to determine whether the vulnerability impacts the user's application, and if so, which specific portions of the application would be affected. In some embodiments, the system can further provide guidance to the user regarding steps that might be taken to close or resolve the vulnerability.
The proposed embodiments are effective in reducing downtime during migrations as well as limiting the impact of vulnerabilities during upgrades provided by third-party libraries. These features (among others described) are specific improvements in the way that the underlying computer system operates and in how the software application is maintained. In addition, the proposed systems and methods solve technical challenges with software development targeted for update or migration. The improvements facilitate a more efficient, accurate, consistent, and less costly progression of software code in an application. The improved functioning of the underlying computer hardware itself achieves further technical benefits. For example, the system avoids tedious and resource-draining periodic checks of vulnerability feeds that may occur belatedly, after an impact has already led to disruptions in the software. The system thereby accelerates the timeline for successful upgrades to an application and reduces operational downtime.
In one aspect, a method of performing automated, tailored vulnerability impact assessments is disclosed. The method includes a first step of retrieving a targeted third-party library repository including both a patched version and a current version, and a second step of cloning the patched version and the current version into a local storage. A third step includes generating a first code database for the cloned patched version and a second code database for the cloned current version, and a fourth step includes comparing the first code database with the second code database to identify differences. In addition, a fifth step includes generating an impact result based on the identified differences, and a sixth step includes transmitting an alert to a user including information about affected functions for the patched version of the targeted third-party library repository, based on the impact result.
In another aspect, a non-transitory computer-readable medium storing software comprising instructions is disclosed. The instructions are executable by one or more computers which, upon such execution, cause the one or more computers to perform automated, tailored vulnerability impact assessments by: (1) retrieving a targeted third-party library repository including both a patched version and a current version; (2) cloning the patched version and the current version into a local storage; (3) generating a first code database for the cloned patched version and a second code database for the cloned current version; (4) comparing the first code database with the second code database to identify differences; (5) generating an impact result based on the identified differences; and (6) transmitting an alert to a user including information about affected functions for the patched version of the targeted third-party library repository, based on the impact result.
In another aspect, a system for performing automated, tailored vulnerability impact assessments is disclosed. The system includes a processor and machine-readable media including instructions which, when executed by the processor, cause the processor to: (1) retrieve a targeted third-party library repository including both a patched version and a current version; (2) clone the patched version and the current version into a local storage; (3) generate a first code database for the cloned patched version and a second code database for the cloned current version; (4) compare the first code database with the second code database to identify differences; (5) generate an impact result based on the identified differences; and (6) transmit an alert to a user including information about affected functions for the patched version of the targeted third-party library repository, based on the impact result.
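The six recited steps can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the repository layout, the use of a file-level "code database," and the helper names (`clone_version`, `compare_databases`, `format_alert`) are all assumptions made for the example.

```python
import subprocess
from pathlib import Path


def clone_version(repo_url: str, ref: str, dest: Path) -> None:
    """Steps one and two: clone one version (branch or tag) of the
    targeted third-party library into local storage."""
    subprocess.run(
        ["git", "clone", "--depth", "1", "--branch", ref, repo_url, str(dest)],
        check=True,
    )


def build_code_database(src: Path) -> dict[str, str]:
    """Step three, toy version: map each source file to its text.
    A production system might build a richer code database here."""
    return {str(p.relative_to(src)): p.read_text() for p in src.rglob("*.py")}


def compare_databases(patched: dict[str, str], current: dict[str, str]) -> dict:
    """Step four: identify differences between the two code databases."""
    return {
        "added": sorted(set(patched) - set(current)),
        "removed": sorted(set(current) - set(patched)),
        "modified": sorted(
            f for f in set(patched) & set(current) if patched[f] != current[f]
        ),
    }


def format_alert(diff: dict) -> str:
    """Steps five and six: summarize the impact result for the user alert."""
    return (
        f"{len(diff['modified'])} modified, {len(diff['added'])} added, "
        f"{len(diff['removed'])} removed file(s) in the patched version"
    )
```

In use, the two cloned versions would each be passed through `build_code_database`, and the resulting dictionaries compared and summarized for the alert.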
Other systems, methods, features, and advantages of the disclosure will be, or will become, apparent to one of ordinary skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description and this summary, be within the scope of the disclosure, and be protected by the following claims.
The invention can be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views.
A package impact detection system and method for detecting package impacts and generating tailored vulnerability impact assessments is disclosed. Software developers increasingly rely on open-source repositories and libraries to reduce workloads on development, as well as reduce risk in implementing complex capabilities like cryptography or memory management. While this improves code standardization and integration, it also expands the attack surface to software components no longer controlled by the organization.
The disclosed systems include provisions to automatically evaluate the impact of vulnerabilities identified by dependency vulnerability scanners in a software supply chain. For example, vulnerabilities in open-source dependencies can propagate into client source code and applications. Organizations need visibility and real-time monitoring to be alerted to and address these vulnerabilities when they appear. The proposed embodiments provide tools for automatic monitoring of open-source dependencies for vulnerabilities. In some embodiments, the system can prioritize the more vulnerable packages and functions within the code and additionally make recommendations for updates. An organization can then easily and in real-time monitor open-source vulnerability feeds and correlate those vulnerabilities to only the packages included in the source code, providing a unique view of each application. In different embodiments, if a difference between the current dependency and newly released fixed packages is detected—reflecting the changed functions within the dependency—the system can track the dataflow within the application source code to find all affected functions and output a report of the code usage for an analyst to address.
For example, packages may be updated to fix bugs and repair security risks. Alternatively, packages may be updated to improve performance of functions included in the packages. Organizations often maintain older versions of dependencies to avoid rework and conflicts. Conventionally, updating packages has required manually performing code analysis and running test cases. By offering an intelligent automated call path and dataflow analysis, embodiments streamline this process and identify the affected functions without need for manual oversight. By identifying the affected functions, the system can then provide a metric of how much change has occurred. Alternatively, by automatically identifying affected functions, a user who is performing an upgrade or who is fixing vulnerabilities can immediately assess which portions of a third-party library repository are affected by an upgrade process or by a vulnerability identification process. Thus, the end-user can more readily focus efforts on these affected portions, simplifying the remediation process and reducing the time and costs needed to secure or patch the vulnerabilities.
In addition, in some embodiments, the disclosed systems can automatically evaluate compatibility when migrating a third-party open-source library to an upgrade version. For example, to keep the open-source dependencies in a client application up-to-date, organizations usually test and verify the compatibility of an upgrade version to prevent runtime issues. The proposed embodiments instead allow organizations to perform automated evaluations of the upgrade version of the open-source libraries/dependencies to detect the affected function usage in the context of the update. The affected usage and functions within their code can be automatically identified and, in some embodiments, the system can further generate intelligent recommendations for updates. As noted above, the tool can analyze the difference between the current dependency and newly released updated packages in order to identify the changed functions within the dependency. This information is then used to track the data flow within the application source code to find affected functions and usage for the developer to facilitate updating of the application. Rather than relying on an iterative manual approach that is performed only periodically and following a schedule, and can leave the application vulnerable in-between each cycle, the system can continuously monitor and track the dataflow to generate an automated impact assessment workflow for software dependency management and upgrade recommendations as well as associated costs.
For purposes of introduction,
Next, the process can continue with operation 112 of identifying the Software Bill of Materials (SBOM), which describes the software dependencies within a given application and/or the software supply chain. In such a manner, application dependencies can be enumerated in an SBOM that identifies the dependencies for a package. For example, this operation may be performed using the Syft command line tool. Then, the process can continue with operation 114 of checking vulnerabilities. This checking in operation 114 involves a regular fetch of newly released vulnerabilities related to the SBOM as published by multiple feeds. The checking of vulnerabilities may occur by using the open-source vulnerability (OSV) schema and/or the OSV API, or other standard API formats. In addition, in some embodiments, SBOM dependency versions can be correlated against existing and new Common Vulnerabilities and Exposures (CVEs).
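Operation 112 can be sketched as follows. The Syft invocation reflects the tool's current command-line conventions and should be verified against the installed version; the CycloneDX parser helper is an illustrative assumption, not part of the disclosure.

```python
def sbom_command(path: str) -> list[str]:
    """Build a Syft command line that emits a CycloneDX JSON SBOM for a
    directory. (Flags per Syft's public CLI; verify for your version.)"""
    return ["syft", f"dir:{path}", "-o", "cyclonedx-json"]


def dependencies_from_cyclonedx(sbom: dict) -> list[tuple[str, str]]:
    """Enumerate (name, version) pairs from a parsed CycloneDX SBOM
    document, i.e., the application dependencies of operation 112."""
    return [(c["name"], c.get("version", "")) for c in sbom.get("components", [])]
```

The command would typically be run via `subprocess.run(sbom_command("."), capture_output=True)` and its JSON output parsed before the (name, version) pairs are checked against vulnerability feeds in operation 114.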
As a general matter, OSV provides CVEs in a human- and machine-readable data format developed in collaboration with open-source communities. OSV serves as an aggregator of vulnerability databases including, but not limited to, the GitHub Advisory Database, PyPI Advisory Database, Go Vulnerability Database, Rust Advisory Database, Global Security Database, and OSS-Fuzz. There are APIs available to query for all known vulnerabilities. Typically, OSV provides real-time updates of an aggregated CVE feed, which is updated whenever a vulnerability is published or modified. These feeds can include a summary, with a description of related CVE(s), any related Common Weakness Enumerations (CWE(s)), identification of the affected package and version range, and the affected functions.
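A minimal sketch of querying the public OSV API for one package version is shown below. The endpoint and request shape follow OSV's published v1 query API; the helper names are illustrative, and the network call is included only for completeness.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_osv_query(name: str, ecosystem: str, version: str) -> dict:
    """Build the JSON body for the OSV v1 query endpoint, which returns
    known vulnerabilities affecting a single package version."""
    return {"package": {"name": name, "ecosystem": ecosystem}, "version": version}


def query_osv(name: str, ecosystem: str, version: str) -> list[dict]:
    """POST the query to OSV and return the vulnerability records.
    (Requires network access.)"""
    body = json.dumps(build_osv_query(name, ecosystem, version)).encode()
    req = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("vulns", [])
```

Each returned record follows the OSV schema, so the affected version ranges (and, where published, affected functions) can be read directly from the response.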
As shown in
In different embodiments, the information in local storage 216 is also processed to create a Software Bill of Materials (SBOM) list 226. The packages in the SBOM list 226 can be checked at operation 228. The results of the checking in operation 228 can be provided to an agent to check/evaluate and to generate an alert in case of a detected vulnerability in operation 238. For example, the agent may trigger an alert for the vulnerable package usage 258 in operation 230. The agent may also employ an Application Programming Interface (API) 234 to interact with an open-source vulnerability (OSV) database 232. In some embodiments, the OSV database 232 can provide an OSV vulnerability schema 236 to the agent.
In addition, in some embodiments, the OSV vulnerability schema 236 can provide fixed version information 240 for the automated impact analysis 224. The automated impact analysis 224 can also receive information about application usage of a vulnerable package 242 from the targeted application repository 210. Thus, in different embodiments, the process can be understood to comprise an initial phase 252 and a maintenance phase 250. In the initial phase 252 an original vulnerability assessment occurs. In the maintenance phase 250, the original vulnerability assessment is kept valid and up-to-date. Based on the interactions, the agent can also receive information from the automated impact analysis 224. In some embodiments, the received information also allows the agent to provide an alert of the vulnerable package usage 258.
For purposes of clarity,
By contrast, the present embodiments provide a substantially improved approach to impact analysis. As shown in second diagram 330, the method begins with an operation 332, in which call path analysis occurs. Specifically, the method identifies the updated/patched nodes in the library during the operation 332. In operation 334, dataflow analysis occurs. Specifically, the method identifies affected functions/objects in the target application based on dataflow during operation 334. In other words, operation 334 can include an automated identification of which nodes in the target application are affected, based on data flowing to the updated nodes.
Furthermore, at an operation 336, the method calculates impact. Specifically, the method calculates impact based on a graph of affected nodes in the target application during operation 336. In one example, the impact is based on the ratio of affected nodes in the target application to unaffected nodes. In an operation 338, the method causes edits of the affected functions. More specifically, the method includes modifying and fixing the affected functions that were identified previously.
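Operations 332 through 336 can be sketched over a toy call graph. The graph encoding (caller mapped to its callees) and the exact impact ratio are assumptions made for the example, not the disclosed implementation.

```python
from collections import deque


def affected_nodes(call_graph: dict[str, set[str]], patched: set[str]) -> set[str]:
    """Operations 332/334 sketch: call_graph maps each caller to its
    callees. A node is affected if it transitively calls a patched node."""
    # Invert the graph: callee -> set of direct callers.
    callers: dict[str, set[str]] = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    affected, queue = set(), deque(patched)
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, ()):
            if caller not in affected:
                affected.add(caller)
                queue.append(caller)
    return affected


def impact(call_graph: dict[str, set[str]], patched: set[str]) -> float:
    """Operation 336 sketch: ratio of affected to unaffected nodes in
    the target application, per the example above."""
    app_nodes = set(call_graph)  # application-side nodes in this toy model
    hit = affected_nodes(call_graph, patched) & app_nodes
    unaffected = app_nodes - hit
    return len(hit) / max(len(unaffected), 1)
```

For instance, if only `main` and a helper reach a patched library function while a third application node does not, the metric reports two affected nodes against one unaffected node.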
Additional details with respect to how operations 332 and 334 are performed are provided below--for example,
In some embodiments, each of updated version 420 and current version 422 is cloned into a local storage 428 of a local service 430. For example, updated version 420 is cloned at operation 424, and current version 422 is cloned at operation 426. In one embodiment, the local storage 428 performs an operation 440 including: (a) generating a code database for updated version 442 and (b) generating a code database for the current version 444. The updated version 442 can be automatically associated with or assigned to a corresponding call path/dataflow assessment. The current version 444 can also be associated with or assigned to a corresponding call path/dataflow assessment.
In different embodiments, at operation 450, the local service 430 can compare/detect the differences between the updated version 442 and the current version 444. As part of this step, the differences between the updated version 442 and the current version 444 are also categorized or classified as either "no impact" differences or "impact" (or "impactful") differences. For example, "no impact" or "non-impactful" differences could include comment modifications, space or change-line modifications, or variable name modifications. Such modifications would be deemed non-substantive. On the other hand, "impact" differences could include changes where a function/class is removed, method invocations change, method parameters change, or control logic changes, or other substantive modifications, as non-limiting examples.
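A minimal classifier in the spirit of operation 450 is sketched below. It treats comment, whitespace, and line-break edits as "no impact" by normalizing both versions before comparing; variable-rename detection would require token-level matching and is deliberately omitted, and the Python comment syntax is an assumption of the example.

```python
import re

NO_IMPACT, IMPACT = "no impact", "impact"


def normalize(source: str) -> str:
    """Strip '#' comments and collapse whitespace so purely cosmetic
    edits compare as equal."""
    no_comments = re.sub(r"#[^\n]*", "", source)
    return re.sub(r"\s+", " ", no_comments).strip()


def classify_change(old: str, new: str) -> str:
    """Classify a changed function body: edits that vanish under
    normalization (comments, spacing, line changes) are 'no impact';
    anything that survives (removed functions, changed parameters,
    changed control logic) is 'impact'."""
    return NO_IMPACT if normalize(old) == normalize(new) else IMPACT
```

In practice the two classified groups would then feed operation 336's impact calculation, with only the "impact" group contributing affected nodes.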
In general, certain scripts or other programs can identify/filter changes that are “no impact” and isolate them from the other changes for purposes of classifying the two groups. For example, referring briefly to
Returning now to
In different embodiments, these queries 650 can, for example, include custom queries 660 targeting a particular purpose. In some embodiments, the queries 650 may correspond to function calls or may correspond to vulnerabilities. In performing the queries 650, the code analysis engine 630 may be able to assess differences based on the results of the queries 650. For purposes of clarity to the reader, a non-limiting example of code that can be used for the queries 650 is shown as code 670.
For example, in an embodiment in which a goal of the system is to determine the effects of an update on a function, the queries 650 may be queries to find callable nodes. The system may also include a build call path analysis that defines callers and callees and queries all invocations between them. In some embodiments, there may also be a dataflow analysis, which defines the data to trace, and then queries all nodes that touch the data. In another embodiment where a goal of the system is to track vulnerabilities, the queries 650 are performed to allow developers to track the differential code and functions in the application dataflow. Custom queries 660 for the code analysis tool can find affected functions and objects in the application dataflow. The code analysis tool may also provide a code analysis engine 630 that can automate security checks.
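The two query styles described above (caller/callee invocation queries and dataflow tracing) can be modeled in miniature as follows. This is a toy Python model of the queries, not actual code-analysis-tool query syntax; the graph encodings and function names are assumptions of the example.

```python
def invocations_between(
    call_graph: dict[str, set[str]], callers: set[str], callees: set[str]
) -> set[tuple[str, str]]:
    """Call path query sketch: return every invocation edge from a node
    in 'callers' to a node in 'callees'."""
    return {(a, b) for a in callers for b in call_graph.get(a, set()) if b in callees}


def touched_nodes(dataflow: dict[str, set[str]], sources: set[str]) -> set[str]:
    """Dataflow query sketch: 'dataflow' maps a node to the nodes its
    data flows into. Starting from the traced data ('sources'), return
    every node that the data reaches."""
    seen, stack = set(sources), list(sources)
    while stack:
        node = stack.pop()
        for nxt in dataflow.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen
```

A real deployment would express these as custom queries for the code analysis engine over its generated code database; the shapes of the results (invocation pairs, reached nodes) are what the alert stage consumes.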
In either of these cases, once the code analysis tool has employed the code analysis engine 630 to execute the queries 650, the code analysis engine 630 can produce an alert result 640. In different embodiments, an alert result 640 can offer useful and/or critical information to a user. In particular, in one case, the alert result 640 may establish which functions in a package would be impacted by an upgrade. Accordingly, the alert result 640 may indicate a degree of change/difference as well as guide a user about how to more effectively target their efforts. In another case, the alert result 640 may establish which functions in a package are vulnerable, guiding a user toward specific remediation techniques and tools for fixing the identified vulnerabilities. As noted above, there may also be a graphical depiction of results.
In order to allow for a better appreciation of the proposed embodiments,
The disclosed system 800 may include a plurality of components capable of performing the disclosed computer-implemented method. For example, system 800 may include a user device 804, a computing system 818, and a database 814. Database 814 may store information about one or more code repositories associated with third-party libraries. For example, database 814 may store databases containing one or more code repositories as well as properties and metadata for the code repositories. In another example, database 814 may store files including source code for the code repositories.
The components of system 800 can communicate with each other through a communication network 816. For example, user device 804 may retrieve information about a third-party library repository from database 814 via communication network 816. In some embodiments, communication network 816 may be a wide area network (“WAN”), e.g., the Internet. In other embodiments, communication network 816 may be a local area network (“LAN”).
While
For example, the user device 804 may instruct the computing system 818 to perform various operations to assess the impact of an upgrade or a vulnerability repair on a third-party library. Once the computing system 818 has assessed the impact, the computing system 818 may provide results to the user device 804. These results may be a metric expressing the extent to which the changes affect the third-party library. Alternatively, the results may include information about which specific functions in the third-party library are affected by the changes. The information about specific functions will help the user of the user device 804 target efforts when implementing the changes.
As shown in
Computing system 818 includes a processor 820 and a memory 822. Processor 820 may include a single device processor located on a single device, or the processor 820 may include multiple device processors located on one or more physical devices. Memory 822 may include any type of storage, which may be physically located on one physical device, or on multiple physical devices. In some cases, computing system 818 may comprise one or more servers that are used to host the system. The processor 820 and the memory 822 may implement the tester 806 and the impact detector 808 to gather appropriate results to send to the user device 804.
In different embodiments, the system may include a third-party library that is to be upgraded, or a third-party library in which potential vulnerabilities are to be identified and fixed or patched. There may be a goal to efficiently establish where upgrading or patching an affected library will have an impact. For example, there may be a storage including a library, and the information in the library may be submitted to a tester and to an impact detector. These modules are able to identify which portions of code have changes that would have an impact, and which portions of code have changes that would not be considered to have an impact.
Once the code having an impact is identified, the impact detector can further determine which particular functions would be affected by the transformation. This determination can yield information about how much impact would occur. The determination also clarifies where the impact will occur, which can serve as a guide when ensuring that an upgrade or a vulnerability patch can proceed successfully. Accordingly, the disclosed approaches automate tasks that would otherwise need to be performed manually. By such automation, time, money, and effort can be saved as the user can be forewarned in advance as to whether a third-party update will likely/potentially lead to a vulnerability in their software application, and where they would best focus their efforts proactively. Accordingly, the disclosed approaches provide a technical solution that improves the ability of a system to solve problems that arise when determining what to do when making changes to a third-party library repository.
Thus, the proposed embodiments offer a more rapid vulnerability assessment compared to conventional techniques. For example, the disclosed systems incorporate a new integrated workflow that can automatically (a) monitor OSV database(s) to retrieve newly posted vulnerabilities faster, (b) combine Syft and the OSV API to discover the vulnerabilities with CVE numbers in the dependency and applicable functions, (c) combine Syft and CodeQL to detect the vulnerable package import and code function location, and (d) use differential code snippets and CodeQL to analyze the dataflow and call path difference and evaluate the impact of a vulnerability fix on application workflow. This approach significantly improves vulnerability detection, where the average time to identify newly posted SBOM vulnerabilities and generate alerts can be less than an hour, while also providing users with a custom vulnerability impact assessment for their applications. For example, in one test, the proposed system fetched the vulnerability within 3 hours of the update, while in contrast a conventionally available tool took approximately 2 weeks to update.
In other embodiments, the method may include additional steps or aspects. In some embodiments, the patched version includes modifications made to the current version. In another example, the method also includes separating the identified differences into a first group that includes those identified differences involving one or more of a comment modification, change-line modification, and variable name modification, and a second group including any identified differences that are unassigned to the first group. In some embodiments, the method also includes classifying each of the differences in the first group as non-substantive or no impact. In one embodiment, the method also includes classifying each of the differences in the second group as substantive or impact. In some embodiments, differences classified as substantive include differences based on a removal of a function or class, changes in parameters, or control logic changes. In one example, the impact result is further based on the classification of each difference as either substantive or non-substantive. In another example, the method includes providing a user interface that allows an end-user to manage upgrades of the affected functions based on the impact result.
Embodiments may include a non-transitory computer-readable medium (CRM) storing software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform the disclosed methods. Non-transitory CRM may refer to a CRM that stores data for short periods or in the presence of power such as a memory device or Random Access Memory (RAM). For example, a non-transitory computer-readable medium may include storage components, such as, a hard disk (e.g., a magnetic disk, an optical disk, a magneto-optic disk, and/or a solid-state disk), a compact disc (CD), a digital versatile disc (DVD), a floppy disk, a cartridge, and/or a magnetic tape.
Embodiments may also include one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform the disclosed methods.
Certain embodiments may use cloud computing environments. Cloud computing environments can include, for example, an environment that hosts the services for impact analysis and detection described herein. The cloud computing environment may provide computation, software, data access, storage, etc. services that do not require end-user knowledge of a physical location and configuration of system(s) and/or device(s) that hosts the impact analysis and detection services. For example, a cloud computing environment may include a group of computing resources (referred to collectively as “computing resources” and individually as “computing resource”).
While this specification contains many specifics, these should not be construed as limitations on the scope of the disclosure or of what may be claimed, but rather as descriptions of features specific to particular implementations. Certain features that are described in this specification in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination may in some examples be excised from the combination, and the claimed combination may be directed to a sub-combination or variation of a sub-combination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems may generally be integrated together in a single software product or packaged into multiple software products.
While various embodiments of the invention have been described, the description is intended to be exemplary, rather than limiting, and it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible that are within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents. Also, various modifications and changes may be made within the scope of the attached claims.
This application is a continuation-in-part of and claims the benefit of U.S. Provisional Patent Application Ser. No. 63/493,116 filed on Mar. 30, 2023 and titled “Automated Impact Detection”, the disclosure of which is incorporated by reference herein in its entirety.