This invention relates to identifying source-code and configuration materials that affect the function of specific components in a cloud-native application's infrastructure, and, conversely, identifying all infrastructure components that run code built from, are created based on, or otherwise function in a way that is affected by, specific source-code or configuration files.
Cloud-native applications are typically composed of a heterogeneous collection of components such as code artifacts, services, APIs, and infrastructure components. Analyzing the attack surface of cloud-native applications to find vulnerabilities and security risks involves looking at this entire collection of components. Moreover, identifying some vulnerabilities or risks requires that multiple components be considered along with the relationships between them.
According to the present invention there is provided a method for mapping source code to computation resource, the method including the steps of: determining computation resources of a cloud provider used by an application; identifying executable artifacts that are deployed on the computation resources; and matching executable artifacts to source-code and configuration content to provide artifact to code or configuration matches.
According to further features the step of identifying executable artifacts includes identifying contained artifacts that are embedded in the executable artifacts.
According to further features the method further includes obtaining content of all or parts of the physical or virtual storage devices used by the computation resources and metadata about the computation resources.
According to further features the method further includes monitoring build processes of the source-code that generate the executable artifacts.
According to further features the step of matching executable artifacts includes generating candidate matches and assigning a confidence score to each of the candidate matches, the confidence score indicating a likelihood of being an actual match.
According to further features the method further includes the step of: employing an optimization algorithm that selects a matching that maximizes the overall or total confidence score, while not including contradicting matches.
According to further features the candidate matches are generated using a Name-based Matching of Artifact To Code mechanism wherein names of artifacts are compared against exact or approximate names of modules and repositories, and exact or approximate names of generated artifacts as they are expressed in build and project files within the modules and repositories.
According to further features the candidate matches are generated using an Artifact Metadata-based Matching To Code mechanism wherein artifact metadata is obtained from at least one source of the group of sources including: an executable header, an executable version information resource; an executable-embedded manifest; a manifest file alongside an executable artifact; and artifacts managed in an artifact repository; and wherein the artifact metadata is used to match to repositories based on predefined rules.
According to further features the candidate matches are generated using a Dependency Fingerprint-based Matching To Code mechanism, wherein a respective fingerprint, including the dependencies of each artifact of the executable artifacts, is created and compared to the dependencies declared for modules in the source-code.
According to further features the candidate matches are generated using a Symbol-based Matching of Artifact To Code mechanism wherein at least one of: class names, exported functions, and internal symbols present in executable artifacts, are compared to a list of symbols devised from the source-code.
According to further features the step of matching executable artifacts includes generating candidate matches using a Build Process Tracking for Matching Artifact to Code mechanism wherein the build process in a continuous integration and continuous delivery/continuous deployment (CI/CD) system is monitored to identify potential names of the executable artifacts.
According to further features the method further includes the steps of: recording the artifact to code or configuration matches in a database; and allowing intervention by a manual operator to update or override the artifact to code or configuration matches recorded in the database.
Various embodiments are herein described, by way of example only, with reference to the accompanying drawing, wherein:
The principles and operation of a system and method for mapping source code components and risks to runtime according to the present invention may be better understood with reference to the drawing and the accompanying description.
The invention provides multiple approaches to associate infrastructure components with affecting source-code and configuration materials. Referring now to the drawings,
The different approaches included in this invention differ from each other primarily in how they execute Step 3 above; namely, how exactly the matching occurs between executable artifacts and the code that produced them.
By identifying the source-code and configuration content that is used to build executable code that is running on infrastructure resources, information about those resources—such as the level of their exposure to external communication, and relations among them—can be leveraged to add context when inspecting the application for vulnerabilities and security risks.
The terms ‘computation resources’ and ‘infrastructure computation resources’ are used interchangeably herein. Infrastructure computation resources are therefore often simply referred to as computation resources, and within that context, may be further simplified to just ‘resources’.
There is presented a comprehensive mechanism that discovers and normalizes representation of resources within cloud-native applications. By combining querying, static analysis, and runtime analysis, the system can effectively identify resources and handle them, regardless of their source.
Discovery and Inspection
Discovery of computational resources in a cloud infrastructure environment can be achieved by one or more of the following methods:
Hybrid solutions will utilize multiple approaches from the list above to cover an entire infrastructure portfolio. For instance, a management API of a cloud provider can be used to identify Kubernetes clusters deployed on the cloud which are, in turn, introspected using the Kubernetes management API to identify compute nodes within the cluster. In parallel, serverless nodes are identified by the same management API.
Identifying Executable Artifacts
Once computation nodes are identified and information about them is fetched, the executable artifacts that they run are identified. Executable artifacts are best modeled as a containment hierarchy, since artifacts often have other artifacts embedded in them. Therefore, in some embodiments, identifying executable artifacts includes identifying contained artifacts that are embedded in the executable artifacts. Identifying these contained artifacts and the containment hierarchy can boost the performance of the matching process. For example, for Kubernetes clusters, Pod computing resources may run Containers that, in turn, load a Container Image and execute it. The Container Image can be fetched from a Container Registry and its content inspected to identify finer-grained artifacts, such as executable files launched by the image. Other Container-based technologies, such as managed container services, can be processed in a similar manner.
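The containment hierarchy described above can be pictured as a simple tree. The following is a minimal sketch only: the `Artifact` class, its fields, the `leaf_artifacts` helper, and all names in the sample data are hypothetical and are not taken from the described system.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Artifact:
    """One node in the containment hierarchy (hypothetical model)."""
    name: str
    kind: str  # e.g. "pod", "container", "image", "executable"
    children: List["Artifact"] = field(default_factory=list)

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Artifact"]]:
        """Yield (depth, artifact) for this node and everything it contains."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def leaf_artifacts(root: Artifact) -> List[Artifact]:
    """Return the finest-grained artifacts: nodes that contain nothing else."""
    return [a for _, a in root.walk() if not a.children]


# Illustrative data only: every name below is invented for this sketch.
pod = Artifact("frontend-pod", "pod", [
    Artifact("frontend-container", "container", [
        Artifact("registry.example/frontend:1.0", "image", [
            Artifact("/app/server", "executable"),
            Artifact("/app/lib/helper.so", "executable"),
        ]),
    ]),
])
```

Calling `leaf_artifacts(pod)` returns the two executable files, which are the finer-grained artifacts that the matching techniques below would then compare against code.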
In embodiments, the system monitors the build processes of the source-code that generate the executable artifacts. This monitoring can be useful in matching names of executable artifacts to source-code that was used in the build processes.
In addition to listing the executable artifacts deployed on the various infrastructure computation resources, obtaining their content, and metadata about them, is also valuable for the matching process.
Matching Executable Artifacts
Once a population of executable artifacts is identified, a target set of repositories and fragments thereof (called Modules) is identified. Then, a process for matching between artifacts and the target set is carried out. At the heart of this process are several techniques that allow identifying matches (or mismatches) between artifacts and code. These techniques can be employed in various configurations, including:
Name-Based Matching of Artifact to Code
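In this technique, the names of artifacts are compared against exact or approximate names of modules and repositories, and against exact or approximate names of generated artifacts as they are expressed in build and project files within the modules and repositories.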
The comparison should take into account some name-derivation patterns, such as:
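Purely as an illustration, approximate name comparison might be sketched as follows. The normalization steps, the suffix list, and the 0.9 score for a suffix-derived name are assumptions made for this sketch and are not the patterns enumerated in the source; `normalize` and `name_score` are hypothetical helper names.

```python
import re
from difflib import SequenceMatcher

# Hypothetical suffixes that a build might append to a module name.
_SUFFIXES = ("service", "svc", "api", "app", "server")


def normalize(name: str) -> str:
    """Lower-case a name and drop path, extension, version and separators."""
    base = name.rsplit("/", 1)[-1]
    base = re.sub(r"\.(jar|dll|so|exe|war)$", "", base, flags=re.IGNORECASE)
    base = re.sub(r"[-_.]?v?\d+(\.\d+)*$", "", base)  # trailing version
    return re.sub(r"[^a-z0-9]", "", base.lower())


def name_score(artifact_name: str, code_name: str) -> float:
    """Return a similarity in [0, 1] between an artifact and a code name."""
    a, c = normalize(artifact_name), normalize(code_name)
    if not a or not c:
        return 0.0
    if a == c:
        return 1.0
    # Treat "<code name><suffix>" as a near-exact derivation (assumed rule).
    if any(a == c + s for s in _SUFFIXES):
        return 0.9
    # Otherwise fall back to a generic string-similarity ratio.
    return SequenceMatcher(None, a, c).ratio()
```

With these assumptions, `name_score("AnalyticsService", "Analytics")` is treated as a near-exact match, while `name_score("FrontendService", "Service")` only receives a partial similarity score.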
Artifact Metadata-based Matching To Code
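In this technique, artifact metadata is obtained from one or more of the following sources: an executable header; an executable version information resource; an executable-embedded manifest; a manifest file alongside an executable artifact; and artifacts managed in an artifact repository.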
The metadata may be used to match to repositories based on the following logic:
Dependency Fingerprint-based Matching To Code
Artifacts often depend on other artifacts, for example dynamic link libraries, shared objects, or JAR files. By creating a fingerprint that consists of the dependencies of an artifact and comparing it to the dependencies declared for modules in code, matching can be performed. The fingerprint may or may not include dependency version information, and the matching should allow for some discrepancies. “Ambient” dependencies, such as OS-provided API libraries, can be eliminated from the computation.
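One possible way to compare such fingerprints while tolerating discrepancies is a set-overlap measure. The sketch below uses Jaccard similarity, which is a choice made for illustration and is not stated in the source; the `AMBIENT` list, the `name==version` input convention, and the function names are likewise assumptions.

```python
from typing import Iterable, Set

# Assumed examples of "ambient" OS-provided libraries to ignore.
AMBIENT = {"libc.so.6", "kernel32.dll", "libpthread.so.0"}


def fingerprint(dependencies: Iterable[str], with_versions: bool = False) -> Set[str]:
    """Build a dependency fingerprint as a set of normalized names.

    Dependencies are given as "name" or "name==version" strings
    (a convention assumed for this sketch).
    """
    result = set()
    for dep in dependencies:
        name, _, version = dep.partition("==")
        name = name.strip().lower()
        if name in AMBIENT:
            continue  # ambient dependencies carry no matching signal
        result.add(f"{name}=={version}" if with_versions and version else name)
    return result


def fingerprint_similarity(artifact_deps: Iterable[str],
                           module_deps: Iterable[str]) -> float:
    """Jaccard similarity of version-free fingerprints, in [0, 1]."""
    a, m = fingerprint(artifact_deps), fingerprint(module_deps)
    if not a and not m:
        return 0.0
    return len(a & m) / len(a | m)
```

Because the comparison ignores versions by default and scores partial overlap, a module that declares one extra dependency, or a different version of a shared one, still scores well rather than being rejected outright.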
Symbol-based Matching of Artifact To Code
Class names, exported functions, and internal symbols present in an artifact may be compared to a list of symbols derived from source code. This comparison does not have to encompass all symbols in the source and target; it is enough that it focuses on a set of distinct anchor symbols. The set of anchor symbol names can be generated by identifying symbol names unique to a module or repository. Symbol names are identified by parsing source-code files using a parser adapted to identify declarations in target programming languages that are compiled into symbols. In some languages, techniques that transform the symbol as it appears in the source-code declaration into the predicted symbol name in a binary should be applied (for example, name mangling in C++).
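The selection of anchor symbols and their comparison against an artifact can be sketched as below. The sketch assumes the symbol sets have already been extracted (source parsing, binary inspection, and any name mangling are out of scope here); the function names and the scoring rule, the fraction of a module's anchors found in the artifact, are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Set


def anchor_symbols(module_symbols: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep, per module, only the symbols that no other module declares."""
    counts = Counter(s for symbols in module_symbols.values() for s in symbols)
    return {
        module: {s for s in symbols if counts[s] == 1}
        for module, symbols in module_symbols.items()
    }


def symbol_scores(artifact_symbols: Set[str],
                  anchors: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each module by the fraction of its anchors found in the artifact."""
    return {
        module: (len(anchor & artifact_symbols) / len(anchor)) if anchor else 0.0
        for module, anchor in anchors.items()
    }
```

A symbol shared by several modules, such as a common logging class, is dropped from every anchor set, so it cannot produce a misleading match.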
Build Process Tracking for Matching Artifact to Code
By monitoring the build process in continuous integration and continuous delivery/continuous deployment (CI/CD) systems, either in real time or by inspecting build result logs, it is sometimes possible to identify the names of the artifacts that are built and the repository or module from which they were built. Real-time inspection may be implemented based on instrumentation of the build system to track file operations, or by using integration APIs specifically tailored for that purpose. Log inspection can be done either by fetching logs using an agent on the build machine or by accessing logs through an API. Finally, some CI/CD systems allow invoking custom build steps; such a build step may be used to report the build source and target artifact.
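As a rough sketch of the log-inspection variant, the snippet below scans build output for lines reporting which repository and module produced which artifact. The `repo=... module=... artifact=...` line format is invented for this sketch (it could, for instance, be emitted by a custom build step); real CI/CD systems use their own formats, and `artifacts_from_log` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# Invented log format for this sketch; real CI/CD logs will differ.
_LINE = re.compile(
    r"repo=(?P<repo>\S+)\s+module=(?P<module>\S+)\s+artifact=(?P<artifact>\S+)"
)


def artifacts_from_log(log_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each artifact file name to the (repository, module) pairs that built it."""
    produced: Dict[str, List[Tuple[str, str]]] = {}
    for line in log_text.splitlines():
        match = _LINE.search(line)
        if match is None:
            continue  # not a build-report line
        name = match["artifact"].rsplit("/", 1)[-1]  # keep the file name only
        source = (match["repo"], match["module"])
        if source not in produced.setdefault(name, []):
            produced[name].append(source)
    return produced
```

The resulting mapping supplies potential artifact names per repository or module, which can then feed the candidate-generation step.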
Example of Matching Process
To illustrate the process described above, we now review how it would be executed on a system S that is deployed on two containers in a cloud provider. The containers run two image files, each implementing a microservice. The first image file is named FrontendService and the second one is named AnalyticsService. We further assume that each image is built from its own code repository—Certibig and Analytics, respectively. Also assume that there is an additional code repository, called Service.
With this theoretical system, the process starts by identifying the running containers through cloud provider API calls. This stage would identify the two (or more) containers running the images FrontendService and AnalyticsService. Various matching options are now generated by the algorithm, based on the methods described above, and each is assigned a confidence score:
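To make the selection step concrete, the following sketch picks, from a set of scored candidates, the non-contradicting subset with the highest total confidence. Several things here are assumptions rather than the source's method: the confidence values are invented, a "contradiction" is taken to mean one artifact matched to two repositories or one repository matched to two artifacts (reflecting the premise that each image is built from its own repository), and exhaustive search stands in for whatever optimization algorithm an embodiment would use.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, str, float]  # (artifact, repository, confidence)


def best_matching(candidates: List[Candidate]) -> Tuple[float, Dict[str, str]]:
    """Exhaustively pick the non-contradicting subset with the highest total score."""
    by_artifact: Dict[str, List[Tuple[str, float]]] = {}
    for artifact, repo, score in candidates:
        by_artifact.setdefault(artifact, []).append((repo, score))
    artifacts = sorted(by_artifact)

    def search(i: int, used: frozenset) -> Tuple[float, Dict[str, str]]:
        if i == len(artifacts):
            return 0.0, {}
        # Option 1: leave this artifact unmatched.
        best_total, best_map = search(i + 1, used)
        # Option 2: match it to any repository that is still free.
        for repo, score in by_artifact[artifacts[i]]:
            if repo in used:
                continue
            total, mapping = search(i + 1, used | {repo})
            if score + total > best_total:
                best_total = score + total
                best_map = {artifacts[i]: repo, **mapping}
        return best_total, best_map

    return search(0, frozenset())


# Invented confidence scores, for illustration only.
candidates = [
    ("FrontendService", "Service", 0.6),
    ("FrontendService", "Certibig", 0.7),
    ("AnalyticsService", "Service", 0.6),
    ("AnalyticsService", "Analytics", 0.9),
]
total, mapping = best_matching(candidates)
```

With these invented scores, `mapping` pairs FrontendService with Certibig and AnalyticsService with Analytics. Exhaustive search is only practical for a handful of artifacts; a larger population would call for a proper assignment or constraint solver.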
While the invention has been described with respect to a limited number of embodiments, it will be appreciated that many variations, modifications and other applications of the invention may be made. Therefore, the claimed invention as recited in the claims that follow is not limited to the embodiments described herein.
This patent application claims the benefit of, and priority from, U.S. Provisional Patent Application No. 63/391,063, filed Jul. 21, 2022, which is incorporated in its entirety as if fully set forth herein.