The disclosure generally relates to the field of information security, and more particularly to software development, installation, and management.
As with many endeavors, businesses that develop and sell software and/or provide a software-based service must make decisions that balance the pragmatism of running a business with the goal of high quality for a perfect customer experience. Producing high-quality program code that is free of any flaws and immaculately written (i.e., easy to read and/or conforming to best practices) is desirable, but ensuring that billions of lines of code are immaculate and free of any flaws would require an impractical investment in code review time by senior software engineers/developers. This would make the software/service unaffordable. This need for pragmatism in the hypercompetitive space of software development results in “technical debt.” Technical debt is a term that analogizes software development to financial debt. To meet a release deadline, developers may be forced to release program code with flaws, known or unknown. The program code is released under the assumption that the flaws will be found and corrected in later releases or updates. These flaws can be considered debt and the future work to correct these flaws, as well as code refactoring, is considered the accumulating interest. As time passes and flaws are not addressed (i.e., the debt is not reduced), the amount of time to correct is presumed to increase (i.e., the interest on the technical debt increases).
Continuous code review is a team-based commitment that attempts to address the balance between goals and pragmatism in code development. With continuous code review, a team of developers commits to speedily reviewing code commits. Some models for implementing this code review commitment include trunk-based development and pull requests. Although the details vary, these models generally involve a developer committing his/her program code to be merged into another collaborative instance of program code (e.g., trunk or branch). Before his/her program code is merged, another team member reviews and approves the program code for merger or returns the program code for modification. A software development tool for source code and version control facilitates the commitment and approval process.
Embodiments of the disclosure may be better understood by referencing the accompanying drawings.
The description that follows includes example systems, methods, techniques, and program flows that embody aspects of the disclosure. However, it is understood that this disclosure may be practiced without these specific details. In other instances, well-known instruction instances, protocols, structures and techniques have not been shown in detail in order not to obfuscate the description.
Overview
A software development tool has been designed that scans “diffs” of a submitted code fragment and identifies security flaws introduced by the submitted code fragment. When a code fragment is submitted for merger with a target program code, the software development tool (“tool”) determines the differences (e.g., additions, edits, deletions) between the code fragment and the target program code. The target program code may be a primary program code (e.g., main branch or trunk) or another branch or fork. A code fragment may be a subroutine, one or more files of program code, or a line of program code. The tool scans the diffs for security flaws and can also operate as a linter against the diffs (e.g., scan the diffs for stylistic errors). The tool identifies diffs that introduce security flaws or fail to comply with linter policy/rules in a user interface of the tool and can be programmed to disregard specified flaws to expedite review. Focusing the scanning on diffs avoids overwhelming peer reviewers with the technical debt and allows reviewers to fulfill the commitment to expedited review and the continuous development process.
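As a minimal, non-limiting sketch of how the tool could be programmed to disregard specified flaws, the following Python example filters scan findings before they are presented for review. The function name filter_disregarded and the field names "line" and "type" are hypothetical and are used only for illustration.

```python
def filter_disregarded(findings, disregarded_types):
    """Drop findings whose flaw type a reviewer has chosen to disregard.

    findings: list of dicts with hypothetical keys "line" and "type".
    disregarded_types: set of flaw type names to suppress.
    """
    return [f for f in findings if f["type"] not in disregarded_types]


# Example: lint findings are suppressed, security findings are kept.
findings = [
    {"line": 12, "type": "code lint"},
    {"line": 30, "type": "command injection"},
]
visible = filter_disregarded(findings, {"code lint"})
```

Filtering after the scan, rather than skipping policies during the scan, is one possible design choice; it keeps the full results available if the reviewer later changes which flaws are disregarded.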
At stage A, the software development tool 101 identifies diffs between a code fragment 105 and a target code unit 107. The software development tool 101 can include diff identifying functionality or invoke a separate utility to identify the diffs. The identification of diffs can be triggered in response to a request to merge the code fragment 105 into or with the target code unit 107. The software development tool 101 generates a file 108 that indicates the code units of the code fragment 105 that are different than the target code unit 107. The code units with diffs in the file 108 are at a granularity that can be scanned by the code flaw scanner 103. This granularity is a line of code in this illustration but can be configured in the software development tool 101 differently (e.g., n lines of code or a subroutine). The software development tool 101 passes the file 108 to the code flaw scanner 103 for scanning.
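As one illustrative possibility for stage A, line-granularity diffs could be identified with the difflib module of the Python standard library. The names DiffUnit and identify_diffs are hypothetical, and the sketch assumes that only inserted and edited lines of the code fragment are passed on for scanning; deleted lines, which have no text in the code fragment, are omitted for brevity.

```python
import difflib
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffUnit:
    line_no: int  # 1-based line number within the code fragment
    change: str   # "insert" or "replace" (an edit)
    text: str     # the changed line of program code


def identify_diffs(target_lines, fragment_lines):
    """Return the lines of the fragment that differ from the target."""
    matcher = difflib.SequenceMatcher(
        a=target_lines, b=fragment_lines, autojunk=False
    )
    units = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "equal" and "delete" opcodes contribute no changed fragment lines.
        if tag in ("insert", "replace"):
            for j in range(j1, j2):
                units.append(DiffUnit(j + 1, tag, fragment_lines[j]))
    return units


target = ["import os", "def run(cmd):", "    return cmd"]
fragment = ["import os", "def run(cmd):", "    os.system(cmd)", "    return cmd"]
diff_units = identify_diffs(target, fragment)
```

In this example the only diff is the inserted third line of the fragment, so only that line would be handed to the scanner.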
At stage B, the code flaw scanner 103 scans the code units in the file 108 to determine whether the diffs introduce a security flaw. The code flaw scanner 103 scans the code units of the file 108 based on a set of one or more diff-based security flaw policies in a repository 109. The diff-based security flaw policies indicate types of changes to program code that have been identified as introducing vulnerabilities into program code. For instance, a security flaw policy may indicate that adding a function call defined by a particular API introduces a vulnerability, or that inserting a text field into a form without corresponding program code to verify that input into the text field is not a malicious code injection introduces a vulnerability. In addition to scanning the changes to determine whether those changes introduce vulnerabilities, the code flaw scanner 103 can also scan the code fragment 105 to ascertain whether the code fragment 105 includes vulnerabilities. This can be considered an abbreviated scan in addition to the scan of changes or “diff scan” because the code flaw scanner 103 would be scanning code that already exists in the target code unit 107 but is not all of the target code unit 107.
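By way of a hedged example, a diff-based security flaw policy of the first kind (an added call to a particular API) could be represented as a pattern that is matched against each changed code unit. The class FlawPolicy, the policy identifiers, and the two example patterns are assumptions made for illustration; they are not policies defined by this disclosure.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class FlawPolicy:
    policy_id: str    # hypothetical identifier
    pattern: str      # regular expression matched against a changed line
    description: str  # vulnerability the change is presumed to introduce


# Example policies; a real repository of policies would be configurable.
POLICIES = [
    FlawPolicy("CMD-EXEC", r"\bos\.system\s*\(", "added call to a shell-executing API"),
    FlawPolicy("DYN-EVAL", r"\beval\s*\(", "added dynamic evaluation of a string"),
]


def scan_diff_units(diff_units, policies=POLICIES):
    """Scan (line_no, text) pairs and return (line_no, policy_id) findings."""
    findings = []
    for line_no, text in diff_units:
        for policy in policies:
            if re.search(policy.pattern, text):
                findings.append((line_no, policy.policy_id))
    return findings


results = scan_diff_units([(3, "    os.system(cmd)"), (4, "    return cmd")])
```

Pattern matching is only one way to express such a policy. A policy of the second kind (a form field added without input verification) would likely need structural analysis of the change rather than a single-line pattern.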
Based on the diff scanning, the code flaw scanner 103 returns an indication of security vulnerabilities introduced by the diffs in association with the diffs. For example, the code flaw scanner 103 can return a data structure or file that identifies the code units by line number along with the corresponding security vulnerability introduced by each code unit. With this information, the software development tool 101 generates a map 104. The map 104 is a mapping of the code units identified as diffs (i.e., code units with changes) to annotations that indicate the security vulnerabilities detected by the code flaw scanner 103.
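A mapping such as the map 104 could, as one non-limiting example, be built by grouping the scanner results by line number. The function name build_annotation_map and the tuple layout of the scanner results are hypothetical.

```python
from collections import defaultdict


def build_annotation_map(scanner_results):
    """Group (line_no, vulnerability) pairs into {line_no: [vulnerability, ...]}."""
    annotation_map = defaultdict(list)
    for line_no, vulnerability in scanner_results:
        annotation_map[line_no].append(vulnerability)
    return dict(annotation_map)


scanner_results = [
    (3, "command injection"),
    (9, "hard-coded credential"),
    (3, "unvalidated input"),
]
annotation_map = build_annotation_map(scanner_results)
```

Keying the map by line number allows a single changed code unit to carry more than one annotation, as line 3 does in this example.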
At stage C, the software development tool 101 communicates the information from the diff scanning and the indication of diffs to the GUI engine 121. The software development tool 101 passes a structure 110 that includes the code fragment 105 with indications of the diffs. The indications of the diffs can be values indicating a type of change (e.g., insertion or deletion) associated with line numbers or in a field associated with the field containing the corresponding code unit of the code fragment 105. The software development tool 101 also passes the map 104 to the GUI engine 121. Although this illustration describes a separation of the information, embodiments can generate and maintain the information of diffs and corresponding diff scanning results as a single structure or as structures that reference each other. The GUI engine 121 uses the communicated information to render a user interface that allows a reviewer to determine the vulnerabilities, if any, introduced by the changes.
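To illustrate how the two pieces of communicated information could be combined for a reviewer, the following sketch renders a plain-text view in place of a graphical user interface. The function render_review, the marker characters, and the shapes of diff_types and annotation_map are assumptions for illustration only; deleted lines are not shown because they have no text in the code fragment.

```python
def render_review(fragment_lines, diff_types, annotation_map):
    """Render the fragment with change markers and vulnerability notes.

    fragment_lines: list of str (the code fragment, one entry per line)
    diff_types: {line_no: "insert" | "edit"} for changed lines
    annotation_map: {line_no: [vulnerability description, ...]}
    """
    markers = {"insert": "+", "edit": "~"}
    out = []
    for line_no, text in enumerate(fragment_lines, start=1):
        marker = markers.get(diff_types.get(line_no), " ")
        out.append(f"{line_no:>4} {marker} {text}")
        # Place each annotation directly under the line it refers to.
        for note in annotation_map.get(line_no, []):
            out.append(f"       ! {note}")
    return "\n".join(out)


view = render_review(
    ["import os", "os.system(cmd)"],
    {2: "insert"},
    {2: ["command injection"]},
)
```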
After a software development tool determines diffs between a submitted code fragment and a target code unit, a scanner detects the code fragment with the determined diffs (301). The scanner can be invoked with a reference to the code fragment and a reference to the diffs, assuming the code fragment and diffs are indicated in different structures or files. Alternatively, the scanner can be invoked with, or receive a request that indicates, a reference to a single file or structure that includes the code fragment and determined diffs. For instance, the scanner can be invoked with an argument that is a pointer to a data structure that associates indexes into a structured code fragment (e.g., the code fragment with line numbers) with codes or values that indicate diff type (e.g., insertion, deletion, edit).
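One possible shape for the data structure described above is sketched below, assuming a dictionary that associates 1-based line numbers of the code fragment with a diff type. The names DiffType and units_to_scan are hypothetical, as is the choice to report a deletion with empty text.

```python
from enum import Enum


class DiffType(Enum):
    INSERTION = "insertion"
    DELETION = "deletion"
    EDIT = "edit"


def units_to_scan(fragment_lines, diff_index):
    """Resolve each indexed diff to (line_no, diff_type, text).

    fragment_lines: list of str; line numbers are 1-based.
    diff_index: {line_no: DiffType}
    """
    units = []
    for line_no in sorted(diff_index):
        diff_type = diff_index[line_no]
        # Assumption: a deleted line has no text in the fragment, so only
        # its index and diff type are reported.
        if diff_type is DiffType.DELETION:
            text = ""
        else:
            text = fragment_lines[line_no - 1]
        units.append((line_no, diff_type, text))
    return units


fragment_lines = ["a = 1", "b = eval(s)", "c = 3"]
diff_index = {3: DiffType.EDIT, 2: DiffType.INSERTION}
units = units_to_scan(fragment_lines, diff_index)
```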
With the determined diffs, the scanner iterates over the diffs (303) and scanning policies (305) to determine whether any of the determined diffs will introduce a flaw into the target code unit. In these example operations, the scanner evaluates each diff against the one or more policies being enforced by the software development tool. The description of
The scanner will determine whether the scanning detected one or more flaws (311). If the diff scanning did not detect a flaw, then the scanner proceeds to evaluate the diff against the next policy, if any (313). If the diff scanning detected a flaw, then the scanner annotates the diff based on the detected flaw (311). The annotation identifies the flaw that would be introduced by the diff and describes the vulnerability. For instance, the scanner may add annotation data that identifies the flaw by type (e.g., code lint) and a description of the flaw (e.g., variable name does not conform to defined naming convention in flaw policy).
If there are no other policies to evaluate against the current diff, then the scanner proceeds to scan the next diff (312). If the determined diffs have been scanned, then the scanner stores the annotated code fragment for code review (315). The annotations can be a separate structure with entries referenced by entries of the code fragment.
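The example operations above, in which the scanner evaluates each diff against each policy and annotates the diffs that introduce a flaw, could be sketched as a nested loop. The sketch keeps the annotations in a separate structure whose entries are referenced from entries of the code fragment. The function scan_and_annotate, the key "annotation_refs", and the naming-convention policy are hypothetical and illustrative only.

```python
import re

# Hypothetical lint policy: flags a camelCase variable being assigned.
CAMEL_CASE_ASSIGNMENT = re.compile(r"\s*[a-z]+[A-Z]\w*\s*=")

NAMING_POLICY = (
    "code lint",
    "variable name does not conform to defined naming convention",
    lambda text: bool(CAMEL_CASE_ASSIGNMENT.match(text)),
)


def scan_and_annotate(fragment, diff_lines, policies):
    """Evaluate every diff against every policy and annotate detected flaws.

    fragment: list of {"line": int, "text": str} entries (annotated in place)
    diff_lines: line numbers of the fragment identified as diffs
    policies: list of (flaw_type, description, predicate) tuples
    """
    by_line = {entry["line"]: entry for entry in fragment}
    annotations = []  # separate structure referenced by fragment entries
    for line_no in diff_lines:  # iterate over the diffs
        entry = by_line[line_no]
        for flaw_type, description, predicate in policies:  # iterate over policies
            if predicate(entry["text"]):  # flaw detected for this diff
                annotations.append({"type": flaw_type, "description": description})
                entry.setdefault("annotation_refs", []).append(len(annotations) - 1)
    return fragment, annotations


fragment = [
    {"line": 1, "text": "userName = input()"},
    {"line": 2, "text": "print(userName)"},
]
annotated, annotations = scan_and_annotate(fragment, [1], [NAMING_POLICY])
```

Only line 1 is treated as a diff here, so line 2 is never evaluated even though it uses the same variable name; this reflects the focus on changes rather than on pre-existing code.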
The flowcharts are provided to aid in understanding the illustrations and are not to be used to limit scope of the claims. The flowcharts depict example operations that can vary within the scope of the claims. Additional operations may be performed; fewer operations may be performed; the operations may be performed in parallel; and the operations may be performed in a different order. For example, the iterating operations depicted in
As will be appreciated, aspects of the disclosure may be embodied as a system, method or program code/instructions stored in one or more machine-readable media. Accordingly, aspects may take the form of hardware, software (including firmware, resident software, micro-code, etc.), or a combination of software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” The functionality presented as individual modules/units in the example illustrations can be organized differently in accordance with any one of platform (operating system and/or hardware), application ecosystem, interfaces, programmer preferences, programming language, administrator preferences, etc.
Any combination of one or more machine-readable medium(s) may be utilized. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable storage medium may be, for example, but not limited to, a system, apparatus, or device, that employs any one of or combination of electronic, magnetic, optical, electromagnetic, infrared, or semiconductor technology to store program code. More specific examples (a non-exhaustive list) of the machine-readable storage medium would include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a machine-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. A machine-readable storage medium is not a machine-readable signal medium.
A machine-readable signal medium may include a propagated data signal with machine-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A machine-readable signal medium may be any machine-readable medium that is not a machine-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a machine-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as the Java® programming language, C++ or the like; a dynamic programming language such as Python; a scripting language such as the Perl programming language or the PowerShell script language; and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on a stand-alone machine, may execute in a distributed manner across multiple machines, and may execute on one machine while providing results and/or accepting input on another machine.
The program code/instructions may also be stored in a machine-readable medium that can direct a machine to function in a particular manner, such that the instructions stored in the machine-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
While the aspects of the disclosure are described with reference to various implementations and exploitations, it will be understood that these aspects are illustrative and that the scope of the claims is not limited to them. In general, techniques for scanning code diffs to determine whether any introduce code flaws as described herein may be implemented with facilities consistent with any hardware system or hardware systems. Many variations, modifications, additions, and improvements are possible.
Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the disclosure. In general, structures and functionality presented as separate components in the example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the disclosure.
Use of the phrase “at least one of” preceding a list with the conjunction “and” should not be treated as an exclusive list and should not be construed as a list of categories with one item from each category, unless specifically stated otherwise. A clause that recites “at least one of A, B, and C” can be infringed with only one of the listed items, multiple of the listed items, and one or more of the items in the list and another item not listed.