Infrastructure as code (IaC) uses machine-readable definition files, rather than physical hardware configurations or interactive configuration tools, for managing and provisioning computer data centers. Continuous configuration automation can leverage IaC to automate the deployment and configuration of settings of data center infrastructure.
Some implementations described herein relate to a system for correction of non-compliant files in a code repository. The system may include one or more memories and one or more processors communicatively coupled to the one or more memories. The one or more processors may be configured to obtain first information relating to a code repository that includes one or more files that indicate a configuration for infrastructure that is to be provisioned in a cloud computing environment. The one or more processors may be configured to obtain second information relating to one or more compliance rules for provisioning the infrastructure in the cloud computing environment. The one or more processors may be configured to perform a scan of a content of the one or more files for violations of the one or more compliance rules. The scan may use natural language processing to identify at least one of a cloud computing provider for the cloud computing environment or a programming language used in the one or more files. The one or more processors may be configured to identify, in connection with the scan of the content of the one or more files and based on the at least one of the cloud computing provider or the programming language, that the content of the one or more files includes at least one violation of the one or more compliance rules. The one or more processors may be configured to modify the content of the one or more files to correct the at least one violation in accordance with the one or more compliance rules. The one or more processors may be configured to transmit a request to merge the one or more files into the code repository. The one or more processors may be configured to determine, using a machine learning model and based on the at least one violation, a severity of the at least one violation. 
The one or more processors may be configured to transmit, to a user device associated with a user of the code repository, a notification indicating the severity of the at least one violation.
Some implementations described herein relate to a method of correction of non-compliant files in a code repository. The method may include performing, by a device, a scan of a content of one or more files in a code repository for violations of one or more compliance rules, where the one or more files indicate a configuration for infrastructure that is to be provisioned in a cloud computing environment. The method may include identifying, by the device in connection with the scan of the content of the one or more files, that the content of the one or more files includes at least one violation of the one or more compliance rules. The method may include modifying, by the device, the content of the one or more files to correct the at least one violation in accordance with the one or more compliance rules. The method may include determining, by the device using a machine learning model, a probability as to whether a build of code of the code repository, using the one or more files with the content that is modified, is likely to pass. The method may include transmitting, by the device based on the probability that the build of code of the code repository is likely to pass, a request to merge the one or more files into the code repository.
Some implementations described herein relate to a non-transitory computer-readable medium that stores a set of instructions for correction of non-compliant files in a code repository for a device. The set of instructions, when executed by one or more processors of the device, may cause the device to perform a scan of a content of one or more files in a code repository for violations of one or more compliance rules. The one or more files may indicate a configuration for infrastructure that is to be provisioned in a cloud computing environment. The scan may use natural language processing to identify at least one of a cloud computing provider for the cloud computing environment or a programming language used in the one or more files. The set of instructions, when executed by one or more processors of the device, may cause the device to identify, in connection with the scan of the content of the one or more files and based on the at least one of the cloud computing provider or the programming language, that the content of the one or more files includes at least one violation of the one or more compliance rules. The set of instructions, when executed by one or more processors of the device, may cause the device to modify the content of the one or more files to correct the at least one violation in accordance with the one or more compliance rules. The set of instructions, when executed by one or more processors of the device, may cause the device to transmit a request to merge the one or more files into the code repository.
The following detailed description of example implementations refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements.
A cloud computing provider may provide a set of cloud computing services to an entity (e.g., a company, an organization, or an institution) via a cloud computing environment. The cloud computing environment may provide the functionality of one or more physical computers, such as by using emulation of hardware and/or software that may be implemented in a physical computer. The entity that uses the cloud computing services may provision one or more virtual machines (e.g., a virtual representation of a physical computer), serverless computing functions, load balancers, volumes, databases, or the like, in the cloud computing environment. These provisioned components of the cloud computing environment may be referred to as “infrastructure.”
A configuration for the infrastructure may be indicated in one or more definition files. Accordingly, to provision the infrastructure, the configuration may be read from the one or more definition files and communicated to the cloud computing provider (e.g., using an application programming interface (API)) for deployment in the cloud computing environment. An infrastructure configuration that is defined by code in one or more files may be referred to as “infrastructure as code” (IaC). Often, the entity that uses the cloud computing services may have a set of compliance rules for provisioning the infrastructure in the cloud computing environment. The compliance rules may be intended to provide data security, interoperability, compatibility, and/or may relate to other best practices.
In some cases, the infrastructure provisioned in the cloud computing environment may be analyzed to identify violations of the compliance rules, and the infrastructure may be reconfigured to correct the violations and reprovisioned in the cloud computing environment. However, this analysis, reconfiguration, and reprovisioning unnecessarily uses or allocates cloud computing resources (e.g., processing resources, memory resources, communication resources, and/or power resources, among other examples). Moreover, such a reactive approach may allow violations of the compliance rules to persist in a production environment for a period of time before the violations are corrected, thereby impairing a security of the infrastructure and/or a stability of the infrastructure.
Some implementations described herein provide for proactive detection and correction of non-compliant infrastructure. For example, detection and correction of the non-compliant infrastructure may be performed on files in a code repository that define a configuration for the infrastructure (e.g., IaC files) before the infrastructure is provisioned in a cloud computing environment. In some implementations, a system may perform a scan of a content of files in a code repository for violations of compliance rules. The system may identify violations of the compliance rules and automatically modify the content of the files to correct the violations in accordance with the compliance rules.
In this way, violations of the compliance rules may be corrected before the infrastructure is provisioned in the cloud computing environment. Accordingly, cloud computing resources that would otherwise be used to provision, detect, and correct non-compliant infrastructure may be conserved, which may improve a performance of the cloud computing resources. Moreover, by proactively detecting and correcting non-compliant infrastructure before the infrastructure is provisioned in the cloud computing environment, a security of the infrastructure and/or a stability of the infrastructure is improved.
As shown in
The first information may indicate a name of the code repository, a location of the code repository (e.g., a uniform resource locator (URL) for the code repository), a developer team that uses the code repository, an email address for a contact of the developer team, and/or a list of files in the code repository that are to be scanned and/or not scanned by the compliance system. Based on obtaining the first information, the compliance system may add the code repository (e.g., using the first information) to a registry of the compliance system. The registry may indicate code repositories that are to be scanned by the compliance system, as described herein.
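As an illustration only, the first information and the registry described above might be represented as follows. This is a minimal sketch: the class names `RepositoryInfo` and `Registry`, the field names, and the example URL and email address are hypothetical and are not part of the described implementations.

```python
from dataclasses import dataclass, field


@dataclass
class RepositoryInfo:
    """Hypothetical container for the first information about a code repository."""
    name: str
    url: str
    team: str
    contact_email: str
    include_files: list[str] = field(default_factory=list)  # files to be scanned
    exclude_files: list[str] = field(default_factory=list)  # files not to be scanned


class Registry:
    """Hypothetical in-memory registry of code repositories to be scanned."""

    def __init__(self) -> None:
        self._repos: dict[str, RepositoryInfo] = {}

    def add(self, info: RepositoryInfo) -> None:
        # Key by URL so that the same repository is not registered twice.
        self._repos[info.url] = info

    def repositories(self) -> list[RepositoryInfo]:
        return list(self._repos.values())


registry = Registry()
registry.add(RepositoryInfo(
    name="infrastructure-payments",
    url="https://repos.example.com/infrastructure-payments",
    team="payments",
    contact_email="payments-team@example.com",
    exclude_files=["README.md"],
))
```

Keying the registry by repository location is one possible design choice; it makes a repeated registration request overwrite the earlier entry rather than create a duplicate.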
In some implementations, the compliance system may obtain the first information responsive to receiving a request (e.g., from a user) to add the code repository to the registry of the compliance system. In some implementations, the compliance system may monitor the repository system for creation of new code repositories, such as a new code repository associated with a name that includes a particular text string (e.g., “infrastructure”) and/or associated with a particular developer team. For example, based on the monitoring, the compliance system may detect creation of the code repository and obtain the first information responsive to detecting creation of the code repository. In some implementations, creation of the code repository may automatically cause the request to be sent to the compliance system (e.g., if the name of the code repository includes the particular text string and/or if the code repository is associated with the particular developer team).
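The name-based and team-based trigger described above may be sketched, under stated assumptions, as a simple predicate. The function name `should_register`, the default text string, and the example team name "platform" are hypothetical.

```python
def should_register(
    repo_name: str,
    team: str,
    name_marker: str = "infrastructure",
    watched_teams: frozenset[str] = frozenset({"platform"}),
) -> bool:
    """Return True if a newly created repository should be added to the registry.

    A repository qualifies if its name contains the marker text string
    (compared case-insensitively, which is an assumption) or if it belongs
    to one of the watched developer teams.
    """
    return name_marker in repo_name.lower() or team in watched_teams
```

For example, `should_register("Infrastructure-core", "web")` and `should_register("docs", "platform")` both return `True`, whereas `should_register("docs", "web")` returns `False`.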
As shown by reference number 110, the compliance system may obtain second information relating to one or more compliance rules for provisioning the infrastructure in the cloud computing environment. That is, the second information may indicate the compliance rule(s). The compliance system may obtain the second information from the rules system. For example, the compliance system may transmit a request (e.g., via an API) for the compliance rule(s) to the rules system, and the compliance system may receive a response from the rules system that indicates the second information relating to the compliance rule(s).
The compliance rule(s) may be defined by the entity, one or more standards bodies, and/or one or more cloud computing providers, among other examples. In one example, a compliance rule may indicate that a security group is to be referenced in code using an up-to-date release version for the security group. Additionally, or alternatively, a compliance rule may indicate that a secret key for encryption is not to be included in code. Additionally, or alternatively, a compliance rule may indicate a formatting for a machine image. The foregoing compliance rules are provided as examples, and the one or more compliance rules may include additional and/or different compliance rules than the examples provided herein.
The compliance system may store the compliance rules in a database or another data structure. In some implementations, the compliance rules may be updated (e.g., added and/or deleted) by an administrator of the compliance system. In some implementations, the compliance system may automatically update the compliance rules. For example, the compliance system may monitor (e.g., using natural language processing (NLP)) documents, webpages, and/or other informational sources of the entity, a standards body, and/or a cloud computing provider to identify compliance rules, and based on identifying a compliance rule, the compliance system may automatically update the compliance rules.
As shown in
In some implementations, the scan performed by the compliance system may use NLP to identify a cloud computing provider for the cloud computing environment and/or to identify a programming language used for the file(s). The identity of the cloud computing provider and/or the programming language may facilitate identification of violations of the one or more compliance rules. For example, based on the identity of the cloud computing provider and/or the programming language, the compliance system may determine a parsing scheme (e.g., in accordance with a formatting of the file(s) or a syntax used in the file(s)) for scanning the file(s) to facilitate identification of violations of the one or more compliance rules. As an example, if the compliance system identifies a first cloud computing provider and/or a first programming language, then the compliance system may determine that a first parsing scheme is to be used to scan the file(s), and if the compliance system identifies a second cloud computing provider and/or a second programming language, then the compliance system may determine that a second parsing scheme is to be used to scan the file(s). A parsing scheme may indicate naming conventions, array keys, markup tags, data formats, or the like, for a particular cloud computing provider and/or programming language. Accordingly, the compliance system may perform the scan using the parsing scheme that is determined.
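The selection of a parsing scheme may be sketched as follows. The sketch substitutes a simple keyword and file-suffix heuristic for the NLP described above, purely for illustration. The marker strings, the scheme names, and the functions `identify` and `select_parsing_scheme` are assumptions and do not reflect any particular implementation.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping from file suffix to the language used in the file.
LANGUAGE_BY_SUFFIX = {".tf": "hcl", ".yaml": "yaml", ".yml": "yaml", ".json": "json"}

# Assumed marker strings that hint at a cloud computing provider.
PROVIDER_MARKERS = {
    "aws": ("aws_", "AWSTemplateFormatVersion"),
    "azure": ("azurerm_",),
    "gcp": ("google_",),
}

# Hypothetical parsing schemes keyed by (provider, language).
PARSING_SCHEMES = {
    ("aws", "hcl"): "terraform-aws",
    ("aws", "yaml"): "cloudformation-yaml",
    ("azure", "hcl"): "terraform-azure",
    ("gcp", "hcl"): "terraform-gcp",
}


def identify(path: str, content: str) -> tuple[Optional[str], Optional[str]]:
    """Guess the provider and language of a file; None means 'not identified'."""
    language = LANGUAGE_BY_SUFFIX.get(PurePosixPath(path).suffix.lower())
    provider = next(
        (name for name, markers in PROVIDER_MARKERS.items()
         if any(marker in content for marker in markers)),
        None,
    )
    return provider, language


def select_parsing_scheme(provider: Optional[str], language: Optional[str]) -> str:
    """Pick the parsing scheme for the scan, falling back to a generic one."""
    return PARSING_SCHEMES.get((provider, language), "generic-text")
```

For example, `identify("main.tf", 'resource "aws_instance" "web" {}')` yields `("aws", "hcl")`, for which `select_parsing_scheme` returns `"terraform-aws"`; an unrecognized file falls back to `"generic-text"`.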
As shown by reference number 120, the compliance system may identify that the content of the file(s) includes at least one violation of the one or more compliance rules (e.g., the content of the file(s) includes non-compliant cloud resource references). For example, the compliance system may identify the at least one violation in connection with the scan of the content of the file(s) and based on the cloud computing provider and/or the programming language. The at least one violation may be in a text string, a line of code, or the like, of the content of the file(s). In some implementations, the at least one violation may be a reference to a security group in the file(s) using an outdated release version of the security group. Additionally, or alternatively, the at least one violation may be inclusion in the file(s) of a secret key for encryption. Additionally, or alternatively, the at least one violation may be incorrect formatting of a machine image of the file(s).
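A line-by-line scan for two of the example violations (an embedded secret key and an outdated security group version) may be sketched as follows. The file syntax, the regular expressions, the rule identifiers, and the table of latest versions are all hypothetical and are chosen only to make the sketch runnable.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    rule_id: str
    line_number: int
    text: str


# Assumed syntax: secret_key = "..." and security_group = "<name>:v<version>".
SECRET_KEY_PATTERN = re.compile(r'secret_key\s*=\s*"[^"]+"')
SECURITY_GROUP_PATTERN = re.compile(
    r'security_group\s*=\s*"(?P<name>[\w-]+):v(?P<version>\d+)"'
)

# Hypothetical lookup of the up-to-date release version of each security group.
LATEST_SG_VERSIONS = {"web-sg": 3}


def scan(content: str) -> list[Violation]:
    """Return the violations found in the content, in line order."""
    violations = []
    for number, line in enumerate(content.splitlines(), start=1):
        if SECRET_KEY_PATTERN.search(line):
            violations.append(Violation("no-secret-in-code", number, line.strip()))
        match = SECURITY_GROUP_PATTERN.search(line)
        if match:
            latest = LATEST_SG_VERSIONS.get(match["name"])
            if latest is not None and int(match["version"]) < latest:
                violations.append(
                    Violation("security-group-current", number, line.strip())
                )
    return violations


SAMPLE = '''\
secret_key = "abc123"
security_group = "web-sg:v1"
region = "us-east-1"
'''
```

Calling `scan(SAMPLE)` reports a `no-secret-in-code` violation on line 1 and a `security-group-current` violation on line 2; line 3 produces no violation.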
As shown in
To modify the content (e.g., by deleting a portion of the content or changing a portion of the content), the compliance system may create, or cause creation of, a copy of the file(s). For example, the compliance system may create, or cause creation of, a clone of the code repository (e.g., in the repository system) that includes a copy of the file(s). Accordingly, to modify the content, the compliance system may modify the copy of the file(s), as described herein.
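Modifying a copy rather than the original file(s) may be sketched as follows. The sketch clones a repository directory into a temporary location and deletes lines that embed a secret key, which is one of the example violations. The function names and the assumed `secret_key = ...` syntax are hypothetical, and a local directory copy stands in for a clone created in the repository system.

```python
import re
import shutil
import tempfile
from pathlib import Path

# Assumed syntax for a line that embeds a secret key.
SECRET_KEY_LINE = re.compile(r"\s*secret_key\s*=")


def correct_content(content: str) -> str:
    """Delete every line that embeds a secret key; keep all other lines as-is."""
    kept = [
        line for line in content.splitlines(keepends=True)
        if not SECRET_KEY_LINE.match(line)
    ]
    return "".join(kept)


def correct_copy(repo_dir: Path, relative_path: str) -> Path:
    """Copy the repository, correct the file in the copy, and return the copy."""
    clone_dir = Path(tempfile.mkdtemp(prefix="compliance-clone-")) / repo_dir.name
    shutil.copytree(repo_dir, clone_dir)
    target = clone_dir / relative_path
    target.write_text(correct_content(target.read_text()))
    return clone_dir
```

Because only the copy is written to, the original file(s) remain unchanged until a merge is requested and acted upon.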
In some implementations, the compliance system may determine a probability as to whether a build of the code of the code repository is likely to pass (or fail) if the modified file(s) are used. Whether the build of the code is likely to pass may indicate whether the modified file(s) should be implemented in a production environment. The compliance system may determine the probability as to whether the build of the code is likely to pass by using a machine learning model. The machine learning model may be trained to determine the probability as to whether the build of the code is likely to pass based on historical data indicating whether previous builds of code (e.g., of the one or more files of the code repository and/or of other files in the code repository or in other code repositories) have passed or failed. Accordingly, the compliance system may provide the modified file(s) or the entire code repository to the machine learning model as an input, and the machine learning model may output an indication of the probability as to whether the build of the code is likely to pass. The indication may be, for instance, a score indicating the probability, a percentage of the likelihood, a classification indicating the likelihood (“not likely to pass,” “likely to pass,” etc.), or any other similar notification.
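How the output of the machine learning model might be turned into a classification may be sketched as follows. The model itself is out of scope here: `stub_model` is a fixed stand-in, not a trained model, and the 0.8 threshold and the function name `assess_build` are assumptions.

```python
from typing import Callable

# A build model is any callable that maps {file path: content} to a probability.
BuildModel = Callable[[dict[str, str]], float]


def assess_build(
    modified_files: dict[str, str],
    model: BuildModel,
    threshold: float = 0.8,
) -> tuple[str, bool]:
    """Return a classification and whether the build is considered likely to pass."""
    probability = model(modified_files)
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"model returned an invalid probability: {probability}")
    likely = probability >= threshold
    return ("likely to pass" if likely else "not likely to pass"), likely


def stub_model(files: dict[str, str]) -> float:
    """Stand-in for a trained model, for illustration only."""
    return 0.9 if all("secret_key" not in text for text in files.values()) else 0.2
```

Passing the model in as a callable keeps the decision logic independent of how the model is trained or hosted.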
The compliance system may discard the modified file(s) based on a determination that the build of the code of the repository is not likely to pass. In some implementations, the compliance system may determine a recommendation of a modification to the content of the file(s) (e.g., to improve a probability that the build will pass). For example, the compliance system may determine the recommendation of the modification using a machine learning model (e.g., the same machine learning model used to determine whether the build of the code is likely to pass or a different machine learning model). The machine learning model may be trained to determine the modification based on historical data indicating whether previous builds of code (e.g., of the one or more files of the code repository and/or of other files in the code repository or in other code repositories) have passed or failed. Thus, the recommendation of the modification may indicate changes to the content of the file(s) that improve a probability that a build with the file(s) will pass. Based on determining the recommendation of the modification, the compliance system may transmit a notification indicating the recommendation of the modification. The compliance system may transmit the notification to a user device. The user device may be associated with a user (e.g., a developer, a manager, a custodian, or the like) associated with the code repository. In some implementations, the compliance system may automatically implement the recommendation of the modification (e.g., without notifying the user or receiving approval).
As shown in
In some implementations, the compliance system may transmit the request to the repository system, which may cause the repository system to provide a notification of the request to a user (e.g., a developer, a manager, a custodian, or the like) associated with the code repository. Alternatively, the compliance system may transmit the request directly to a user device of the user. In some implementations, the request may include an indication that the request is to be automatically approved (e.g., because the modifications to the content of the file(s) were made by a computer rather than by a human), to thereby cause automatic approval of the request upon receipt by the repository system. In some implementations, rather than transmitting the request, the compliance system may cause the file(s) that have been modified to be automatically merged (e.g., without approval) into the code repository.
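A request to merge, including the optional indication of automatic approval, may be sketched as a JSON payload. The field names, the default target branch, and the function name `build_merge_request` are hypothetical; an actual repository system would define its own request format.

```python
import json


def build_merge_request(
    repository_url: str,
    source_branch: str,
    modified_files: list[str],
    target_branch: str = "main",
    auto_approve: bool = False,
) -> str:
    """Serialize a hypothetical merge request for the repository system."""
    payload = {
        "repository": repository_url,
        "source_branch": source_branch,
        "target_branch": target_branch,
        "files": sorted(modified_files),
        "author": "compliance-system",
        # Indicates that the request is to be approved upon receipt.
        "auto_approve": auto_approve,
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```

Leaving `auto_approve` as `False` by default reflects the case in which a user is notified and acts upon the request.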
As shown by reference number 135, the compliance system may determine a severity of the at least one violation. The compliance system may determine the severity as a severity classification (e.g., “not severe,” “low severity,” “moderate severity,” or “high severity”) and/or a severity score (e.g., on a scale from 1 to 10 or from 1 to 100). In some implementations, the compliance system may determine the severity of the at least one violation using a machine learning model (e.g., a different machine learning model than the machine learning model used to determine whether the build of the code is likely to pass). For example, the machine learning model may be trained to output the severity based on an input indicating the at least one violation and/or the content of the file(s). In some implementations, the machine learning model may be trained using an unsupervised learning technique. Training data used for training the machine learning model may include historical data indicating historical violations, historical corrections of the violations (e.g., by modification of file contents), and historical time lengths for implementing the corrections (e.g., a time length between a first time when a request to merge a corrected file into a code repository is generated and a second time when the merge is implemented).
As shown by reference number 140, the compliance system may transmit a notification indicating the severity of the at least one violation. The notification may also indicate the code repository, the file(s) that were modified, a type of the at least one violation, and/or an identifier of, or a link to, the request to merge the file(s) into the code repository. The compliance system may transmit the notification to a user device. The user device may be associated with a user (e.g., a developer, a manager, a custodian, or the like) of the code repository that may then act upon the request to merge the file(s) into the code repository. In some implementations, the compliance system may determine (e.g., using the machine learning model) a recommendation of a deadline (e.g., a quantity of days or a particular date) by which the request to merge the file(s) is to be acted upon based on the severity of the at least one violation. Here, the notification may also indicate the recommendation of the deadline.
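The contents of the notification, including a recommended deadline derived from the severity, may be sketched as follows. In this sketch a fixed lookup table stands in for the machine learning model; the score cut-offs, the number of days per classification, and the field names are assumptions.

```python
from datetime import date, timedelta

# Hypothetical number of days allowed for each severity classification.
DEADLINE_DAYS = {
    "not severe": 30,
    "low severity": 14,
    "moderate severity": 7,
    "high severity": 1,
}


def classify(score: int) -> str:
    """Map a severity score on a 1-10 scale to a classification (assumed cut-offs)."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 2:
        return "not severe"
    if score <= 5:
        return "low severity"
    if score <= 8:
        return "moderate severity"
    return "high severity"


def build_notification(
    repository: str,
    files: list[str],
    violation_type: str,
    score: int,
    merge_request_link: str,
    today: date,
) -> dict:
    """Assemble the notification fields, including the recommended deadline."""
    severity = classify(score)
    deadline = today + timedelta(days=DEADLINE_DAYS[severity])
    return {
        "repository": repository,
        "files": files,
        "violation_type": violation_type,
        "severity": severity,
        "merge_request": merge_request_link,
        "deadline": deadline.isoformat(),
    }
```

For a score of 9 evaluated on 2024-01-01, the notification carries a severity of "high severity" and a deadline of 2024-01-02.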
In addition, or as an alternative, to transmitting the notification, the compliance system may perform one or more automatic actions based on the severity of the at least one violation. For example, the compliance system may perform the one or more automatic actions if a severity classification associated with the at least one violation is a particular classification and/or if a severity score associated with the at least one violation satisfies a threshold. In some implementations, an action may include causing deletion of the code repository (e.g., by transmitting a request to delete the code repository to the repository system). For example, the compliance system may cause deletion of the code repository if the request to merge the file(s) has not been acted upon within a particular time period after the request was transmitted. In some implementations, the compliance system may cause deletion of the code repository instead of modifying the content of the file(s) and transmitting the request to merge the file(s). Additionally, or alternatively, an action may include generating (e.g., opening) an incident report relating to the at least one violation (e.g., in an incident management system of the entity). The incident report may indicate similar information to the notification described above. However, the incident report may indicate that the request to merge the file(s) should be acted upon immediately.
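The selection of automatic actions may be sketched as follows. The action names, the score threshold of 8, the seven-day period, and the choice to consider deletion only for violations at or above the threshold are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def automatic_actions(
    score: int,
    request_sent_at: datetime,
    now: datetime,
    acted_upon: bool,
    score_threshold: int = 8,
    grace_period: timedelta = timedelta(days=7),
) -> list[str]:
    """Return the automatic actions to perform for a violation."""
    actions = []
    if score >= score_threshold:
        # The severity satisfies the threshold: open an incident report.
        actions.append("open_incident_report")
        # The merge request has gone unanswered for longer than the period.
        if not acted_upon and now - request_sent_at > grace_period:
            actions.append("request_repository_deletion")
    return actions
```

Returning action names rather than performing the actions directly keeps a destructive step, such as deletion of the code repository, visible and auditable before it is carried out.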
The infrastructure may be provisioned in the cloud computing environment using the modified file(s) in the code repository. For example, the infrastructure may be provisioned in the cloud computing environment after the modified file(s) have been merged into the code repository. In some implementations, the compliance system or another system may cause provisioning of the infrastructure in the cloud computing environment using the modified file(s) in the code repository. For example, the compliance system or the other system may provide the file(s), or provide an indication of an infrastructure configuration based on the file(s), to a deployment application that communicates (e.g., via an API) the infrastructure configuration, as defined in the file(s), to a cloud computing provider that is to implement the infrastructure in the cloud computing environment.
By proactively detecting and correcting violations of the compliance rules before the infrastructure is provisioned to the cloud computing environment, a performance of cloud computing resources may be improved. Moreover, by proactively detecting and correcting non-compliant infrastructure before the infrastructure is provisioned in the cloud computing environment, a security of the infrastructure and/or a stability of the infrastructure is improved.
While the foregoing is described in terms of correction of IaC files in a code repository, the techniques described herein may be used in connection with correction of other types of files that may be included in a code repository. For example, other types of files that may be modified to comply with compliance rules may include frontend website files, backend website files, and/or mobile application files, among other examples.
As indicated above,
The compliance system 210 includes one or more devices capable of receiving, generating, storing, processing, providing, and/or routing information associated with correction of non-compliant files in a code repository, as described elsewhere herein. The compliance system 210 may include a communication device and/or a computing device. For example, the compliance system 210 may include a server, such as an application server, a client server, a web server, a database server, a host server, a proxy server, a virtual server (e.g., executing on computing hardware), or a server in a cloud computing system. In some implementations, the compliance system 210 includes computing hardware used in a cloud computing environment.
The repository system 220 includes one or more devices capable of receiving, generating, storing, processing, and/or providing information associated with one or more code repositories, as described elsewhere herein. The repository system 220 may include a communication device and/or a computing device. For example, the repository system 220 may include a server, such as an application server, a client server, a web server, a database server, a host server, a proxy server, a virtual server (e.g., executing on computing hardware), or a server in a cloud computing system. In some implementations, the repository system 220 includes computing hardware used in a cloud computing environment.
The rules system 230 includes one or more devices capable of receiving, generating, storing, processing, and/or providing information associated with compliance rules, as described elsewhere herein. The rules system 230 may include a communication device and/or a computing device. For example, the rules system 230 may include a server, such as an application server, a client server, a web server, a database server, a host server, a proxy server, a virtual server (e.g., executing on computing hardware), or a server in a cloud computing system. In some implementations, the rules system 230 includes computing hardware used in a cloud computing environment.
The user device 240 includes one or more devices capable of receiving, generating, storing, processing, and/or providing information associated with correction of non-compliant files in a code repository, as described elsewhere herein. The user device 240 may include a communication device and/or a computing device. For example, the user device 240 may include a wireless communication device, a mobile phone, a user equipment, a laptop computer, a tablet computer, a desktop computer, a gaming console, a set-top box, a wearable communication device (e.g., a smart wristwatch, a pair of smart eyeglasses, a head mounted display, or a virtual reality headset), or a similar type of device.
The cloud computing system 250 includes one or more devices capable of receiving, generating, storing, processing, and/or providing (e.g., deploying) cloud computing services, as described elsewhere herein. The cloud computing system 250 may include a communication device and/or a computing device. For example, the cloud computing system 250 may include a server, an application server, a client server, a web server, a database server, a host server, a proxy server, a virtual server (e.g., executing on computing hardware), a device that includes computing hardware used in a cloud computing environment, or a similar type of device. The cloud computing system 250 may communicate with one or more other devices of environment 200, as described elsewhere herein.
The network 260 includes one or more wired and/or wireless networks. For example, the network 260 may include a wireless wide area network (e.g., a cellular network or a public land mobile network), a local area network (e.g., a wired local area network or a wireless local area network (WLAN), such as a Wi-Fi network), a personal area network (e.g., a Bluetooth network), a near-field communication network, a telephone network, a private network, the Internet, and/or a combination of these or other types of networks. The network 260 enables communication among the devices of environment 200.
The number and arrangement of devices and networks shown in
Bus 310 includes one or more components that enable wired and/or wireless communication among the components of device 300. Bus 310 may couple together two or more components of
Memory 330 includes volatile and/or nonvolatile memory. For example, memory 330 may include random access memory (RAM), read only memory (ROM), a hard disk drive, and/or another type of memory (e.g., a flash memory, a magnetic memory, and/or an optical memory). Memory 330 may include internal memory (e.g., RAM, ROM, or a hard disk drive) and/or removable memory (e.g., removable via a universal serial bus connection). Memory 330 may be a non-transitory computer-readable medium. Memory 330 stores information, instructions, and/or software (e.g., one or more software applications) related to the operation of device 300. In some implementations, memory 330 includes one or more memories that are coupled to one or more processors (e.g., processor 320), such as via bus 310.
Input component 340 enables device 300 to receive input, such as user input and/or sensed input. For example, input component 340 may include a touch screen, a keyboard, a keypad, a mouse, a button, a microphone, a switch, a sensor, a global positioning system sensor, an accelerometer, a gyroscope, and/or an actuator. Output component 350 enables device 300 to provide output, such as via a display, a speaker, and/or a light-emitting diode. Communication component 360 enables device 300 to communicate with other devices via a wired connection and/or a wireless connection. For example, communication component 360 may include a receiver, a transmitter, a transceiver, a modem, a network interface card, and/or an antenna.
Device 300 may perform one or more operations or processes described herein. For example, a non-transitory computer-readable medium (e.g., memory 330) may store a set of instructions (e.g., one or more instructions or code) for execution by processor 320. Processor 320 may execute the set of instructions to perform one or more operations or processes described herein. In some implementations, execution of the set of instructions, by one or more processors 320, causes the one or more processors 320 and/or the device 300 to perform one or more operations or processes described herein. In some implementations, hardwired circuitry is used instead of or in combination with the instructions to perform one or more operations or processes described herein. Additionally, or alternatively, processor 320 may be configured to perform one or more operations or processes described herein. Thus, implementations described herein are not limited to any specific combination of hardware circuitry and software.
The number and arrangement of components shown in
As shown in
As further shown in
As further shown in
As further shown in
Although
The foregoing disclosure provides illustration and description, but is not intended to be exhaustive or to limit the implementations to the precise forms disclosed. Modifications may be made in light of the above disclosure or may be acquired from practice of the implementations.
As used herein, the term “component” is intended to be broadly construed as hardware, firmware, or a combination of hardware and software. It will be apparent that systems and/or methods described herein may be implemented in different forms of hardware, firmware, and/or a combination of hardware and software. The actual specialized control hardware or software code used to implement these systems and/or methods is not limiting of the implementations. Thus, the operation and behavior of the systems and/or methods are described herein without reference to specific software code—it being understood that software and hardware can be used to implement the systems and/or methods based on the description herein.
As used herein, satisfying a threshold may, depending on the context, refer to a value being greater than the threshold, greater than or equal to the threshold, less than the threshold, less than or equal to the threshold, equal to the threshold, not equal to the threshold, or the like.
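The context-dependent meaning of satisfying a threshold may be illustrated by making the comparison a parameter. The comparison names and the function `satisfies_threshold` are hypothetical.

```python
import operator

# Each name selects one of the comparisons listed above.
COMPARISONS = {
    "greater": operator.gt,
    "greater_or_equal": operator.ge,
    "less": operator.lt,
    "less_or_equal": operator.le,
    "equal": operator.eq,
    "not_equal": operator.ne,
}


def satisfies_threshold(
    value: float,
    threshold: float,
    comparison: str = "greater_or_equal",
) -> bool:
    """Return True if the value satisfies the threshold under the comparison."""
    return COMPARISONS[comparison](value, threshold)
```

For example, a value of 8 satisfies a threshold of 8 under "greater_or_equal" but not under "greater".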
Although particular combinations of features are recited in the claims and/or disclosed in the specification, these combinations are not intended to limit the disclosure of various implementations. In fact, many of these features may be combined in ways not specifically recited in the claims and/or disclosed in the specification. Although each dependent claim listed below may directly depend on only one claim, the disclosure of various implementations includes each dependent claim in combination with every other claim in the claim set. As used herein, a phrase referring to “at least one of” a list of items refers to any combination and permutation of those items, including single members. As an example, “at least one of: a, b, or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c, as well as any combination with multiple of the same item. As used herein, the term “and/or” used to connect items in a list refers to any combination and any permutation of those items, including single members (e.g., an individual item in the list). As an example, “a, b, and/or c” is intended to cover a, b, c, a-b, a-c, b-c, and a-b-c.
No element, act, or instruction used herein should be construed as critical or essential unless explicitly described as such. Also, as used herein, the articles “a” and “an” are intended to include one or more items, and may be used interchangeably with “one or more.” Further, as used herein, the article “the” is intended to include one or more items referenced in connection with the article “the” and may be used interchangeably with “the one or more.” Furthermore, as used herein, the term “set” is intended to include one or more items (e.g., related items, unrelated items, or a combination of related and unrelated items), and may be used interchangeably with “one or more.” Where only one item is intended, the phrase “only one” or similar language is used. Also, as used herein, the terms “has,” “have,” “having,” or the like are intended to be open-ended terms. Further, the phrase “based on” is intended to mean “based, at least in part, on” unless explicitly stated otherwise. Also, as used herein, the term “or” is intended to be inclusive when used in a series and may be used interchangeably with “and/or,” unless explicitly stated otherwise (e.g., if used in combination with “either” or “only one of”).