The present invention relates to continuous integration and continuous delivery (CI/CD) software development tools, and more particularly, is related to in-service software updates.
Availability of computer programs is essential to many businesses. A business may lose valuable time during software updates for mission critical programs. For example, large, distributed software applications may include multiple processes that may each have their own development schedules, which can lead to frequent outages while the various processes are updated. Further, dependencies across components may be complex, for example, if updating a first process requires additional updates to upstream and/or downstream processes. Typically, updating of any one component in a multi-component system results in downtime for the entire system.
For example, a developer implements a change by making a working copy (“branch”) of the current code base, updating the code in a snapshot of the code base at one moment in time. As other developers submit changed code to the source code repository, this snapshot ceases to reflect the (live) repository code. As the existing code base changes, new code may also be added, as well as new libraries, and other resources that create dependencies, and potential conflicts. The longer development continues on a branch without merging back to the mainline, the greater the risk of multiple integration conflicts and failures when the developer branch is eventually merged back. When developers submit code to the repository, they must first update their code to reflect the changes in the repository since they took their copy. The more changes the repository contains, the more work developers must do before submitting their own changes.
Therefore, there is a need in the industry to address the abovementioned shortcomings.
The present system and method provides a dataflow pipeline deployment system and method. A computer based software development system is configured to deploy a plurality of package environments, comprising: a processor and storage device configured to store non-transient instructions that when implemented by the processor comprise: a dataflow pipeline deployer (110); a Git repository (120) configured to store a flow definition and an environment specific parameter; a secret store (125) configured to store an API key/secret and data pipeline credentials; and for each environment of the plurality of environments: a pipeline flow registry (140, 170) configured to store a static definition (151, 181) of an environment specific production dataflow pipeline (152, 182); a dataflow pipeline container (150, 180) configured to contain the production dataflow pipeline and a store (154, 184) of environment specific secrets and parameters.
A computer-based method for managing a dataflow pipeline across a plurality of environments is also provided, comprising the steps of: receiving, for a first environment, a check-in comprising source code, a task definition, a configuration, and a flow definition; receiving a definition of an application secret and an associated credential; storing the application secret and credential in a dataflow pipeline deployer data store; and creating a dataflow pipeline package, further comprising the steps of: merging the source code, the configuration, and the secrets; and defining a pipeline graph in the first environment pipeline flow registry.
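The package-creation steps of the method may be sketched as follows. This is a minimal illustration only, assuming that source code, configuration, and secrets are each represented as simple key/value mappings and that the flow definition is a list of (upstream, downstream) component pairs; the function names `create_package` and `define_pipeline_graph` are hypothetical and do not appear in the disclosure.

```python
from typing import Dict, List, Tuple


def create_package(
    source_code: Dict[str, str],
    configuration: Dict[str, str],
    secrets: Dict[str, str],
) -> Dict[str, Dict[str, str]]:
    """Merge the three checked-in inputs into one package (hypothetical layout)."""
    return {
        "source": dict(source_code),
        "configuration": dict(configuration),
        "secrets": dict(secrets),
    }


def define_pipeline_graph(flow_definition: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build an adjacency list (component -> downstream components) from flow edges."""
    graph: Dict[str, List[str]] = {}
    for upstream, downstream in flow_definition:
        graph.setdefault(upstream, []).append(downstream)
        graph.setdefault(downstream, [])  # make sure sink components appear too
    return graph


# Hypothetical component names, for illustration only.
package = create_package(
    {"transform.py": "print('transform')"},
    {"batch_size": "100"},
    {"api_key": "example-not-a-real-key"},
)
graph = define_pipeline_graph([("extract", "transform"), ("transform", "load")])
```

In this sketch the graph would be what is stored in the pipeline flow registry of the first environment, while the package carries the merged inputs.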
Other systems, methods and features of the present invention will be or become apparent to one having ordinary skill in the art upon examining the following drawings and detailed description. It is intended that all such additional systems, methods, and features be included in this description, be within the scope of the present invention and protected by the accompanying claims.
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present invention. The drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
The following definitions are useful for interpreting terms applied to features of the embodiments disclosed herein and are meant only to define elements within the disclosure.
As used within this disclosure, “continuous integration” (CI) is a software engineering term referring to the practice of regularly merging working copies of all developers into a shared mainline, for example, several times a day. CI is often intertwined with continuous delivery (CD) or continuous deployment in a CI/CD pipeline. “Continuous delivery” ensures the software checked in on the mainline is always in a state that can be deployed to users and “continuous deployment” makes the deployment process fully automated.
A CI/CD pipeline automates the software delivery process. The pipeline builds code, runs tests (CI), and safely deploys a new version of the application (CD). Automated pipelines remove manual errors, provide standardized feedback loops to developers, and enable fast product iterations.
As used within this disclosure, a “pipeline” (or “Git pipeline,” for instance an Apache Beam Pipeline) refers to an extensible set of tools for modeling the building, testing, and deploying of code. Pipelines are a top-level component of continuous integration, delivery, and deployment. Pipelines include jobs and stages. Pipeline jobs define what to do, for example, compile or test code. Pipeline stages define when to run the jobs. For example, a pipeline may have four stages, executed in the following order:
Jobs are executed by runners. Multiple jobs in the same stage are executed in parallel if there are enough concurrent runners. If all jobs in a stage succeed, the pipeline moves on to the next stage. If any job in a stage fails, the next stage is typically not executed, and the pipeline ends early.
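The stage and job semantics described above may be illustrated with a short sketch. It assumes that a job is a callable returning True on success and that runners are modeled by a thread pool; the stage and job names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Job = Callable[[], bool]  # a job returns True on success, False on failure


def run_pipeline(stages: List[Tuple[str, Dict[str, Job]]], max_runners: int = 4) -> List[str]:
    """Run stages in order and return the names of the stages that succeeded."""
    completed: List[str] = []
    for stage_name, jobs in stages:
        # Jobs of one stage run in parallel, limited by the number of runners.
        with ThreadPoolExecutor(max_workers=max_runners) as runners:
            futures = {name: runners.submit(job) for name, job in jobs.items()}
            results = {name: future.result() for name, future in futures.items()}
        if not all(results.values()):
            break  # a failed job ends the pipeline before the next stage
        completed.append(stage_name)
    return completed


# Hypothetical stages: the failing "lint" job prevents "deploy" from running.
stages = [
    ("build", {"compile": lambda: True}),
    ("test", {"unit": lambda: True, "lint": lambda: False}),
    ("deploy", {"release": lambda: True}),
]
completed = run_pipeline(stages)
```

Here only the build stage completes, because one job in the test stage fails and the pipeline ends early.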
As used within this disclosure, “Git” refers to open-source software used for tracking project changes and revisions across different development teams. Git saves different versions of projects in a folder known as a Git repository (“Git repo”). A Git repository tracks and saves the history of all changes made to the files in a Git project. The Git repository saves this data in a directory called .git, also known as the repository folder. Git is a version control system that tracks all changes made to the project (including source code) and saves them in the repository.
As used within this disclosure, a “dataflow” refers to a template configured to allow development teams to share pipelines with team members and across their organization. A dataflow may implement one or more data processing tasks.
As used within this disclosure, a “secret” generally refers to secure/sensitive data to be accessed or used while processing a pipeline. A secret generally requires some sort of token (e.g., username and password, decryption key, etc.) for access. Such a token may be provided via a resolving entity.
As used within this disclosure, an “environment” refers to one of several spaces where related software systems are maintained. For example, a development environment where software is initially developed, a Quality Assurance (QA) environment where the software is tested in the context of a system before deployment, and a production (prod) environment where the released system operates.
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
Embodiments of the present invention are directed towards a dataflow pipeline deployer for real-time modification of software while the software is being run. The system determines which modules of the software are being utilized in the current running of the application and which will be utilized in the immediate future, based on the process that is running and the associated processes that need to run for the process to complete. The system only modifies the modules that are not currently being used and will not need to be used in the near future. Such modifications may include selective updating of software modules.
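One way to picture this selection is as a reachability computation over module dependencies. The sketch below is an assumption made for illustration, not the disclosed implementation: it treats "will be utilized in the immediate future" as "transitively required by a running process", and the names `modules_in_use`, `safe_to_update`, and the module names are hypothetical.

```python
from typing import Dict, Set


def modules_in_use(running: Set[str], requires: Dict[str, Set[str]]) -> Set[str]:
    """Return the running modules plus every module they transitively require."""
    in_use: Set[str] = set()
    stack = list(running)
    while stack:
        module = stack.pop()
        if module in in_use:
            continue
        in_use.add(module)
        stack.extend(requires.get(module, set()))
    return in_use


def safe_to_update(pending: Set[str], running: Set[str], requires: Dict[str, Set[str]]) -> Set[str]:
    """Return the pending updates whose modules are neither running nor about to be needed."""
    busy = modules_in_use(running, requires)
    return {module for module in pending if module not in busy}


# Hypothetical dependency map: "ingest" needs "parse", which needs "validate".
requires = {"ingest": {"parse"}, "parse": {"validate"}, "report": {"render"}}
safe = safe_to_update(
    pending={"validate", "render", "report"},
    running={"ingest"},
    requires=requires,
)
```

In this example the update to "validate" is deferred because the running "ingest" process will need it, while "render" and "report" can be updated immediately.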
The dataflow pipeline deployer manages source code in a Git repo, secrets in secret stores for each of a plurality of environments, a sequencer and specifics for each environment, and a pipeline registry which has a static definition of the pipeline.
In the production environment 130, a production pipeline container 150 contains a production pipeline 152 and a store of secrets and parameters 154. The production pipeline 152 includes a plurality of components (shown as rectangles with solid lines) and flows between the components (shown as solid arrows). A production pipeline flow registry 140 has a static definition 151 of the production dataflow pipeline 152. The dashed boxes represent static definitions of the components of the production dataflow pipeline 152.
The QA environment 160 has a similar structure to the production environment. In the QA environment 160, a QA pipeline container 180 contains a QA pipeline 182 and a store of secrets and parameters 184. The QA pipeline 182 includes a plurality of components (shown as rectangles with solid lines) and flows between the components (shown as solid arrows). A QA pipeline flow registry 170 has a static definition 181 of the QA dataflow pipeline 182. The dashed boxes represent static definitions of the components of the QA dataflow pipeline 182.
The dataflow pipeline deployer 110 performs several functions, including (but not limited to):
For example, under the first embodiment, a component first version v1 of the production pipeline 152 in the production pipeline container 150 is to be updated by a component second version v2 in the production pipeline flow registry 140. A third version v3 of the component is in the QA pipeline flow registry 170. Once the third version v3 of the component is validated in the QA environment 160, the third version v3 may be promoted to the production environment 130, along with any associated secrets and parameters from the QA dataflow pipeline container 180.
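The promotion of a validated component version may be sketched as follows. The data model is an assumption made for illustration: each environment is reduced to a registry mapping component names to versions and a store of secrets/parameters, and the names `Environment` and `promote` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class Environment:
    name: str
    registry: Dict[str, str] = field(default_factory=dict)  # component -> version
    settings: Dict[str, str] = field(default_factory=dict)  # secrets and parameters


def promote(
    component: str,
    source: Environment,
    target: Environment,
    validated: Set[Tuple[str, str]],
    setting_names: Iterable[str],
) -> str:
    """Copy a validated component version and its associated settings to the target."""
    version = source.registry[component]
    if (component, version) not in validated:
        raise ValueError(f"{component} {version} is not validated in {source.name}")
    target.registry[component] = version
    for key in setting_names:
        target.settings[key] = source.settings[key]
    return version


# Hypothetical component and parameter names.
qa = Environment("qa", {"transform": "v3"}, {"batch_size": "500"})
prod = Environment("prod", {"transform": "v2"}, {"batch_size": "100"})
promoted = promote("transform", qa, prod, {("transform", "v3")}, ["batch_size"])
```

The validation check reflects the condition that a version is promoted only after it has been validated in the QA environment.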
It is desirable for the update to replace only those components in the production environment 130 that are affected by the change, thereby stopping only the associated portions of the production pipeline 152 while leaving other components operational. Likewise, if the update only involves changing a secret or a parameter, only the portions of the production pipeline 152 affected by the change of secret/parameter are stopped. Here, the deployer 110 combines the source code, the secrets/parameters, and the flow definitions.
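Identifying the affected components can be viewed as a comparison between the deployed definition and the target definition. The following sketch assumes each static definition records a component version and the names of the secrets/parameters the component reads; `ComponentDef` and `affected_components` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class ComponentDef:
    version: str
    uses: FrozenSet[str] = frozenset()  # secrets/parameters read by the component


def affected_components(
    deployed: Dict[str, ComponentDef],
    target: Dict[str, ComponentDef],
    changed_settings: Set[str],
) -> Set[str]:
    """Return components that are new, have a new version, or read a changed setting."""
    affected: Set[str] = set()
    for name, new_def in target.items():
        old_def = deployed.get(name)
        if old_def is None or old_def.version != new_def.version:
            affected.add(name)
        elif new_def.uses & changed_settings:
            affected.add(name)
    return affected


# Hypothetical pipeline: only "transform" changes version, and "api_key" is rotated.
deployed = {
    "extract": ComponentDef("v1", frozenset({"db_password"})),
    "transform": ComponentDef("v1"),
    "load": ComponentDef("v1", frozenset({"api_key"})),
}
target = dict(deployed, transform=ComponentDef("v2"))
to_stop = affected_components(deployed, target, changed_settings={"api_key"})
```

In this example "extract" keeps running, since neither its version nor any setting it reads has changed.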
Under the present embodiments, the definition of a pipeline differs from a traditional Git pipeline. A traditional Git pipeline stores source code and run-time definitions. Under the first embodiment, the Git pipeline stores source code, dataflow, and secrets as parameters. In a traditional Git pipeline, versions are promoted from environment to environment and eventually published. Under the dataflow pipeline embodiments, other components are orchestrated along with source code promotion.
For example, a component to be paused may receive input data from an upstream queue, and provide output data to a downstream queue. In order to pause the component, the upstream queue must similarly be paused while the component is updating, so that data from the upstream queue is not lost during the update.
During the update/promotions, only specifically identified components are paused, refreshed, and then un-paused.
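The pause, refresh, and un-pause sequence may be sketched as follows, using in-memory stand-ins for the upstream queue and the component. All class and function names here are hypothetical, and a real deployment would pause an actual message queue rather than a Python object.

```python
from collections import deque
from contextlib import contextmanager


class Queue:
    """Stand-in for an upstream queue; while paused it buffers items instead of delivering."""

    def __init__(self):
        self.items = deque()
        self.paused = False

    def put(self, item):
        self.items.append(item)

    def deliver(self):
        if self.paused:
            return []
        delivered = list(self.items)
        self.items.clear()
        return delivered


class Component:
    def __init__(self, version):
        self.version = version
        self.paused = False
        self.processed = []

    def consume(self, queue):
        if self.paused:
            return
        for item in queue.deliver():
            self.processed.append((self.version, item))


@contextmanager
def paused(component, upstream):
    """Pause the upstream queue and the component, then un-pause both afterwards."""
    upstream.paused = True
    component.paused = True
    try:
        yield
    finally:
        component.paused = False
        upstream.paused = False


queue, comp = Queue(), Component("v1")
queue.put("a")
comp.consume(queue)
with paused(comp, queue):
    queue.put("b")       # arrives during the update and stays buffered
    comp.consume(queue)  # no effect while paused
    comp.version = "v2"  # refresh the component
comp.consume(queue)      # buffered data is processed by the new version
```

The item that arrives during the update is not lost: it remains in the paused queue and is processed by the refreshed component once both are un-paused.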
After the updated version is verified in the first environment, the dataflow pipeline deployer promotes the updated version to a second environment, for example, a staging environment or a production environment, as shown by the flowchart 301 of
After the updated version is verified in the first environment, the dataflow pipeline deployer promotes the updated version to a second environment, for example, a staging environment or a production environment, as shown by the flowchart 401 of
As shown by
The present system for executing the functionality described in detail above may be a computer, an example of which is shown in the schematic diagram of
The processor 502 is a hardware device for executing software, particularly that stored in the memory 506. The processor 502 can be any custom made or commercially available single core or multi-core processor, a central processing unit (CPU), an auxiliary processor among several processors associated with the present system 500, a semiconductor based microprocessor (in the form of a microchip or chip set), a macroprocessor, or generally any device for executing software instructions.
The memory 506 can include any one or combination of volatile memory elements (e.g., random access memory (RAM, such as DRAM, SRAM, SDRAM, etc.)) and nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, etc.). Moreover, the memory 506 may incorporate electronic, magnetic, optical, and/or other types of storage media. Note that the memory 506 can have a distributed architecture, where various components are situated remotely from one another, but can be accessed by the processor 502.
The software 508 defines functionality performed by the system 500, in accordance with the present invention. The software 508 in the memory 506 may include one or more separate programs, each of which contains an ordered listing of executable instructions for implementing logical functions of the system 500, as described below. The memory 506 may contain an operating system (O/S) 520. The operating system essentially controls the execution of programs within the system 500 and provides scheduling, input-output control, file and data management, memory management, and communication control and related services.
The I/O devices 510 may include input devices, for example but not limited to, a keyboard, mouse, scanner, microphone, etc. Furthermore, the I/O devices 510 may also include output devices, for example but not limited to, a printer, display, etc. Finally, the I/O devices 510 may further include devices that communicate via both inputs and outputs, for instance but not limited to, a modulator/demodulator (modem; for accessing another device, system, or network), a radio frequency (RF) or other transceiver, a telephonic interface, a bridge, a router, or other device.
When the functionality of the system 500 is in operation, the processor 502 is configured to execute the software 508 stored within the memory 506, to communicate data to and from the memory 506, and to generally control operations of the system 500 pursuant to the software 508. The operating system 520 is read by the processor 502, perhaps buffered within the processor 502, and then executed.
When the system 500 is implemented in software 508, it should be noted that instructions for implementing the system 500 can be stored on any computer-readable medium for use by or in connection with any computer-related device, system, or method. Such a computer-readable medium may, in some embodiments, correspond to either or both the memory 506 or the storage device 504. In the context of this document, a computer-readable medium is an electronic, magnetic, optical, or other physical device or means that can contain or store a computer program for use by or in connection with a computer-related device, system, or method. Instructions for implementing the system can be embodied in any computer-readable medium for use by or in connection with the processor or other such instruction execution system, apparatus, or device. Although the processor 502 has been mentioned by way of example, such instruction execution system, apparatus, or device may, in some embodiments, be any computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. In the context of this document, a “computer-readable medium” can be any means that can store, communicate, propagate, or transport the program for use by or in connection with the processor or other such instruction execution system, apparatus, or device.
Such a computer-readable medium can be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. More specific examples (a nonexhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic) having one or more wires, a portable computer diskette (magnetic), a random access memory (RAM) (electronic), a read-only memory (ROM) (electronic), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory) (electronic), an optical fiber (optical), and a portable compact disc read-only memory (CDROM) (optical). Note that the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
In an alternative embodiment, where the system 500 is implemented in hardware, the system 500 can be implemented with any or a combination of the following technologies, which are each well known in the art: a discrete logic circuit(s) having logic gates for implementing logic functions upon data signals, an application specific integrated circuit (ASIC) having appropriate combinational logic gates, a programmable gate array(s) (PGA), a field programmable gate array (FPGA), etc.
It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present invention without departing from the scope or spirit of the invention. In view of the foregoing, it is intended that the present invention cover modifications and variations of this invention provided they fall within the scope of the following claims and their equivalents.
The present application claims priority to U.S. provisional patent application No. 63/369,664, entitled “Dataflow Pipeline Deployment Method”, filed on Jul. 28, 2022, which is incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
63369664 | Jul 2022 | US