Embodiments described herein relate to a secure data enclave, and more particularly, to a secure data enclave for secure model development.
Disparate and amalgamated systems that are traditionally used by teams to build data models often result in insufficient data protection, access restrictions, and reliability.
To solve these and other problems, embodiments described herein provide methods and systems for secure model development using, for example, a secure data enclave. The secure data enclave assists with the reliable development and management of data models at a global scale. The secure data enclave provides a development environment, a quality assurance environment, and a production environment for the purpose of providing artificial intelligence and/or machine learning models. The secure data enclave may also provide secure and private access to the resulting (derived) data. The secure data enclave may use a tiered application pattern such that software solutions may leverage artificial intelligence and machine learning models through a mechanism that is global-scale, self-healing, and auto-scaling with enhanced availability.
One embodiment provides a system for secure model development. The system includes an electronic processor configured to receive, within a data quality assurance environment, a user input from a user device. The electronic processor is also configured to access a code artifact stored in a code artifact repository from a data development environment based on the user input. The electronic processor is also configured to access a set of data stored in a database from a data production environment based on the user input. The electronic processor is also configured to download a copy of the set of data without changing the set of data stored in the database. The electronic processor is also configured to train, within the data quality assurance environment, a model using machine learning based on the code artifact and the copy of the set of data. The electronic processor is also configured to transmit the model to a model database.
Another embodiment provides a method for secure model development. The method includes receiving, within a data development environment with an electronic processor, a code input from a user device. The method also includes developing, within the data development environment with the electronic processor, a code artifact based on the code input. The method also includes storing, with the electronic processor, the code artifact in a code artifact repository of the data development environment. The method also includes accessing, with a data quality assurance environment with the electronic processor, at least one code artifact stored in the code artifact repository from the data development environment. The method also includes downloading, within the data quality assurance environment with the electronic processor, a copy of data from a database of a data production environment without changing the data from the database. The method also includes training, within the data quality assurance environment with the electronic processor, a model using machine learning based on the at least one code artifact and the copy of data. The method also includes transmitting, with the electronic processor, the model to a model database for storage.
Yet another embodiment provides a non-transitory, computer-readable medium storing instructions that, when executed by an electronic processor, perform a set of functions. The set of functions includes receiving, within a data quality assurance environment, a user input from a user device. The set of functions also includes accessing, with the data quality assurance environment, a first code artifact stored in a first code artifact repository from a first data development environment based on the user input. The set of functions also includes accessing, with the data quality assurance environment, a second code artifact stored in a second code artifact repository from a second data development environment based on the user input. The set of functions also includes downloading, with the data quality assurance environment, a copy of data stored in a database from a data production environment based on the user input without changing the data stored in the database. The set of functions also includes training, within the data quality assurance environment, a model using machine learning based on the first code artifact, the second code artifact, and the data. The set of functions also includes transmitting the model to a model database for storage.
Other aspects of the embodiments described herein will become apparent by consideration of the detailed description and accompanying drawings.
Before embodiments of the invention are explained in detail, it is to be understood that the invention is not limited in its application to the details of construction and the arrangement of components set forth in the following description or illustrated in the accompanying drawings. The invention is capable of other embodiments and of being practiced or of being carried out in various ways. That is, the embodiments described herein illustrate possible implementations of the invention and are not to be interpreted as a comprehensive list of implementations.
Also, it is to be understood that the phraseology and terminology used herein are for the purpose of description and should not be regarded as limiting. The use of “including,” “comprising,” or “having” and variations thereof herein is meant to encompass the items listed thereafter and equivalents thereof as well as additional items. The terms “mounted,” “connected,” and “coupled” are used broadly and encompass both direct and indirect mounting, connecting, and coupling. Further, “connected” and “coupled” are not restricted to physical or mechanical connections or couplings, and may include electrical connections or couplings, whether direct or indirect. Also, electronic communications and notifications may be performed using any known means including direct connections, wireless connections, etc.
A plurality of hardware- and software-based devices, as well as a plurality of different structural components, may be utilized to implement the embodiments described herein. In addition, embodiments described herein may include hardware, software, and electronic components or modules that, for purposes of discussion, may be illustrated and described as if the majority of the components were implemented solely in hardware. However, one of ordinary skill in the art, and based on a reading of this detailed description, would recognize that, in at least one embodiment, the electronic-based aspects of the embodiments described herein may be implemented in software (for example, stored on a non-transitory computer-readable medium) executable by one or more processors. For example, “computing device” and “server” as described in the specification may include one or more electronic processors, one or more memory modules including a non-transitory computer-readable medium, one or more input/output interfaces, and various connections (for example, a system bus) connecting the components.
In the embodiment shown, the server 110 and the user devices 105 are communicatively coupled via a communication network 130. The communication network 130 is an electronic communications network including wireless and wired connections. Portions of the communication network 130 may be implemented using a wide area network, such as the Internet, a local area network, such as a Bluetooth™ network or Wi-Fi, and combinations or derivatives thereof. Alternatively or in addition, in some embodiments, components of the system 100 communicate directly with each other as compared to communicating through the communication network 130. Also, in some embodiments, the components of the system 100 communicate through one or more intermediary devices not illustrated in
The server 110 may be a computing device, which may provide or function as a secure data enclave for securely developing models, such as artificial intelligence models or machine learning models. In the embodiment shown in
The electronic processor 200 may include a microprocessor, an application-specific integrated circuit (ASIC), or another suitable electronic device for processing data. The memory 205 may include a non-transitory computer-readable medium, such as read-only memory (“ROM”), random access memory (“RAM”) (for example, dynamic RAM (“DRAM”), synchronous DRAM (“SDRAM”), and the like), electrically erasable programmable read-only memory (“EEPROM”), flash memory, a hard disk, a secure digital (“SD”) card, another suitable memory device, or a combination thereof. The electronic processor 200 is configured to access and execute computer-readable instructions (“software”) stored in the memory 205. The software may include firmware, one or more applications, program data, filters, rules, one or more program modules, and other executable instructions. For example, the software may include instructions and associated data for performing a set of functions, including the methods described herein.
In the embodiment shown, the communication interface 210 allows the server 110 to communicate with devices external to the server 110. For example, as illustrated in
The user device 105 may also be a computing device and may include a desktop computer, a terminal, a workstation, a laptop computer, a tablet computer, a smart watch or other wearable, a smart television or whiteboard, or the like. Although not illustrated, the user device 105 may include similar components as the server 110 (an electronic processor, a memory, and a communication interface). The user device 105 may also include a human-machine interface. The human-machine interface may include one or more input devices, one or more output devices, or a combination thereof. Accordingly, in some embodiments, the human-machine interface allows a user to interact with (for example, provide input to and receive output from) the user device 105. For example, the human-machine interface may include a keyboard, a cursor-control device (for example, a mouse), a touch screen, a scroll ball, a mechanical button, a display device (for example, a liquid crystal display (“LCD”)), a printer, a speaker, a microphone, or a combination thereof.
A user (for example, a data scientist, a data analyst, a data engineer, and the like) may use the user device 105 to develop an artificial intelligence model, a machine learning model, another type of model, or a combination thereof. For example, a user may access the secure data enclave (through a browser application or a dedicated application stored on the user device 105 that communicates with the server 110) and interact with the secure data enclave (i.e., one or more environments provided by the secure data enclave) via the human-machine interface associated with the user device 105. In some embodiments, a user may use the user device 105 to interact with the secure data enclave (for example, a data development environment provided by the secure data enclave) to write code for artificial intelligence and/or machine learning activities, publish a code artifact for use in training a machine learning model, and the like. Alternatively or in addition, in some embodiments, a user may use the user device 105 to interact with the secure data enclave (for example, a data quality assurance environment provided by the secure data enclave) to leverage one or more published code artifacts for exploration, model development, and the like. Alternatively or in addition, in some embodiments, a user may use the user device 105 to interact with the secure data enclave (for example, a data quality assurance environment provided by the secure data enclave) to export or transmit a developed model.
In some embodiments, the secure data enclave provides (or includes) multiple environments. Each environment may include controls designed for high velocity data development teams while preserving the security of (derived) customer data. For example,
In some embodiments, the secure data enclave may include additional, fewer, or different environments than illustrated in
As noted above, in some embodiments, the functionality (or a portion thereof) of the server 110 may be distributed among multiple devices or servers. Accordingly, in some embodiments, the system 100 includes multiple servers 110, where each server 110 provides an environment described herein as being provided by the server 110. For example, the data development environment 305 may be provided by a first server (for example, a data development server), the data quality assurance environment 310 may be provided by a second server (for example, a data quality assurance server), and the data production environment 315 may be provided by a third server (for example, a data production server). In such embodiments, multiple servers may communicate directly with each other over one or more wired communication lines or buses. Additionally, in some embodiments, the data development environment 305, the data quality assurance environment 310, the data production environment 315, or a combination thereof may include additional, fewer, or different components than illustrated in
The data development environment 305 is an environment where a user, such as a data scientist or a data engineer, may write code and develop new machine learning or artificial intelligence solutions (i.e., models) programmatically without exposing real customer data. In other words, in some embodiments, the data development environment 305 does not have access to customer data (for example, production product data). Rather, the data development environment 305 enables (via the electronic processor 200) a user to write code for artificial intelligence and/or machine learning activities. Accordingly, the data development environment 305 is an environment for developing libraries and frameworks using application programming languages, such as Python or Java.
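By way of a non-limiting illustration, the separation between the environments and customer data may be sketched as a simple access check. The sketch below is written in Python under stated assumptions: the names `Environment`, `CUSTOMER_DATA_ACCESS`, and `check_customer_data_access` are hypothetical and do not appear in the embodiments, and the table merely encodes that the data development environment has no access to customer data while the data quality assurance environment works on copies of data held by the data production environment.

```python
from enum import Enum


class Environment(Enum):
    """The three environments named in this description."""
    DEVELOPMENT = "data development environment"
    QUALITY_ASSURANCE = "data quality assurance environment"
    PRODUCTION = "data production environment"


# Hypothetical policy table (an assumption made for illustration only).
# Development: code authoring only, no customer data.
# Quality assurance: may work on copies of production data.
# Production: holds the database that stores the data.
CUSTOMER_DATA_ACCESS = {
    Environment.DEVELOPMENT: False,
    Environment.QUALITY_ASSURANCE: True,
    Environment.PRODUCTION: True,
}


def check_customer_data_access(env: Environment) -> None:
    """Raise PermissionError when the environment may not read customer data.

    Environments missing from the table are denied by default.
    """
    if not CUSTOMER_DATA_ACCESS.get(env, False):
        raise PermissionError(f"the {env.value} has no access to customer data")
```

In such a sketch, a request for customer data that originates in the data development environment fails before any data is touched, whereas the same request from the data quality assurance environment is allowed to proceed.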
In the embodiment shown in
The data quality assurance environment 310 may be an environment for data exploration and model development. In some embodiments, the data quality assurance environment 310 is a controlled environment that does not have Internet access. As seen in
In the embodiment shown in
As illustrated in
As illustrated in
After receiving the code input from the user device 105 (at block 405), the electronic processor 200, within the data development environment 305, develops a code artifact based on the code input (at block 410). As noted above, the data development environment 305 (via the electronic processor 200) may use the written code (i.e., the code input) to develop a new machine learning or artificial intelligence solution. Accordingly, in response to the user device 105 interfacing with the code repository 320, a build system or process (for example, the build and test pipeline 335) of the data development environment 305 is triggered. In some embodiments, the build and test pipeline 335 of the data development environment 305 verifies that the code meets established guidelines. For example, a user may use one or more code review processes (executed by the electronic processor 200) when writing code within the data development environment 305. A code review process may include, for example, a code security scan, a code vulnerability scan, and the like. The code review process may be included as part of the build and test pipeline 335, the build task 340, or a combination thereof. Accordingly, as seen in
In the embodiment shown, the electronic processor 200 stores (or publishes) the code artifact in the code artifact repository 345 of the data development environment 305 (at block 415). As noted above, a code artifact of the data development environment 305 may be published (or stored) to the code artifact repository 345. In some embodiments, the code artifact repository 345 accepts code artifacts directly from a user (via the user device 105). Alternatively, in some embodiments, the code artifact repository 345 does not accept code artifacts directly from a user (via the user device 105). In such embodiments, the code artifact repository 345 only accepts verified code artifacts from the build system or process (for example, the build and test pipeline 335).
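As a non-limiting sketch of the embodiment in which only verified code artifacts are accepted, the Python example below models a repository that rejects any artifact that did not pass through a build step. All names (`CodeArtifact`, `build_and_test`, `CodeArtifactRepository`) are hypothetical, the `verified` flag is a simplification of whatever verification mechanism an implementation actually uses, and the single string check stands in for a real code security or vulnerability scan.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class CodeArtifact:
    name: str
    content: bytes
    verified: bool = False  # hypothetical flag; True only after the pipeline passes


def build_and_test(name: str, source: str) -> CodeArtifact:
    """Toy stand-in for a build and test pipeline."""
    # Placeholder for a code security or vulnerability scan (assumed rule).
    if "eval(" in source:
        raise ValueError(f"{name}: code review failed")
    return CodeArtifact(name=name, content=source.encode("utf-8"), verified=True)


class CodeArtifactRepository:
    """Repository that only accepts artifacts verified by the pipeline."""

    def __init__(self) -> None:
        self._artifacts: Dict[str, CodeArtifact] = {}

    def publish(self, artifact: CodeArtifact) -> str:
        """Store a verified artifact and return a SHA-256 digest of its content."""
        if not artifact.verified:
            raise PermissionError("only verified code artifacts may be published")
        self._artifacts[artifact.name] = artifact
        return hashlib.sha256(artifact.content).hexdigest()

    def get(self, name: str) -> CodeArtifact:
        """Return a previously published artifact by name."""
        return self._artifacts[name]
```

In this sketch, an artifact constructed directly by a user carries `verified=False` and is refused by `publish`, while an artifact returned by `build_and_test` is stored and can later be retrieved by name.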
As noted above, a user (via the user device 105) may interface with the data quality assurance environment 310 through the application 380. Accordingly, in some embodiments, the electronic processor 200 receives, within the data quality assurance environment 310, a user input from the user device 105 through the application 380. The user input may be associated with a data exploration function, a model development function, or a combination thereof. In other words, the electronic processor 200 may use the user input to perform a data exploration function, a model development function, or a combination thereof, such as developing a model.
In the embodiment shown in
In some embodiments, the electronic processor 200 implements one or more data tools 605 as part of the data quality assurance environment 310, as seen in
In some embodiments, the electronic processor 200 trains (or develops), within the data quality assurance environment 310, a model using machine learning based on the code artifact and the data (at block 430). In other words, the electronic processor 200 trains the model using one or more machine learning functions based on the code artifact and the data. Machine learning functions are generally functions that allow a computer application to learn without being explicitly programmed. Machine learning performed by the electronic processor 200 may be performed using various types of methods and mechanisms including but not limited to decision tree learning, association rule learning, artificial neural networks, inductive logic programming, support vector machines, clustering, Bayesian networks, reinforcement learning, representation learning, similarity and metric learning, sparse dictionary learning, and genetic algorithms. These approaches allow the electronic processor 200 to ingest, parse, and understand data and progressively refine models for data analytics.
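For illustration only, the training step may be sketched in Python as follows. The sketch assumes that a code artifact can be represented as a callable training routine and that the data is a list of records; neither representation is prescribed by the embodiments. An ordinary least-squares line fit is used as a deliberately simple stand-in for the machine learning methods listed above, and the names `download_copy`, `fit_line`, and `train_model` are hypothetical.

```python
import copy
from typing import Callable, Dict, List

Row = Dict[str, float]


def download_copy(production_rows: List[Row]) -> List[Row]:
    """Return an independent copy so the stored data is never modified."""
    return copy.deepcopy(production_rows)


def fit_line(rows: List[Row]) -> Dict[str, float]:
    """Toy code artifact: least-squares fit of y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(r["x"] for r in rows) / n
    mean_y = sum(r["y"] for r in rows) / n
    sxx = sum((r["x"] - mean_x) ** 2 for r in rows)
    sxy = sum((r["x"] - mean_x) * (r["y"] - mean_y) for r in rows)
    slope = sxy / sxx  # assumes at least two distinct x values
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def train_model(code_artifact: Callable[[List[Row]], Dict[str, float]],
                production_rows: List[Row]) -> Dict[str, float]:
    """Train on a copy of the data using the supplied code artifact."""
    working_copy = download_copy(production_rows)
    # Preprocessing mutates only the working copy, never the source rows.
    for row in working_copy:
        row["x"] = float(row["x"])
        row["y"] = float(row["y"])
    return code_artifact(working_copy)
```

Calling `train_model(fit_line, rows)` returns the fitted parameters while leaving `rows` exactly as it was, which mirrors the requirement that the copy is downloaded without changing the data stored in the database.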
In some embodiments, the electronic processor 200 performs a data training workflow 610, as illustrated in
In the embodiment shown, after a model is trained, the electronic processor 200 transmits (or exports) the model to the model database 397 for storage (at block 435). As noted above, in some embodiments, the data production environment 315 is not directly accessible by a user. For example, the data production environment 315 may be an isolated environment that enables data-powered applications for an application development team or user. In such embodiments, the data production environment 315 may be a development environment, a quality assurance environment, a production environment, or a combination thereof for an application programming interface (API) solution, as illustrated in
Alternatively or in addition, in some embodiments, the data quality assurance environment 310 hosts the model database 397 (as a hosted model database), as illustrated in
In some embodiments, the secure data enclave includes (or has access to) multiple data production environments 315. In such embodiments, the electronic processor 200 may generate, train, and develop new models and test the new models in one or more champion versus challenger experiments (represented in
Accordingly, in such embodiments, the data quality assurance environment 310 (via the electronic processor 200) may host machine learning or artificial intelligence models and provide the models through a private connection (for example, the private connection point 805) to product or solution environments (for example, the data production environment 315). The product or solution environments may consume the hosted models through a secure application programming interface.
Alternatively or in addition, as seen in
In some embodiments, as noted above, the secure data enclave includes multiple data development environments 305. In such embodiments, the electronic processor 200 may be configured to receive multiple code artifacts from multiple data development environments 305. For example, the electronic processor 200 may access a first code artifact stored in a first code artifact repository from a first data development environment and access a second code artifact stored in a second code artifact repository from a second data development environment based on the user input. In response to accessing the first code artifact and the second code artifact, the electronic processor 200 may train the model (within the data quality assurance environment 310) using machine learning based on the first code artifact, the second code artifact, and the data from the production database 390.
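One possible way to combine two code artifacts is sketched below in Python. The embodiments do not specify how the first and second code artifacts interact, so the composition shown here (the first artifact prepares features, the second fits a model) is an assumption, as are the names `scale_features`, `fit_through_origin`, and `train_with_two_artifacts` and the use of plain dictionaries as stand-ins for the first and second code artifact repositories.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Repository = Dict[str, Callable]  # stand-in for a code artifact repository


def scale_features(rows: List[Row]) -> List[Row]:
    """Hypothetical first artifact: scale x by its largest magnitude."""
    largest = max(abs(r["x"]) for r in rows) or 1.0
    return [{"x": r["x"] / largest, "y": r["y"]} for r in rows]


def fit_through_origin(rows: List[Row]) -> Dict[str, float]:
    """Hypothetical second artifact: least-squares fit of y = slope * x."""
    numerator = sum(r["x"] * r["y"] for r in rows)
    denominator = sum(r["x"] ** 2 for r in rows)  # assumes some x is nonzero
    return {"slope": numerator / denominator}


def train_with_two_artifacts(first_repo: Repository, first_name: str,
                             second_repo: Repository, second_name: str,
                             data_copy: List[Row]) -> Dict[str, float]:
    """Access one artifact from each repository and train on the data copy."""
    prepare = first_repo[first_name]
    fit = second_repo[second_name]
    # New rows are built by the first artifact, so data_copy is not mutated.
    return fit(prepare(data_copy))
```

Here each repository is queried independently, which reflects that the two code artifacts originate from separate data development environments and are brought together only within the data quality assurance environment.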
Thus, the embodiments described herein provide, among other things, methods and systems for secure model development using, for example, a secure data enclave.
This application claims priority to U.S. Provisional Patent Application No. 63/168,573, filed Mar. 31, 2021, the entire contents of which are incorporated by reference herein.
Number | Date | Country
---|---|---
63168573 | Mar 2021 | US