This disclosure is generally directed to machine learning systems. More specifically, this disclosure is directed to a system and method for machine learning pipeline generation and management.
A machine learning pipeline is a software system that provides a way to compose and execute multiple data processing and machine learning steps in an ordered sequence. Each step may take in one or more data inputs and return one or more data outputs. A machine learning pipeline is often constructed to perform one or more machine learning operations. Typical operations of the pipeline can include training one or more machine learning steps, automatically tuning training parameters of the machine learning steps (such as during hyperparameter optimization), predicting new data given a pipeline containing one or more trained machine learning steps, measuring (scoring) the performance of the prediction results, or interpreting the contribution level of different input data to the prediction results. The base machine learning pipeline functionality may be extended to include additional operations or to change the logic of existing operations.
This disclosure relates to a system and method for machine learning pipeline generation and management.
In a first embodiment, a method includes generating an authoring representation of a machine learning pipeline based on a received input, where the authoring representation is configured to manage one or more machine learning operations. The method also includes receiving an indication of an operation to be performed on the authoring representation. The method further includes translating the authoring representation to an intermediate representation based on the operation and optimizing the intermediate representation. In addition, the method includes translating the intermediate representation to an execution representation that is understood by one or more machine learning executors.
In a second embodiment, an apparatus includes at least one processing device configured to generate an authoring representation of a machine learning pipeline based on a received input, where the authoring representation is configured to manage one or more machine learning operations. The at least one processing device is also configured to receive an indication of an operation to be performed on the authoring representation. The at least one processing device is further configured to translate the authoring representation to an intermediate representation based on the operation and optimize the intermediate representation. In addition, the at least one processing device is configured to translate the intermediate representation to an execution representation that is understood by one or more machine learning executors.
In a third embodiment, a non-transitory computer readable medium stores computer readable program code that, when executed by one or more processors, causes the one or more processors to generate an authoring representation of a machine learning pipeline based on a received input, where the authoring representation is configured to manage one or more machine learning operations. The non-transitory computer readable medium also stores computer readable program code that, when executed by the one or more processors, causes the one or more processors to receive an indication of an operation to be performed on the authoring representation. The non-transitory computer readable medium further stores computer readable program code that, when executed by the one or more processors, causes the one or more processors to translate the authoring representation to an intermediate representation based on the operation and optimize the intermediate representation. In addition, the non-transitory computer readable medium stores computer readable program code that, when executed by the one or more processors, causes the one or more processors to translate the intermediate representation to an execution representation that is understood by one or more machine learning executors.
Other technical features may be readily apparent to one skilled in the art from the following figures, descriptions, and claims.
For a more complete understanding of this disclosure, reference is now made to the following description, taken in conjunction with the accompanying drawings, in which:
As noted above, a machine learning pipeline is a software system that provides a way to compose and execute multiple data processing and machine learning steps in an ordered sequence. Each step may take in one or more data inputs and return one or more data outputs. A machine learning pipeline is often constructed to perform one or more machine learning operations. Typical operations of the pipeline can include training one or more machine learning steps, automatically tuning training parameters of the machine learning steps (such as during hyperparameter optimization), predicting new data given a pipeline containing one or more trained machine learning steps, measuring (scoring) the performance of the prediction results, or interpreting the contribution level of different input data to the prediction results. The base machine learning pipeline functionality may be extended to include additional operations or to change the logic of existing operations.
In some systems, a machine learning pipeline's "authored representation" represents the sequence of steps constructed by a user and defines the pipeline, where the authored representation is specific to one or more machine learning operations. For example, in some systems, a machine learning pipeline may require a training step that is defined independently of a prediction step. In addition, some systems do not include a directed acyclic graph (DAG) topology for authoring or execution graphs.
This disclosure provides an apparatus, method, and computer readable medium supporting a process for machine learning pipeline generation and management. The disclosed embodiments allow multiple operations to be performed on a single pipeline representation. This allows for a single user-defined representation (the “authoring representation”) of a machine learning pipeline, so the user does not have to maintain separate pipelines for each operation. Stated differently, a user (such as a data scientist) can author a machine learning pipeline only once, unifying training, prediction, scoring, tuning, and interpreting operations without having to repeat any of the operations. Here, the authoring representation can manage each operation described without requiring operation-specific steps or different user-defined pipeline architectures corresponding to each operation.
The disclosed authoring representation allows for static typing of the inputs and outputs of each step, which can be used for validating that the steps are connected properly. The authoring representation also allows for clearly defining the input/output signatures of the machine learning operations in the pipeline, thus creating a well-defined interface for connecting to an external software system. In addition, the disclosed authoring representation supports a directed acyclic graph (DAG) topology as described in greater detail below.
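As one non-limiting illustration of this idea (the Step and Pipeline names and the validation logic below are hypothetical sketches, not part of the disclosed embodiments), type compatibility between adjacent steps might be checked at the time the steps are connected:

from dataclasses import dataclass, field

@dataclass
class Step:
    """Hypothetical authoring-layer step with statically declared input/output types."""
    name: str
    input_type: type
    output_type: type

@dataclass
class Pipeline:
    """Hypothetical pipeline that validates type compatibility between adjacent steps."""
    steps: list = field(default_factory=list)

    def add(self, step: Step) -> "Pipeline":
        if self.steps:
            prev = self.steps[-1]
            # Static check: the new step must accept what the previous step produces.
            if not issubclass(prev.output_type, step.input_type):
                raise TypeError(
                    f"{prev.name} outputs {prev.output_type.__name__}, "
                    f"but {step.name} expects {step.input_type.__name__}"
                )
        self.steps.append(step)
        return self

# Example: a text-cleaning step feeding a feature-extraction step.
pipeline = (
    Pipeline()
    .add(Step("clean_text", input_type=str, output_type=str))
    .add(Step("extract_features", input_type=str, output_type=list))
)

In such a sketch, a mismatch between adjacent steps is reported when the pipeline is authored rather than when it is executed, which is one way the static typing described above could support validation.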
The network 104 facilitates communication between various components of the system 100. For example, the network 104 may communicate Internet Protocol (IP) packets, frame relay frames, Asynchronous Transfer Mode (ATM) cells, or other suitable information between network addresses. The network 104 may include one or more local area networks (LANs), metropolitan area networks (MANs), wide area networks (WANs), all or a portion of a global network such as the Internet, or any other communication system or systems at one or more locations.
The application server 106 is coupled to the network 104 and is coupled to or otherwise communicates with the database server 108. In some embodiments, the application server 106 supports the three-layer machine learning pipeline architecture described below. For example, the application server 106 may execute one or more applications 112 that use data from the database 110 to perform operations associated with machine learning pipeline generation and management. Note that the database server 108 may also be used within the application server 106 to store information, in which case the application server 106 may store the information itself used to perform operations associated with machine learning pipeline generation and management.
The database server 108 operates to store and facilitate retrieval of various information used, generated, or collected by the application server 106 and the user devices 102a-102d in the database 110. For example, the database server 108 may store various information related to machine learning pipeline generation and management.
Although
As shown in
The memory 210 and a persistent storage 212 are examples of storage devices 204, which represent any structure(s) capable of storing and facilitating retrieval of information (such as data, program code, and/or other suitable information on a temporary or permanent basis). The memory 210 may represent a random access memory or any other suitable volatile or non-volatile storage device(s). The persistent storage 212 may contain one or more components or devices supporting longer-term storage of data, such as a read only memory, hard drive, Flash memory, or optical disc.
The communications unit 206 supports communications with other systems or devices. For example, the communications unit 206 can include a network interface card or a wireless transceiver facilitating communications over a wired or wireless network, such as the network 104. The communications unit 206 may support communications through any suitable physical or wireless communication link(s).
The I/O unit 208 allows for input and output of data. For example, the I/O unit 208 may provide a connection for user input through a keyboard, mouse, keypad, touchscreen, or other suitable input device. The I/O unit 208 may also send output to a display, printer, or other suitable output device. Note, however, that the I/O unit 208 may be omitted if the device 200 does not require local I/O, such as when the device 200 represents a server or other device that can be accessed remotely.
Although
In this example, the three layers include an authoring layer 301, an optimization layer 302, and an execution layer 303. The user interacts only with the “top” layer of the architecture 300, namely the authoring layer 301. The user generally uses the authoring layer 301 to define an authoring representation 304 of a machine learning pipeline 308. In the architecture 300, the authoring representation 304 manages various operations 310 of the machine learning pipeline 308 without requiring machine learning operation-specific steps or different user-defined pipeline architectures corresponding to each operation 310. That is, the authoring representation 304 does not require a user to write, e.g., an explicit training step and then a prediction step when authoring the machine learning pipeline 308. For example, consider the user constructing a simple machine learning pipeline 308 in the authoring layer 301:
pipelineX = stepA -> stepB -> stepC.
Note that this machine learning pipeline 308 has no machine learning operation-specific steps like "train" or "predict" specified in the authoring layer 301. Instead, such steps are implicit in the construction of the machine learning pipeline 308. When a machine learning operation is executed on pipelineX, such as pipelineX.train() (which represents pipelineX plus the operation "train"), the following steps are automatically executed without user definition: stepA.train() to train stepA's model, stepA.process() to produce inputs for stepB, stepB.train() to train stepB's model, stepB.process() to produce inputs for stepC, stepC.train() to train stepC's model, and then packaging of all the trained steps.
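The following sketch, in which the class names and placeholder logic are hypothetical and are not taken from the disclosed embodiments, illustrates how a single authored sequence of steps could implicitly expand a "train" operation in this way:

class TrainableStep:
    """Hypothetical step exposing the per-step train/process calls described above."""

    def __init__(self, name):
        self.name = name
        self.trained = False

    def train(self, data):
        # Fit this step's model on its inputs (placeholder logic).
        self.trained = True

    def process(self, data):
        # Transform inputs into the inputs of the next step (placeholder logic).
        return data

class SequentialPipeline:
    """Hypothetical pipeline corresponding to pipelineX = stepA -> stepB -> stepC."""

    def __init__(self, *steps):
        self.steps = list(steps)

    def train(self, data):
        # The single authored pipeline implicitly expands the "train" operation:
        # train each step, then run process() to produce the next step's inputs.
        for step in self.steps:
            step.train(data)
            data = step.process(data)
        # Finally, package all trained steps (here, simply return the pipeline).
        return self

pipelineX = SequentialPipeline(
    TrainableStep("stepA"), TrainableStep("stepB"), TrainableStep("stepC")
)
pipelineX.train(data=[1, 2, 3])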
When the user specifies an operation 310 to be performed on the authoring representation 304, the authoring representation 304 is translated to an intermediate representation 306 via a translation operation 305. The operation 310 to be performed, as specified by the user, can include training one or more machine learning operations, tuning one or more training parameters of the one or more machine learning operations, predicting new data, scoring the performance of a prediction result, interpreting a contribution level of different input data to prediction results, or the like. Herein, scoring the performance of the prediction result refers to measuring the quality of the predictions output by the model. For example, one scoring metric for machine learning models is accuracy, in which, given the predictions and the ground truth values, a real number (the score) is determined that indicates how well the predictions match the ground truth. Other example scoring metrics for machine learning models include precision, recall, and mean absolute error. The architecture 300 enables the user to author and train a machine learning model once (such as by using a large dataset), and the machine learning model can be executed many times using smaller inputs.
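As a simple non-limiting illustration of scoring, an accuracy metric reduces predictions and ground-truth values to a single real number:

def accuracy(predictions, ground_truth):
    """Fraction of predictions that exactly match the ground-truth values."""
    correct = sum(p == t for p, t in zip(predictions, ground_truth))
    return correct / len(ground_truth)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75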
The intermediate representation 306 can be optimized by the system using the optimization layer 302, resulting in another intermediate representation 306 that can produce the same results. In the optimization layer 302, the system automatically optimizes the execution of the machine learning pipeline 308 based on various criteria, such as for cost or latency, depending on the inputs provided and the outputs requested. In some embodiments, various optimizations could be applied when transforming an authored machine learning pipeline 308 into an executable representation. For example,
Although
Returning to
The architecture 300 supports heterogeneous execution environments for different steps or operations in the machine learning pipeline 308. Among other things, this can allow for user flexibility of using open-source languages, frameworks, and libraries. That is, the architecture 300 is not limited to any specific system, development application, or production application. Also, each machine learning pipeline 308 can be exported or imported between different applications as needed or desired.
In addition, differentiation of machine learning pipelines 308 in the architecture 300 allows the user to remove one or more operations 310 of the machine learning pipeline 308 or replace one or more operations 310 with one or more different operations 310. This differentiation also allows different operations 310 to be performed using different hardware, such as to support different resource requirements. Most machine learning pipelines execute on one type of hardware. In contrast, the architecture 300 allows the user to divide the execution representation 307 of the machine learning pipeline 308 into multiple parts so that different operations 310 or different steps of one operation 310 can be executed on different (potentially more suitable or advantageous) hardware. For example, assume step A is a pre-processing routine, step B is a deep learning model, and step C is a post-processing routine. The architecture 300 allows step A and step C to execute on commodity hardware, while step B can leverage accelerated hardware, such as a GPU. In some embodiments, a user can provide hints, parameters, or instructions to the system to help the system determine on which hardware an operation should be executed.
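As one non-limiting sketch (the StepSpec structure and the "cpu"/"gpu" hint vocabulary below are assumptions rather than part of the disclosed embodiments), hardware hints attached to steps could be used to partition an execution plan across different executors:

from dataclasses import dataclass

@dataclass
class StepSpec:
    """Hypothetical execution-layer record pairing a step with a hardware hint."""
    name: str
    hardware: str  # e.g., "cpu" or "gpu"; the hint vocabulary is an assumption

def partition_by_hardware(steps):
    """Group consecutive steps by hardware so each group can target a different executor."""
    groups = []
    for step in steps:
        if groups and groups[-1][0] == step.hardware:
            groups[-1][1].append(step.name)
        else:
            groups.append((step.hardware, [step.name]))
    return groups

steps = [
    StepSpec("stepA_preprocess", "cpu"),
    StepSpec("stepB_deep_model", "gpu"),
    StepSpec("stepC_postprocess", "cpu"),
]
print(partition_by_hardware(steps))
# [('cpu', ['stepA_preprocess']), ('gpu', ['stepB_deep_model']), ('cpu', ['stepC_postprocess'])]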
To prevent unwanted or unsuitable changes to the machine learning pipeline 308, the architecture 300 allows freezing of part or all of the machine learning pipeline 308. As used here, “freezing” a machine learning pipeline refers to the system's ability to enforce immutability after a specific user action occurs. One example is freezing a machine learning pipeline after training the machine learning model. Since machine learning pipelines are composable, the freezing may affect only one or more parts of an entire machine learning pipeline without affecting one or more other parts of the machine learning pipeline.
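A minimal sketch of such freezing behavior, using hypothetical class and method names, is shown below:

class FreezableStep:
    """Hypothetical step that becomes immutable once frozen (e.g., after training)."""

    def __init__(self, name):
        self.name = name
        self.frozen = False

    def freeze(self):
        self.frozen = True

    def train(self, data):
        if self.frozen:
            raise RuntimeError(f"{self.name} is frozen and cannot be retrained")
        # ... fit the step's model here ...

step = FreezableStep("stepB")
step.train(data=[1, 2, 3])
step.freeze()          # enforce immutability after training
# step.train([4, 5])   # would now raise RuntimeError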
In some embodiments, the architecture 300 enables the system to learn the minimum resources (such as memory, CPUs, GPUs, and the like) required to execute an operation 310 in the machine learning pipeline 308. Once the required resources are learned, the system can restrict certain operations 310 (such as predicting and interpreting) so that execution on resource-constrained devices is possible.
In the architecture 300, each operation 310 in the machine learning pipeline 308 may be implemented in a unique language with a language version, use unique frameworks and libraries, and run on a unique executor. Some examples of languages that can be used include JAVA, JAVASCRIPT, PYTHON, and the like. Some examples of frameworks and libraries that can be used include TENSORFLOW, KERAS, SCIKIT-LEARN, PYTORCH, SPACY, HUGGINGFACE, XGBOOST, and the like. Some examples of executors that can be used include SPARK, RAY, APACHE AIRFLOW, ARGO WORKFLOWS, and the like. Of course, other languages, frameworks, libraries, and executors are possible, and these examples do not limit the scope of the disclosure.
The architecture 300 facilitates pipeline composability by allowing the user to compose the machine learning pipeline 308 from existing, independently authored machine learning pipelines while the system maintains all claimed properties. Moreover, optimizations can be performed on the composition of multiple machine learning pipelines 308. In some embodiments, multiple machine learning pipelines 308 can be executed as a single execution graph even when the machine learning pipelines 308 have no knowledge of each other. In addition, machine learning pipelines 308 can be nested without difficulty. For example, a machine learning pipeline 308 developed by user A can be re-used in a larger machine learning pipeline 308 by user B. User A's machine learning pipeline 308 can be trained and made untrainable for future users (such as via freezing), which may allow the sharing of interesting functionalities without confidentiality or intellectual property breaches or performance degradation caused by user B. As a particular example, assume user A develops a computer vision pipeline, and user B wants to use the computer vision pipeline for a defect detection application in manufacturing. The architecture 300 allows user B to use user A's pipeline and add pre-processing and post-processing steps for user B's specific application. Similarly, a machine learning pipeline 308 developed for a given task by user A might be shared without obfuscation. User B can decide to re-use user A's machine learning pipeline 308 but replace one or multiple steps with user B's own implementation. This would allow user B to use his or her own expertise in a specific area while leveraging the previous work of pre-processing, post-processing, and any parallel tasks built by user A.
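The following sketch, with hypothetical helper names, illustrates how user B might nest user A's pipeline between additional pre-processing and post-processing steps:

def make_step(name):
    """Hypothetical leaf step: a callable transforming its input."""
    return lambda data: f"{name}({data})"

def compose(*parts):
    """Hypothetical composition: chain steps or whole pipelines left to right."""
    def pipeline(data):
        for part in parts:
            data = part(data)
        return data
    return pipeline

# User A's pipeline, shared as an opaque (possibly frozen) unit.
user_a_vision_pipeline = compose(make_step("detect"), make_step("classify"))

# User B nests user A's pipeline and adds application-specific pre/post-processing.
user_b_pipeline = compose(
    make_step("preprocess_images"),
    user_a_vision_pipeline,
    make_step("postprocess_report"),
)

print(user_b_pipeline("raw_batch"))
# postprocess_report(classify(detect(preprocess_images(raw_batch))))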
Once the machine learning pipeline 308 is composed, the architecture 300 supports full pipeline persistence, including the ability of a user to name, save, and retrieve the machine learning pipeline 308. This in turn enables proper deployment of the machine learning pipeline 308. In addition, the architecture 300 allows for rich query filters to retrieve the machine learning pipeline 308 because the architecture 300 handles the machine learning models as reusable objects. Also, once the machine learning pipeline 308 is composed, the architecture 300 also allows for inspection of the machine learning pipeline 308. Pipeline inspection allows the user to trace execution paths back to the authoring level in order to understand how the machine learning pipeline 308 was originally authored. For instance, a user can inspect the performance of the machine learning pipeline 308, determine how long prediction or training took, and the like.
Although
The architecture 300 enables composition of multiple steps into a directed acyclic graph (DAG) machine learning pipeline and allows nesting of multiple machine learning pipelines inside higher level machine learning pipelines. A DAG machine learning pipeline can have one or multiple source nodes (multiple inputs) and one or multiple sink nodes (multiple outputs). This may be useful or important for many machine learning applications, such as when the user wants to combine the outputs of multiple models applied on multiple sources of data in order to compute one or a few final scores.
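As a non-limiting sketch (the node structure and the simple evaluation loop below are assumptions, not the disclosed implementation), a DAG pipeline with two source nodes and one sink node might be represented and evaluated as follows:

# Hypothetical DAG: two source nodes (two data inputs) feed two models whose
# outputs are combined by a single sink node into one final score.
dag = {
    "sensor_model":  {"inputs": ["sensor_data"], "fn": lambda x: sum(x) / len(x)},
    "image_model":   {"inputs": ["image_data"],  "fn": lambda x: max(x)},
    "combine_score": {"inputs": ["sensor_model", "image_model"],
                      "fn": lambda a, b: 0.5 * a + 0.5 * b},
}

def execute(dag, sources):
    """Evaluate each node once all of its inputs are available (a tiny topological pass)."""
    values = dict(sources)
    pending = dict(dag)
    while pending:
        for name, node in list(pending.items()):
            if all(dep in values for dep in node["inputs"]):
                values[name] = node["fn"](*(values[d] for d in node["inputs"]))
                del pending[name]
    return values

result = execute(dag, {"sensor_data": [0.2, 0.4, 0.6], "image_data": [0.1, 0.9]})
print(result["combine_score"])  # 0.65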
The architecture 300 supports heterogeneous execution environments for different steps in the DAG machine learning pipeline in order to allow a user the flexibility of using open-source languages, frameworks, and libraries. Each step in the machine learning pipeline may contain a unique language, language version, framework and libraries, and executor. For example,
The DAG machine learning pipeline 600 includes multiple steps 605-614 developed using multiple languages and frameworks, including PYTHON, KERAS, and TESSERACT. For example, part information extraction 605 can include extraction of part information from the component list 601 and the diagrams 603 (e.g., PDF diagrams), document extraction 606 can include identifying separate documents in the diagrams 603, diagram pre-processing 607 can include cleaning up and filtering the document data, symbol detection 608 can include detecting specific symbols in the diagrams 603, symbol identification 609 can include identifying the specific symbols in the diagrams 603, item number identification 610 can include identifying item numbers in the diagrams 603, knowledge consolidation 611 can include consolidating information obtained from the diagrams 603, assembly detection 612 can include detection of one or more assemblies in the extracted documents, OCR 613 can include optical character recognition of the extracted documents, and item number identification 614 can include identifying item numbers in the extracted documents. Of course, these steps, languages, and frameworks are merely examples, and other steps, languages and frameworks may be additionally or alternatively used.
Although
Although
As shown in
An indication of an operation to be performed on the authoring representation is received from the user at step 703. This could include, for example, the server 106 receiving an indication of an operation 310 from the user. The authoring representation is translated to an intermediate representation based on the operation at step 705. This could include, for example, the server 106 performing the translation operation 305 to translate the authoring representation 304 to the intermediate representation 306. The intermediate representation is optimized at step 707. This could include, for example, the server 106 optimizing the intermediate representation 306 in the optimization layer 302. The intermediate representation is translated to an execution representation that is understood by one or more machine learning executors at step 709. This could include, for example, the server 106 translating the intermediate representation 306 to the execution representation 307 in the execution layer 303. The execution representation is executed at step 711. This could include, for example, the server 106 executing an application or code represented by the execution representation 307.
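A compact sketch of this flow, in which all function names and the simple optimization are hypothetical placeholders rather than the disclosed implementation, might look as follows:

def generate_authoring_representation(user_input):
    """Authoring layer: capture the user's step sequence (hypothetical structure)."""
    return {"steps": user_input["steps"]}

def translate_to_intermediate(authoring, operation):
    """Expand the requested operation into per-step actions."""
    actions = []
    for step in authoring["steps"]:
        actions.append((step, operation))
        actions.append((step, "process"))
    return {"actions": actions}

def optimize(intermediate):
    """Optimization layer: e.g., drop a trailing process() that no later step consumes."""
    return {"actions": intermediate["actions"][:-1]}

def translate_to_execution(intermediate, executor="local"):
    """Execution layer: bind the optimized plan to a concrete executor."""
    return {"executor": executor, "plan": intermediate["actions"]}

def execute(execution):
    for step, action in execution["plan"]:
        print(f"[{execution['executor']}] {step}.{action}()")

authoring = generate_authoring_representation({"steps": ["stepA", "stepB", "stepC"]})
intermediate = optimize(translate_to_intermediate(authoring, "train"))
execute(translate_to_execution(intermediate))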
Although
In some embodiments, a method includes translating an intermediate representation to an execution representation that is understood by one or more machine learning executors, wherein the intermediate representation is an operation-based translation of an authoring representation, and wherein the authoring representation is of a machine learning pipeline and is configured to manage one or more machine learning operations.
In some embodiments, various functions described in this patent document are implemented or supported by a computer program that is formed from computer readable program code and that is embodied in a computer readable medium. The phrase “computer readable program code” includes any type of computer code, including source code, object code, and executable code. The phrase “computer readable medium” includes any type of medium capable of being accessed by a computer, such as read only memory (ROM), random access memory (RAM), a hard disk drive (HDD), a compact disc (CD), a digital video disc (DVD), or any other type of memory. A “non-transitory” computer readable medium excludes wired, wireless, optical, or other communication links that transport transitory electrical or other signals. A non-transitory computer readable medium includes media where data can be permanently stored and media where data can be stored and later overwritten, such as a rewritable optical disc or an erasable storage device.
It may be advantageous to set forth definitions of certain words and phrases used throughout this patent document. The terms "application" and "program" refer to one or more computer programs, software components, sets of instructions, procedures, functions, objects, classes, instances, related data, or a portion thereof adapted for implementation in a suitable computer code (including source code, object code, or executable code). The term "communicate," as well as derivatives thereof, encompasses both direct and indirect communication. The terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation. The term "or" is inclusive, meaning and/or. The phrase "associated with," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, have a relationship to or with, or the like. The phrase "at least one of," when used with a list of items, means that different combinations of one or more of the listed items may be used, and only one item in the list may be needed. For example, "at least one of: A, B, and C" includes any of the following combinations: A, B, C, A and B, A and C, B and C, and A and B and C.
The description in the present application should not be read as implying that any particular element, step, or function is an essential or critical element that must be included in the claim scope. The scope of patented subject matter is defined only by the allowed claims. Moreover, none of the claims invokes 35 U.S.C. § 112(f) with respect to any of the appended claims or claim elements unless the exact words "means for" or "step for" are explicitly used in the particular claim, followed by a participle phrase identifying a function. Use of terms such as (but not limited to) "mechanism," "module," "device," "unit," "component," "element," "member," "apparatus," "machine," "system," "processor," or "controller" within a claim is understood and intended to refer to structures known to those skilled in the relevant art, as further modified or enhanced by the features of the claims themselves, and is not intended to invoke 35 U.S.C. § 112(f).
While this disclosure has described certain embodiments and generally associated methods, alterations and permutations of these embodiments and methods will be apparent to those skilled in the art. Accordingly, the above description of example embodiments does not define or constrain this disclosure. Other changes, substitutions, and alterations are also possible without departing from the spirit and scope of this disclosure, as defined by the following claims.
This application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 63/269,605 filed on Mar. 18, 2022. This provisional application is hereby incorporated by reference in its entirety.