The present disclosure relates generally to controllers for controlling complex dynamical systems, and in particular, to a technique for generating an explicit rule-based control for a dynamical system.
For the control of complex systems, such as electric grids, building systems, plants, automotive vehicles, factory robots, etc., complex controls are necessary. While it is possible to synthesize sophisticated controls for controlling such complex systems, such as model predictive control (MPC), their implementation is often limited by hardware restrictions.
With an MPC controller, depending on the input variables, a future behavior of a system to be controlled is simulated in order to determine an output signal (control action) that optimizes the behavior of the system, often subject to defined constraints. MPC yields optimal solutions in a mathematical sense. However, such a control requires solving a mathematical program online to compute each control action. Linear MPC controllers are state of the art in industry. For highly non-linear problems with many constraints or even discrete variables/decisions, however, realizing MPC can be quite challenging: MPC requires high computational effort, and the corresponding non-linear optimization technologies pose mathematical challenges, e.g., a lack of guaranteed convergence.
A known approach to address the above-described challenge involves explicit model predictive control (explicit MPC). By solving the optimization problem off-line for a given range of operating conditions of interest and exploiting multiparametric programming techniques, explicit MPC computes the optimal control action off-line as an “explicit” function of the state and reference vectors, so that on-line operations reduce to a simple function evaluation. However, the ‘size’ of these explicit functions increases rapidly with the number of system states and constraints, so that they may become intractable to compute for large, complex systems.
Alternatively, or additionally, a data-driven machine learning model can be trained on the basis of simulations, which then supplements or replaces the online MPC computation. However, the control characteristic conveyed by a trained machine learning model, such as a neural network, is usually hardly comprehensible or interpretable analytically for a user, such as a technician, or for an agency tasked to certify the controller, among others. Furthermore, for a neural network to accurately mimic an MPC solution, a large number of hidden nodes may be required, which can demand greater computational resources.
Briefly, aspects of the present disclosure provide a technique for generating an explicit rule-based control for a dynamical system that addresses at least some of the above-described technical problems.
A first aspect of the disclosure provides a method for configuring a controller of a dynamical system. The method comprises reading a plurality of state signals, each state signal specifying a state of the dynamical system and being mapped to a multi-dimensional state space. The method further comprises using a first control algorithm to determine, for each state signal, a control signal that is assigned to that state signal. Each state signal and the assigned control signal represent a respective control point in a control data manifold pertaining to the dynamical system. The method further comprises detecting patches on the control data manifold by identifying control points on the control data manifold that belong to a common local approximation function. The method further comprises training a classifier to classify control points into different patches from among the detected patches. The method further comprises training a respective regression model for each detected patch for approximating a relationship between the state signals and the control signals in that patch. The trained classifier and regression models are used to create an explicit rule-based control algorithm, which is configured to convert a measured state signal obtained from the dynamical system into a control action by identifying an active patch as a function of the measured state signal and evaluating the respective regression model for the identified active patch.
A second aspect of the disclosure provides a method for controlling a dynamical system. The method comprises creating an explicit rule-based control algorithm according to a method described above. The method further comprises using the explicit rule-based control algorithm to control the dynamical system by: receiving a measured state signal from the dynamical system, identifying an active patch as a function of the measured state signal, and executing a control action as a function of the measured state signal by evaluating the respective regression model for the identified active patch.
Other aspects of the present disclosure implement features of the above-described methods in controllers and computer program products.
Additional technical features and benefits may be realized through the techniques of the present disclosure. Embodiments and aspects of the disclosure are described in detail herein and are considered a part of the claimed subject matter. For a better understanding, refer to the detailed description and to the drawings.
The foregoing and other aspects of the present disclosure are best understood from the following detailed description when read in connection with the accompanying drawings. To easily identify the discussion of any element or act, the most significant digit or digits in a reference number refer to the figure number in which the element or act is first introduced.
Various technologies that pertain to systems and methods will now be described with reference to the drawings, where like reference numerals represent like elements throughout. The drawings discussed below, and the various embodiments used to describe the principles of the present disclosure in this patent document are by way of illustration only and should not be construed in any way to limit the scope of the disclosure. Those skilled in the art will understand that the principles of the present disclosure may be implemented in any suitably arranged apparatus. It is to be understood that functionality that is described as being carried out by certain system elements may be performed by multiple elements. Similarly, for instance, an element may be configured to perform functionality that is described as being carried out by multiple elements. The numerous innovative teachings of the present application will be described with reference to exemplary non-limiting embodiments.
It is recognized that many modern control algorithms may be synthesized to work well in simulation environments but are challenging to implement on controllers for online control of a complex dynamical system, due to restrictions and limitations in controller hardware. The disclosed methodology may be used to extract rules from advanced control algorithms which can be implemented in existing controllers, such as programmable logic controllers (PLC) or another type of computing device. Furthermore, the resulting control algorithm is transparent and readily interpretable due to its explicit character, allowing personnel, such as a technician or an official authority, to better understand relationships between controller inputs and outputs.
As mentioned above, in case of traditional explicit MPC, the ‘size’ of the explicit functions increases rapidly with the number of system states and constraints, so that they may become intractable to compute for large, complex systems. Furthermore, it may often not be feasible to reduce the input state space into lower dimensions. The disclosed methodology provides a solution that can be scalable to high-dimensional system states by implementing a technique for identifying patches (such as hyperplanes) in the control data manifold and grouping the control data points according to the patches. Control rules are then extracted by training a regression model for each patch, to create an explicit rule-based control algorithm. When deployed on controller hardware for online control, the explicit rule-based control algorithm may use a simple indicator function to identify an active patch as a function of a measured input state signal and evaluate a corresponding regressor to determine a control action.
The disclosed methodology may thus provide an efficient mechanism by which hardware controllers can apply control rules and determine control actions for high-dimensional system states, without the stringent computational requirements of MPC, neural networks, or other advanced control algorithms. Through indicator functions that encapsulate patches in control data manifolds and regression models, the technology described herein may provide a computationally elegant methodology to evaluate control actions for complex system states, all the while maintaining requisite accuracy and effectiveness of generated control actions.
The disclosed methodology does not depend on the type of control algorithm for which rules are to be extracted or the type of controller hardware where the resulting explicit rule-based control algorithm is to be executed. The embodiments described herein suitably address the problem of generating explicit control rules based on model predictive control (MPC), which contains sophisticated optimization problems that can place a substantial computational burden particularly on old or under-powered field controllers. It will however be appreciated that the underlying technique is not limited to MPC or any particular type of control algorithm.
Referring now to the drawings,
The platform 100 includes a simulator 102 to simulate the dynamical system or one or more of its components. The simulator 102 serves the purpose of simulatively generating a large number of state signals X and a control signal U for each state signal X using a simulation model 104 of the dynamical system that interacts with the control algorithm 106. Each state signal X may specify a discrete state of the dynamical system. In the shown embodiment, the state signals are read from the simulation model 104. For each state signal X, a control signal U is generated by the control algorithm 106 and assigned to the respective state signal X. The control signal U may be determined as a function of the state signal X such that, when applied, it optimizes behavior of the dynamical system, as simulated by the simulation model 104. In some embodiments, the simulation model 104 may comprise a high-fidelity physics-based model, which may be part of a digital twin of the dynamical system.
The state signals X may include, for instance, physical, chemical, design-related operating parameters, property data, performance data, environmental data, monitoring data, forecast data, analysis data and/or other data arising in the operation of the dynamical system and/or describing an operating state of the dynamical system. For example, if the dynamical system comprises a vehicle, the state signals X may include positioning data, speed, temperature, pressure, rotational speed, emissions, vibrations, fuel consumption, etc. The control signal U may be determined on the basis of solving an optimization problem by the control algorithm 106, to optimize the behavior of the dynamical system. The optimization problem may include minimizing a cost function, for example, associated with energy/power consumption, wear, distance, time, price, etc.
The state signals X can be represented as numerical data vectors mapped to a multi-dimensional state space. The control signals U may also be represented as numerical data vectors mapped to a multi-dimensional control action space, or may be represented as scalar values (one-dimensional control action space). In one embodiment, the state signals X and the control signals U represent time series data, where, for each time step, a respective control signal U is generated by the control algorithm 106 based on an updated state signal X for that time step. The state signal X is then updated for the next time step as induced by the action resulting from the control signal U. The time series data pertaining to state signals X and the control signals U are preferably generated for a variety of initial states and operating scenarios of the dynamical system. The initial states and operating scenarios may be obtained from a database 108.
For this purpose, the simulation model 104 may simulate a behavior of the dynamical system for a variety of operating scenarios. The latter may include a variety of operating conditions and/or operating states that may occur during the operation of the dynamical system. Such operating scenarios may be extracted from operating data of the dynamical system and/or from the database 108. In one embodiment, a variety of operating scenarios and initial states of the dynamical system may be generated by a scenario generator of the simulator 102 (not shown). The scenario generator may generate state signals, trajectories, time series, external influences, operating events that may possibly occur in the operation of the dynamical system, and/or constraints to be satisfied. To vary the generated operating scenarios, the scenario generation can also be random. The generation of the operating scenarios may be based on basic data or model data for the dynamical system, which may be stored in the database 108 and fed into the scenario generator.
In the described embodiment, the control algorithm 106 is an MPC algorithm. The MPC algorithm 106 typically generates, for each state signal X, a plurality of variants of a control signal. The behavior of the dynamical system is simulated for each of the variants of the control signal over a defined number of time steps, referred to as prediction horizon. Based on the simulated behavior, one of the variants can be selected and assigned as the control signal U for the given state signal X that leads to an optimized behavior of the dynamical system, possibly with specified constraint(s). For example, the variant providing an optimized behavior of the dynamical system may be determined as one that results in the lowest value of a cost function subject to the constraint(s), from among the plurality of variants. The assigned control signal U is applied for a single time step, after which the above-described optimization is solved again over a receding prediction horizon, with an updated state signal X which is induced by the assigned control signal U determined at the previous time step.
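The receding-horizon procedure described above can be illustrated with a short Python sketch that enumerates a small set of discrete control variants over a prediction horizon, simulates each candidate sequence against a plant model, and applies only the first move of the cheapest sequence. The two-state linear plant, the candidate set, and the stage cost below are all illustrative assumptions, not taken from the disclosure.

```python
import itertools
import numpy as np

# Hypothetical 2-state linear plant x' = A x + B u with a quantized,
# scalar control input (a stand-in for the simulation model 104).
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
U_CANDIDATES = (-1.0, 0.0, 1.0)   # discrete control variants
HORIZON = 3                        # prediction horizon (time steps)

def cost(x, u):
    """Illustrative stage cost: penalize state deviation and control effort."""
    return float(x @ x + 0.1 * u * u)

def mpc_step(x0):
    """Enumerate control sequences over the horizon, simulate each one,
    and return the first move of the cheapest sequence."""
    best_u, best_cost = None, np.inf
    for seq in itertools.product(U_CANDIDATES, repeat=HORIZON):
        x, total = x0.copy(), 0.0
        for u in seq:
            total += cost(x, u)
            x = A @ x + B.flatten() * u
        if total < best_cost:
            best_u, best_cost = seq[0], total
    return best_u

# Receding horizon: apply only the first move, then re-solve with the
# updated state induced by that move.
x = np.array([1.0, 0.0])
for _ in range(5):
    u = mpc_step(x)
    x = A @ x + B.flatten() * u
```

In a real MPC the inner enumeration is replaced by a mathematical program; the sketch only mirrors the receding-horizon structure of applying the first control step and re-solving.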
As an alternate example, the control algorithm 106 may comprise a neural network based policy. The policy may be trained to map state signals to assigned control signals, for example, based on reinforcement learning (RL) or any other method.
In various embodiments, as an alternative to the simulative signal generation illustrated in
The state signals X and the assigned control signals U are read into a storage medium containing control data 110. The control data 110 comprises a large number of control points (X, U), each defined by a state signal X and a control signal U assigned to that state signal. The control points are represented in a control data space having a high dimensionality, depending on the dimensionality of the state space (input space) and the control action space (output space). In practice, the control points lie on a lower-dimensional manifold embedded in the higher-dimensional control data space. Such a manifold is referred to herein as a control data manifold. For a scalar output, the control data manifold may have the dimensionality of the input space.
For the purpose of a simple visual illustration,
Referring again to
In the patch detector 112, patches are detected in an unsupervised manner by identifying control points on the control data manifold that belong to a common local approximation function. According to an exemplary algorithm, the patch detector 112 samples control points Q in the control data manifold and determines a set Z of the k nearest control points in the neighborhood of each sampled control point Q. Next, the patch detector 112 determines an equation E of a local approximation function describing a region (patch) in the control data manifold comprising the control point Q and the set Z of k neighborhood control points (i.e., an equation fitting Q+Z). If the equation E is the same as or similar to an equation of a stored local approximation function of another patch, then the local approximation function associated with Q+Z is assigned the existing patch label; if not, then the local approximation function associated with Q+Z is assigned a new patch label. In an example embodiment, “similarity” of the equation E with any of the stored local approximation functions may be determined by comparing a fitting error of the equation E to a fitting error of the respective stored local approximation function. If there exists a stored local approximation function for which the fitting error is close to the fitting error of the equation E (e.g., within a defined threshold), then the patch label of that stored local approximation function is assigned to the equation E; otherwise, a new patch label is created. After sampling the control points in the control data manifold, a set of n patch labels is detected, and each patch may be represented by a unique local approximation function.
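The patch detection loop can be sketched in Python as follows. This simplified version handles linear (affine) patches only and merges local fits by comparing their fitted coefficients directly, rather than their fitting errors as in the text; the neighborhood size k and tolerance tol are assumed parameters for illustration.

```python
import numpy as np

def detect_patches(X, U, k=10, tol=1e-2):
    """Assign a patch label to every control point (x, u) by fitting a
    local affine function u ~ w.x + b over its k nearest neighbors and
    merging local fits whose coefficients agree within `tol`.
    Simplified sketch of a patch detector for linear patches only."""
    patches = []                      # stored coefficient vectors, one per label
    labels = np.full(len(X), -1)
    for i, x in enumerate(X):
        # k nearest neighbors of the sampled control point (set Z)
        idx = np.argsort(np.linalg.norm(X - x, axis=1))[:k]
        A = np.hstack([X[idx], np.ones((k, 1))])       # design matrix [x, 1]
        coef, *_ = np.linalg.lstsq(A, U[idx], rcond=None)
        for lbl, stored in enumerate(patches):          # reuse an existing label?
            if np.allclose(coef, stored, atol=tol):
                labels[i] = lbl
                break
        else:
            patches.append(coef)                        # new patch detected
            labels[i] = len(patches) - 1
    return labels, patches
```

For noise-free data lying on a single hyperplane, every local fit recovers the same coefficients and exactly one patch label is created; piecewise-linear data yields one label per plane, plus possibly spurious labels near boundaries, which the classifier stage can absorb.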
In the case of a linear system, such as that in the example shown in
In the shown visualization, since the control data manifold 200 is two-dimensional, the linear patches H1, H2, and H3 also belong to respective two-dimensional planes. To generalize, for any higher order linear system, the control data manifold is of P dimensions (where P≥2) and may be formed by a plurality of P-dimensional hyperplanes embedded in a higher-dimensional (>P) control data space. The example shown in
The patch detector algorithm is thereby able to identify regions on the control surface without prior knowledge of the controller behavior. Thus, in addition to approximating the control rules, the above-described patch detector algorithm can be extended to identify different functional elements comprising the controller, such as saturation elements, LQR controller surfaces, and PID controller surfaces, without any prior knowledge of the controller.
For some controllers, the control points may define one or more regions containing sharp edges, which may potentially lead to control points along these edges being mis-assigned to an inappropriate hyperplane. In this case, it is still reasonable to assume that the regions containing sharp edges are smaller in measure than the patches/hyperplanes. A first solution to this technical problem involves first classifying the control points into hyperplanes (using a control point classifier 114 as described below) and subsequently mapping mis-identified sharp edges into one of the hyperplanes. A second possible solution involves identifying abrupt changes in the local fitting and refining the neighborhood points. Sharp edges in control data are particularly typical of reinforcement learning (RL)-based controllers. For such controllers, the RL training may be modified to take smaller steps or to move away from sharp edges.
In some linear controls, the hyperplanes may be distinguished by a different set of constraints being active. In such examples, if there is an a-priori knowledge about constraints, the hyperplanes may be assigned classification labels by evaluating a constraint function. However, the disclosed methodology does not necessarily rely on a-priori knowledge of constraints.
For complex non-linear systems, the control data manifold may comprise one or more non-linear regions. In such a case, the patch detector 112 may be configured according to one of the following approaches. In a first approach, the patch detector 112 may fit a plurality of hyperplanes to approximate a non-linear region on the manifold. In one embodiment, the patch detector 112 may work with a specified maximum number of hyperplanes with which the control rules are to be approximated. In a second approach, the patch detector 112 may fit a non-linear region with a single non-linear patch or multiple non-linear patches, each defined by a quadratic or higher-order polynomial local approximation function. Such a technique could be employed to reduce or optimize the number of patches used to approximate the non-linear region of the control data manifold and/or to reduce an error between the control points on the manifold and the local approximation functions. The patch detector 112 need not rely on a-priori knowledge about linearity of the controller; it may determine an appropriate local approximation function by determining a rate of change of each hyperplane and correlating the rate of change within a region to fit either a linear or a polynomial local approximation function.
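The fallback from a linear to a polynomial local approximation function might be sketched as follows; the residual threshold and the per-dimension quadratic basis are illustrative assumptions, not specifics from the disclosure.

```python
import numpy as np

def fit_local_patch(X, U, tol=1e-3):
    """Fit a local patch with an affine function first; if the mean
    squared residual exceeds `tol`, fall back to a quadratic fit
    (illustrative of the second approach for non-linear regions)."""
    n = len(X)
    A_lin = np.hstack([X, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(A_lin, U, rcond=None)
    err = np.mean((A_lin @ coef - U) ** 2)
    if err <= tol:
        return "linear", coef
    # Per-dimension quadratic terms added to the affine basis
    A_quad = np.hstack([X ** 2, X, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(A_quad, U, rcond=None)
    return "quadratic", coef
```

A region whose control points lie on a plane keeps the cheaper linear form; a curved region triggers the higher-order fit, which keeps the patch count down compared with tiling the curve with many small hyperplanes.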
Still referring to
For multiclass classification using SVM, a binary classification principle can be utilized after breaking down the multiclassification problem into multiple binary classification problems. In the disclosed embodiment, a number of SVMs are trained that equals the number (n) of patches detected in the control data manifold. Here, each SVM is trained to perform a binary classification between a given patch and the other patches (one-versus-all). After training, each patch is assigned an SVM indicator function, which may be evaluated to determine whether a control point belongs to that patch or not. For a hyperplane classification problem, the SVM indicator functions assume the form of linear equations (see
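A one-versus-all indicator-function construction of this kind could be sketched with scikit-learn as below (assuming that library is available; the regularization constant is arbitrary). Each returned function scores positive when a control point is classified inside the corresponding patch, and the active patch is taken as the one with the highest score.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_indicator_functions(X, labels):
    """Train one linear SVM per detected patch (one-versus-all).
    Each returned function F_i(x) is positive when x is classified
    inside patch i."""
    indicators = []
    for lbl in np.unique(labels):
        svm = LinearSVC(C=10.0).fit(X, (labels == lbl).astype(int))
        indicators.append(svm.decision_function)
    return indicators

def active_patch(indicators, x):
    """Identify the active patch for a measured state x as the patch
    whose indicator function scores highest."""
    scores = [f(np.atleast_2d(x))[0] for f in indicators]
    return int(np.argmax(scores))
```

Because each indicator is linear, evaluating all n of them online costs only n dot products, which suits resource-constrained controller hardware.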
Continuing with reference to
The trained control point classifier (in this case, defined by SVM indicator functions F1, F2, . . . Fn) and the regression models R1, R2, . . . Rn are used to create the explicit rule-based control algorithm 118. The explicit rule-based control algorithm 118 may be embodied in a computer program, which can be transferred to a memory of a controller for controlling the dynamical system. In embodiments, the transfer may take place electronically from a remote location, such as via the Internet, or by way of a physical memory, such as a flash drive. Alternatively, the setup of the explicit rule-based control algorithm 118 can be implemented directly on the hardware of the controller for controlling the dynamical system.
The dynamical system 402 includes at least one sensor 406, which continuously measures one or more operating states of the dynamical system 402 and outputs them in the form of a measured state signal Xa. The measured state signals Xa can each be represented as a numerical data vector mapped to a multi-dimensional state space. In embodiments, the measured state signals Xa are coded over time, representing time series data. The measured state signals Xa are transmitted to the controller 404. Based on the measured state signal Xa at each time step, the controller 404 determines a control action Ua to optimize a behavior of the dynamical system 402 by executing the explicit rule-based control algorithm 118. The control action Ua is determined by identifying an active patch as a function of the measured state signal Xa and evaluating the respective regression model for the identified active patch.
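The online evaluation in the controller thus reduces to two inexpensive linear evaluations per time step. A minimal sketch, assuming linear indicator functions and linear per-patch regressors whose coefficients come from the offline training stage described above:

```python
import numpy as np

def control_action(x, indicator_coefs, regressor_coefs):
    """Online evaluation of the explicit rule-based control: pick the
    patch whose linear indicator function F_i(x) = w_i.x + b_i scores
    highest, then evaluate that patch's linear regressor to obtain the
    control action. Coefficient lists hold (w, b) pairs per patch."""
    scores = [w @ x + b for (w, b) in indicator_coefs]
    active = int(np.argmax(scores))            # identified active patch
    w, b = regressor_coefs[active]
    return float(w @ x + b), active
```

Because only dot products and a comparison are involved, the per-step cost grows linearly with the number of patches and state dimensions, rather than with the size of an online optimization problem.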
Continuing with the described embodiment of a linear system,
Referring to
The described approach of evaluating SVM indicator functions to identify an active patch and evaluating a regressor associated with the active patch to determine a control action may likewise be implemented for non-linear systems, by fitting non-linear regions with multiple hyperplanes or by using polynomial local approximation functions and regressors as described above.
Referring again to
An illustrative use case is now described where the disclosed embodiments may be utilized for providing an optimal economic dispatch of electricity from a grid. Although the illustrative use case is simple, the underlying principle can be applied to complex systems with high-dimensional system states and a large number of constraints.
The dynamical system in the described use case includes a battery that is chargeable by a photovoltaic (PV) panel and dischargeable to provide power to a building, which also receives power from the grid. A controller is tasked with controlling the charging or discharging of the battery to minimize the price of transacting power from the grid. The temporally varying operational states of the dynamical system comprise: (i) building power consumption or load Li, which is depicted in
The optimization problem in this use case may be formulated as minimization of a linear cost function given by:
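The equation itself is not reproduced in the text above. For orientation only, a representative cost function for a battery dispatch problem of this kind, with assumed symbols (electricity price p_i, building load L_i, PV generation s_i, battery charging power u_i, state of charge E_i, time step Δt, prediction horizon m), might read:

```latex
\min_{u_t,\ldots,u_{t+m-1}} \; \sum_{i=t}^{t+m-1} p_i \left( L_i - s_i + u_i \right) \Delta t
\qquad \text{subject to} \qquad
E_{i+1} = E_i + u_i\,\Delta t, \quad 0 \le E_i \le E_{\max}, \quad \lvert u_i \rvert \le u_{\max}.
```

Here the grid power transacted at step i is the load minus PV generation plus the battery charging power, and the constraints keep the state of charge and charging power within physical limits.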
An MPC algorithm can be used to solve the above optimization (eq. 1) at every time step and apply the first control step before re-solving with a receding horizon. The control action generated by the MPC for m=2 is represented in
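One step of such a receding-horizon solve can be expressed as a small linear program. The sketch below uses scipy.optimize.linprog with assumed constraints (state-of-charge limits, a charge-power bound, and an energy-neutrality condition over the horizon, so that arbitrage against the price profile rather than unlimited discharging drives the decision); none of these specifics come from the original text.

```python
import numpy as np
from scipy.optimize import linprog

def dispatch_first_step(price, soc0, cap=10.0, u_max=2.0, dt=1.0):
    """Solve one receding-horizon step of a battery dispatch LP:
    choose charging powers u_i over the horizon to minimize the
    controllable part of the grid cost, sum(price_i * u_i), keeping
    the state of charge within [0, cap] and requiring the battery to
    end the horizon no emptier than it started."""
    m = len(price)
    c = np.asarray(price, dtype=float)        # cost coefficients on u
    T = np.tril(np.ones((m, m))) * dt         # soc_i = soc0 + dt * sum_{j<=i} u_j
    A_ub = np.vstack([T, -T, -np.ones((1, m)) * dt])
    b_ub = np.concatenate([np.full(m, cap - soc0), np.full(m, soc0), [0.0]])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(-u_max, u_max)] * m)
    return res.x[0]                           # apply only the first move

# When the price is about to rise, the optimal first move is to charge
# at the cheap price and discharge later; u0 comes out at +u_max here.
u0 = dispatch_first_step(price=[1.0, 3.0], soc0=5.0)
```

Repeating this solve at every time step with the updated state of charge and price forecast reproduces the receding-horizon MPC behavior; the rule-extraction pipeline then replaces these online solves with indicator functions and regressors.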
For a prediction horizon m=2 it is recognized that the control action U at each time step may be solely determined based on the following state parameters: X1—change in electricity price at that time step; and X2—battery state of charge. With this, the present use case reduces to an MPC problem for a second order linear system with constrained input, similar to the representative example shown in
The above-described use case is merely illustrative, and several other applications of the disclosed methodology exist. As non-limiting examples, the disclosed methodology may be used in a building controller (e.g., to adapt room heating/cooling to weather and use), in an automotive controller (e.g., to provide energy optimal predictive power control for given speed corridor), in a process controller (e.g., to control highly dynamic reactions), in a factory robot controller (e.g., to provide energy and wear optimized robot paths), among other applications.
The embodiments of the present disclosure may be implemented with any combination of hardware and software. In addition, the embodiments of the present disclosure may be included in an article of manufacture (e.g., one or more computer program products) having, for example, a non-transitory computer-readable storage medium. The computer readable storage medium has embodied therein, for instance, computer readable program instructions for providing and facilitating the mechanisms of the embodiments of the present disclosure. The article of manufacture can be included as part of a computer system or sold separately.
The computer readable storage medium can include a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
The system and processes of the figures are not exclusive. Other systems, processes and menus may be derived in accordance with the principles of the disclosure to accomplish the same objectives. Although this disclosure has been described with reference to particular embodiments, it is to be understood that the embodiments and variations shown and described herein are for illustration purposes only. Modifications to the current design may be implemented by those skilled in the art, without departing from the scope of the disclosure.
Filing Document: PCT/US2021/046463; Filing Date: 8/18/2021; Country: WO