This invention relates to autonomous control of power grid voltage profiles.
With the fast-growing penetration of renewable energy, distributed energy resources, demand response and new electricity market behavior, the conventional power grid, built on decades-old infrastructure, faces grand challenges such as fast and deep ramps and increasing uncertainties (e.g., the California duck curve), threatening the secure and economic operation of power systems. In addition, traditional power grids are designed and operated to withstand N-1 (and some N-2) contingencies, as required by NERC standards. Under extreme conditions, local disturbances, if not controlled properly, may spread to neighboring areas and cause cascading failures, eventually leading to wide-area blackouts. It is therefore of critical importance to promptly detect abnormal operating conditions and events, understand the growing risks and, more importantly, apply timely and effective control actions to bring the system back to normal after large disturbances.
Automatic controllers, including excitation systems, governors, power system stabilizers (PSS) and automatic generation control (AGC), are designed and equipped on generator units to maintain voltage and frequency profiles once a disturbance is detected. Traditionally, voltage control is performed at the device level with predetermined settings, e.g., at generator terminals or at buses with shunts or SVCs. Without proper coordination, the impact of such a control scheme is limited to the points of connection and their neighboring buses. Massive offline studies are then needed to predict representative future operating conditions and coordinate the various voltage controllers before operational rules can be determined for use in real time. Manual actions from system operators are still needed as a daily routine to mitigate operational risks that cannot be handled by the existing automatic controls, because of the complexity and high dimensionality of the modern power grid. These actions include re-dispatching generators away from their scheduled operating points, switching capacitors and reactors, shedding loads under emergency conditions, reducing critical path flows, tripping generators, adjusting voltage setpoints of generator terminal buses, and so on. The timing, duration and size of these manual actions are typically determined offline by running massive simulations over projected "worst" operating scenarios and contingencies, in the form of decision tables and operational orders. Because it is very difficult to precisely estimate future operating conditions and determine optimal controls, the offline-determined control strategies are either too conservative (causing over-investment) or too risky (causing stability concerns) when applied in the real world.
Deriving effective and rapid voltage control commands for real-time conditions becomes critical to mitigate potential voltage issues in a power grid with ever-increasing dynamics and stochastics. Several measures have been deployed by power utilities and independent system operators (ISOs). One example is performing security assessment in near real time, which helps operators understand the operational risks should a contingency occur. However, the lack of computing power and of sufficiently accurate grid models prevents optimal control actions from being derived and deployed in real time. Machine-learning-based methods, e.g., decision trees, support vector machines and neural networks, were developed in the past to first train agents using offline analysis and then apply them in real time. These approaches focus on monitoring and security assessment, rather than performing and evaluating controls for operation.
To provide coordinated voltage control actions, hierarchical automatic voltage control (AVC) systems with multiple-level coordination have been deployed in the field, e.g., in France, Italy and China. They typically consist of three levels (primary, secondary and tertiary):
(a) At the primary level, an automatic voltage regulator maintains the local voltage profile through the excitation system, with a response time of several seconds.
(b) At the secondary level, control zones, determined either statically or adaptively (e.g., using a sensitivity-based approach), are formed first and a few pilot buses are identified in each; the control objective is to coordinate all reactive power resources in each zone to regulate the voltage profiles of the selected pilot buses only, with a response time of several minutes.
(c) At the tertiary level, the objective is to minimize power losses by adjusting the setpoints of the zonal pilot buses while respecting security constraints, with a response time of 15 minutes to several hours.
The core technologies behind these techniques are optimization methods using near real-time system models, e.g., AC optimal power flow considering various constraints, which work well the majority of the time in the real-time environment; however, certain limitations still exist that may affect voltage control performance, including:
(1) They require relatively accurate real-time system models to achieve the desired control performance, which depend heavily upon real-time EMS snapshots running every few minutes. The control measures derived for a captured snapshot may not function well if significant disturbances or topology changes occur in the system between two adjacent EMS snapshots.
(2) For a large-scale power network, coordinating and optimizing all controllers in a high-dimensional space is very challenging and may require a long solution time or, in rare cases, fail to reach a solution. Suboptimal solutions can be used for practical implementation; for diverged cases, the control measures of the previous day or of historically similar cases are used.
(3) Sensitivity-based methods for forming controllable zones are subject to the high complexity and nonlinearity of a power system, in that the zone definition may change significantly across operating conditions with various topologies and under contingencies.
(4) Optimal power flow (OPF) based approaches are typically designed for single system snapshots only, making it difficult to coordinate control actions across multiple time steps while considering practical constraints, e.g., that capacitors should not be switched on and off too often during one operating day.
In one aspect, systems and methods are disclosed to control voltage profiles of a power grid by forming an autonomous voltage control model with one or more neural networks as deep reinforcement learning (DRL) agents; training the DRL agents to provide data-driven, real-time and autonomous grid control strategies; and coordinating and optimizing reactive power controllers to regulate voltage profiles in the power grid, with the control task formulated as a Markov decision process (MDP) solved by reinforcement learning for control problems in dynamic and stochastic environments.
In another aspect, systems and methods are disclosed to control voltage profiles of a power grid that includes measuring states of a power grid; determining abnormal voltage conditions and locating affected areas in the power grid; creating representative operating conditions including contingencies for the power grid; conducting power grid simulations in an offline or online environment; training deep-reinforcement-learning-based agents for autonomously controlling power grid voltage profiles; and coordinating and optimizing control actions of reactive power controllers in the power grid.
In a further aspect, systems and methods are disclosed to control voltage profiles of a power grid that include measuring states of the power grid from phasor measurement units or an EMS system; determining abnormal voltage conditions and locating the affected areas in a power network; creating massive representative operating conditions considering various contingencies; simulating a large number of scenarios; training effective deep-reinforcement-learning-based agents for autonomously controlling power grid voltage profiles; improving the control performance of the trained agents; coordinating and optimizing control actions of all available reactive power resources; and generating effective, data-driven, autonomous control commands for correcting voltage issues considering N-1 contingencies in a power grid.
In yet another aspect, a generalized framework is disclosed for providing data-driven, autonomous control commands for regulating voltages, frequencies, line flows and economics in a power network under normal and contingency operating conditions. The embodiment is used to create representative operating conditions of a power grid by interacting with various power flow solvers, simulate contingency conditions, and train different types of DRL-based agents with various objectives for providing autonomous control commands for real-time operation of a power grid.
Advantages of the system may include one or more of the following. The system can significantly improve control effectiveness in regulating voltage profiles in a power grid under normal and contingency conditions. To enhance the stability of a single DQN agent, two architecture-identical deep neural networks are used, including one target network and one evaluation network. The system is purely data driven, without the need for accurate real-time system models when making coordinated voltage control decisions, once an AI agent is properly trained. Thus, live PMU data stream from WAMS can be used to enable sub-second controls, which is extremely valuable for scenarios with fast changes like renewable variations and system disturbances. During the training process, the agent is capable of self-learning by exploring more control options in a high dimension by jumping out of local optima and therefore improves its overall performance. The formulation of DRL for voltage control is flexible in that it can intake multiple control objectives and consider various security constraints, especially time-series constraints.
An autonomous voltage control scheme for grid operation using deep reinforcement learning (DRL) is detailed next. In one embodiment, an innovative approach of training DRL agents with improved RL algorithms provides data-driven, real-time and autonomous control strategies by coordinating and optimizing available controllers to regulate voltage profiles in a power grid. The AVC problem is formulated as a Markov decision process (MDP) so that it can take full advantage of state-of-the-art reinforcement learning (RL) algorithms, which have proven effective in various real-world control problems in highly dynamic and stochastic environments.
One embodiment uses an autonomous control framework, named "Grid Mind", for power grid operation that takes advantage of state-of-the-art artificial intelligence (AI) technology, namely deep reinforcement learning (DRL), and synchronized measurements (phasor measurement units) to derive fast and effective controls in real time, targeting the current and near-future operating conditions considering N-1 contingencies.
The architecture design of the embodiment is provided in
A coordinated voltage control problem formulated as a Markov decision process (MDP) is detailed next. An MDP represents a discrete-time stochastic control process, which provides a general framework for modeling the decision-making procedure of a stochastic and dynamic control problem. For the problem of coordinated voltage control, a 4-tuple (S, A, P_a, R_a) can be used to formulate the MDP:
where S is a vector of system states, including voltage magnitudes and phase angles across the system or areas of interest; A is a list of actions to be taken, e.g., generator terminal bus voltage setpoints, status of shunts and tap ratios of transformers; P_a(s, s′) = Pr(s_{t+1} = s′ | s_t = s, a_t = a) represents the transition probability from the current state s_t to a new state s_{t+1} after taking action a at time t; and R_a(s, s′) is the reward received after reaching state s′ from the previous state s, quantifying the overall control performance.
Solving the MDP means finding an optimal "policy", π(s), which specifies actions based on states so that the expected accumulated reward, typically modeled as a Q-value function, Q^π(s, a), is maximized in the long run, given by:

Q^π(s, a) = E[r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + … | s, a]  (1)
Then, the optimal value function is the maximum achievable value, given as:

Q*(s, a) = max_π Q^π(s, a)  (2)
Once Q* is known, the agent can act optimally as:

π*(s) = argmax_a Q*(s, a)  (3)
Accordingly, the optimal value that maximizes over all decisions can be expressed as:

Q*(s, a) = max_{a_{t+1}, a_{t+2}, …} (r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + …)  (4)
Essentially, the process in Equations (1)-(4) is a Markov chain process. Since future rewards can be predicted by neural networks, the optimal value can be decomposed in a more condensed way as a Bellman equation:

Q*(s, a) = E_{s′}[r + γ·max_{a′} Q*(s′, a′) | s, a]  (5)

where γ is the discount factor. This problem can then be solved using many state-of-the-art reinforcement learning algorithms.
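To make the Q-value and Bellman relationships above concrete, the following sketch computes a discounted return in the sense of Equation (1) for a toy reward sequence and checks its Bellman-style decomposition; the reward values and discount factor are illustrative only, not from the embodiment.

```python
# Toy illustration of the discounted return in Eq. (1) and its Bellman-style
# decomposition. All numbers are illustrative, not from the embodiment.

def discounted_return(rewards, gamma):
    """Q ≈ r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + ...  (Eq. 1)."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards))

gamma = 0.9
rewards = [1.0, 0.5, 0.25]          # r_{t+1}, r_{t+2}, r_{t+3}

q_full = discounted_return(rewards, gamma)
q_tail = discounted_return(rewards[1:], gamma)  # return starting one step later
# Bellman decomposition: Q = r_{t+1} + γ·Q'
assert abs(q_full - (rewards[0] + gamma * q_tail)) < 1e-12
```

The decomposition check mirrors the Bellman equation: the total return equals the immediate reward plus the discounted return of the remaining trajectory.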
Artificial intelligence (AI) refers to computers solving specific tasks or problems by mimicking human behavior; machine learning (ML) is the subset of AI technologies that learn from data or observations and then make decisions based on trained models. ML consists of supervised learning, unsupervised learning and reinforcement learning (RL), serving different purposes. Different from the other branches, RL refers to an agent that learns an action policy maximizing the expected rewards based on interactions with the environment. Typical RL algorithms include dynamic programming, Monte Carlo methods and temporal-difference methods such as Q-learning. An RL agent continuously interacts with an environment: the environment receives an action, emits new states and calculates a reward; the agent observes the states and suggests an action to maximize the expected reward. Training an RL agent involves dynamically updating a policy (mapping from states to actions), a value function (mapping from state-action pairs to expected rewards) and, optionally, a model (representing the environment).
Deep learning (DL) provides a general framework for representation learning that consists of many layers of nonlinear functions mapping inputs to outputs. Its uniqueness lies in the fact that DL does not need features to be specified beforehand; one typical example is the deep neural network. DRL is essentially a combination of DL and RL, where DL is used for representation learning and RL for decision making. In the embodiment, a deep Q network (DQN) is used to estimate the value function, which supports continuous state sets and is suitable for power grid control. The designed DRL agent in the framework for providing autonomous coordinated voltage control is shown in
The goal of a well-trained DRL agent for autonomous voltage control is to provide an effective action from finite control action sets when observing abnormal voltage profiles. The definition of episode, states, action and reward is given below:
(1) Episode: An episode represents any operating condition collected from real-time measurement systems such as supervisory control and data acquisition (SCADA) or phasor measurement unit (PMU), under random load variations, generation dispatches, topology changes and contingencies. Contingencies are randomly selected and applied in this embodiment to mimic reality.
(2) States: The states are defined as a vector of system information that is used to represent system conditions, including active and reactive power flows on transmission lines and transformers, as well as bus voltage magnitudes and phase angles.
(3) Action Space: Typical manual control actions to mitigate voltage issues include adjusting generator terminal voltage setpoints, switching shunt elements, transformer tap ratios, etc. In this work, without loss of generality, the inventors consider generator voltage set point adjustments as actions to maintain system voltage profile. Each can be adjusted within a range, e.g., [0.95, 0.975, 1.0, 1.025, 1.05] p.u. The combination or permutation of all available generator setpoints forms an action space used to train a DRL agent.
(4) Reward: Several voltage operation zones are defined to differentiate voltage profiles, including normal zone (0.95-1.05 pu), violation zone (0.8-0.95 pu or 1.05-1.25 pu) and diverged zone (>1.25 pu or <0.8 pu), as shown in
Rewards are designed accordingly for each zone. In one episode (Ep), define Vi as the voltage magnitude at bus i, and the reward for the jth control iteration can be calculated as:
The final reward for an entire episode containing n iterations is then calculated as the total accumulated rewards divided by the number of control iterations:
Final Reward = (Σ_{j=1}^{n} Reward_j) / n  (7)
In this way, a higher reward is assigned to a very effective action (taking only one control iteration, versus many iterations) that solves the same voltage problem. With the above definition of DRL components, the computational flowchart of training a DRL agent is given in
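A minimal sketch of Equation (7), assuming only that the final reward is the average of the per-iteration rewards; it shows why a one-shot fix outscores a multi-step fix with the same accumulated reward.

```python
# Eq. (7): final reward = accumulated iteration rewards / number of iterations.

def final_reward(iteration_rewards):
    n = len(iteration_rewards)
    return sum(iteration_rewards) / n

# A single effective action scores higher than spreading the same total
# reward over several control iterations.
assert final_reward([100.0]) > final_reward([50.0, 50.0])
```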
Step 1: starting from one episode (real-time information collected in a power network), solve the power flow and check potential voltage violations. A typical safe range can be defined as 0.95-1.05 p.u. for all buses of interest in the power system being studied;
Step 2: based on the states obtained, a reward value can be calculated, both of which are fed into the DRL agent; the agent then generates an action based on its observation of the current states and expected future rewards;
Step 3: the environment (e.g., an AC power flow solver) takes the suggested action and solves another power flow. Bus voltage violations are then checked again. If no violation remains, calculate the final reward for this episode and terminate the process for the current episode;
Step 4: if violation is detected, check for divergence. If divergence occurs, update the final reward and terminate an episode. If power flow converges, evaluate reward and return to Step 2.
The training process terminates when one of the three conditions is met: (1) no more violation occurs, (2) power flow diverges, or (3) the maximum number of iterations is reached.
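Steps 1-4 and the three termination conditions can be sketched as the per-episode loop below; the `ToyEnv` and `ToyAgent` classes are hypothetical stand-ins for the AC power flow solver and the DRL agent, not part of the embodiment.

```python
# Per-episode training loop matching Steps 1-4. ToyEnv and ToyAgent are
# hypothetical stand-ins for the power flow environment and the DRL agent.
V_LOW, V_HIGH = 0.95, 1.05
MAX_ITERS = 20

def violations(voltages):
    return [v for v in voltages if not (V_LOW <= v <= V_HIGH)]

def run_episode(env, agent, max_iters=MAX_ITERS):
    state = env.reset()                    # Step 1: load one operating condition
    for _ in range(max_iters):
        if not violations(state):
            return "solved"                # Step 3: no violation -> terminate
        action = agent.act(state)          # Step 2: agent proposes an action
        state, converged = env.step(action)
        if not converged:
            return "diverged"              # Step 4: divergence -> terminate
    return "max_iterations"                # third termination condition

class ToyEnv:
    def reset(self):
        return [0.93, 1.00, 1.07]          # two bus voltages violate the range
    def step(self, action):
        return [1.00, 1.00, 1.00], True    # pretend the action fixed everything

class ToyAgent:
    def act(self, state):
        return [1.0] * 5                   # generator setpoints (illustrative)
```

With these stand-ins, `run_episode(ToyEnv(), ToyAgent())` terminates with `"solved"` after one control iteration.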
Implementation details of training DRL agents are presented next. There are mainly three classes of reinforcement learning methods: model-based (e.g., dynamic programming), policy-based (e.g., Monte Carlo methods) and value-based (e.g., Q-learning and SARSA). The latter two are model-free methods, meaning they can interact with the environment directly without the need for an environment model, and can handle problems with stochastic transitions and rewards. One embodiment uses an enhanced Deep Q Network (DQN) algorithm; a high-level overview of the training procedure and implementation of the DQN agents is shown in
Q(s, a) ← Q(s, a) + α·[r + γ·max_{a′} Q(s′, a′) − Q(s, a)]  (8)
where α is the learning rate and γ is the discount rate. The parameters of the neural network are updated by minimizing the error between the actual and estimated Q-values, [r + γ·max_{a′} Q(s′, a′) − Q(s, a)]. In this work, two specific designs make DQN a promising candidate for coordinated voltage control, namely experience replay and fixed Q-targets. First, DQN has an internal memory that stores past experience and learns from it repeatedly. Second, to mitigate the overfitting problem, two neural networks are used in the enhanced DQN method, one being a target network and the other an evaluation network. Both networks share the same structure but have different parameters. The evaluation network keeps updating its parameters with training data, while the parameters of the target network are fixed and periodically updated from the evaluation network. In this way, the training process of DQN becomes more stable. The pseudo code for training and testing the DQN agent is presented in Table I. The corresponding flowchart is given in
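The two ingredients named above, experience replay and fixed Q-targets, can be sketched in tabular form as follows; a dictionary-based Q table stands in for the evaluation and target networks, and all states, actions and numbers are illustrative assumptions rather than the embodiment's actual design.

```python
import random
from collections import deque

ALPHA, GAMMA = 0.1, 0.9
q_eval = {}                  # evaluation "network": updated every step
q_target = {}                # target "network": frozen copy, synced periodically
replay = deque(maxlen=1000)  # experience replay memory of (s, a, r, s') tuples

def q(table, s, a):
    return table.get((s, a), 0.0)

def train_step(actions):
    s, a, r, s_next = random.choice(replay)             # sample past experience
    best_next = max(q(q_target, s_next, a2) for a2 in actions)
    td_error = r + GAMMA * best_next - q(q_eval, s, a)  # bracket of Eq. (8)
    q_eval[(s, a)] = q(q_eval, s, a) + ALPHA * td_error

actions = [0, 1]
replay.append(("violated", 1, 10.0, "normal"))  # one illustrative transition
for step in range(100):
    train_step(actions)
    if step % 20 == 0:
        q_target.update(q_eval)      # fixed Q-targets: periodic synchronization
```

After these updates, `q_eval[("violated", 1)]` approaches the stored reward of 10, while the periodically synced target table keeps each update's bootstrap target stable between synchronizations.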
During the exploration period, the decaying ε-greedy method is applied, which means the DQN agent has a decaying probability ϵ_i of making a random action selection at the ith iteration, where ϵ_i is updated as
where rd is a constant decay rate.
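Since the exact update formula is not reproduced above, the sketch below assumes a simple multiplicative schedule, ϵ_{i+1} = r_d·ϵ_i, to illustrate decaying ε-greedy action selection; the initial rate and decay constant are illustrative.

```python
import random

def select_action(q_values, epsilon):
    """With probability epsilon explore randomly; otherwise exploit the best Q."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

epsilon, r_d = 1.0, 0.99     # initial exploration rate and decay rate (assumed)
for i in range(100):
    _ = select_action([0.1, 0.5, 0.2], epsilon)
    epsilon *= r_d           # assumed form: epsilon_{i+1} = r_d * epsilon_i
```

Early iterations explore almost uniformly; as ϵ decays the agent increasingly exploits its learned Q-values.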
The platform used to train and test DRL agents for autonomous voltage control is a CentOS 7 Linux operating system (64-bit). The server is equipped with an Intel Xeon E7-8893 v3 CPU at 3.2 GHz and 528 GB of memory. All DRL training and testing processes are performed on this platform.
To mimic a real power system environment, a commercial power grid simulator is adopted, equipped with function modules such as power flow, dynamic simulation, contingency analysis, state estimation and so on. In this embodiment, only the AC power flow module, serving as the environment, interacts with the DRL agent. Intermediate files are used to pass information between the power flow solver and the DRL agent, including a power flow information file saved in PTI raw format and power flow solution results saved in text files.
For the DRL agent, recently developed DQN libraries in Anaconda, a popular Python data science platform for implementing AI technologies, are utilized. This platform provides useful libraries including Keras, TensorFlow, NumPy and others for effective DQN agent development. The deep Q-learning framework, coded in Python 3.6.5 scripts, is also used to set up the DRL agent and its interaction with the environment. The information flow is given in
Next, experimental validations of the instant system are discussed. One embodiment for autonomous voltage control is tested on the IEEE 14-bus system model and the Illinois 200-bus system with tens of thousands of realistic operating conditions, demonstrating outstanding performance in providing coordinated voltage control for unknown system operating conditions. Extensive sensitivity studies are also conducted to thoroughly analyze the impacts of different parameters on DRL agents toward more robust and efficient decision making. This method not only effectively supports grid operators in making real-time voltage control decisions (for a grid without AVC), but also provides a complementary feature to existing OPF-based AVC systems at the secondary and tertiary levels.
To generate massive representative operating conditions for training DRL agents, random load perturbations of different extents are applied to load buses across the entire system to mimic renewable generation variation and different load patterns. After the load changes, generators are re-dispatched using a participation factor list, determined by installed capacity or operating reserves, to maintain system power balance. A commercial software package, the Powerflow & Short circuit Assessment Tool (PSAT) developed by Powertech Labs in Canada, is used with Python scripts to generate massive random cases for these two systems. Each case represents a converged power flow condition, with or without voltage violations, saved in PTI-format files. Over 83% of the created cases have voltage violations with respect to a safe zone of [0.95, 1.05] pu. A higher share of voltage issues in the created scenarios is preferred when training and optimizing DRL policies, as safe scenarios do not trigger corrective controls.
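The scenario-generation step, random load perturbation followed by participation-factor re-dispatch, can be sketched as below; the load and capacity figures are illustrative assumptions, not taken from the test systems.

```python
import random

def perturb_loads(loads, low=0.8, high=1.2):
    """Scale each load by a random factor to mimic load/renewable variation."""
    return [p * random.uniform(low, high) for p in loads]

def redispatch(gen_mw, capacities, delta_mw):
    """Distribute the total load change among generators by participation factors."""
    total = sum(capacities)
    factors = [c / total for c in capacities]      # e.g., by installed capacity
    return [g + f * delta_mw for g, f in zip(gen_mw, factors)]

base_loads = [50.0, 80.0, 129.0]                   # MW, illustrative
base_gen = [100.0, 90.0, 69.0]                     # MW, balances the load

new_loads = perturb_loads(base_loads)
delta = sum(new_loads) - sum(base_loads)
new_gen = redispatch(base_gen, [120.0, 100.0, 80.0], delta)
# Power balance is preserved: the generation change equals the load change.
assert abs((sum(new_gen) - sum(base_gen)) - delta) < 1e-9
```

Because the participation factors sum to one, the re-dispatch always absorbs exactly the load change, keeping each generated case balanced before the power flow is solved.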
The IEEE 14-bus power system model consists of 14 buses, 5 generators, 11 loads, 17 lines and 3 transformers. The total system load is 259 MW and 73.5 MVAr. A single-line diagram of the system is shown in
In Case I, all lines and transformers are in service without any topology changes. Random load changes are applied across the entire system, and each load fluctuates within 80%-120% of its original value. When loads change, generators are re-dispatched based on a participation factor list to maintain system power balance. 10,000 random operating conditions are created accordingly. A DRL agent is trained using the embodiment and its performance on the 10,000 episodes is shown in
Table II details the agent's intelligence in Episodes 8 and 5000. For the initial system condition in Episode 8, several bus voltage violations are identified, shown in the first row of Table II. To fix the voltage issues, the agent took an action setting the voltage setpoints of the 5 generators to [1.05 1.025 1 0.95 0.975]; after this action, the system observes fewer violations, shown in the second row of Table II. The agent then took a second action, [1.025 0.975 0.95 1 1.05], before all the voltage issues were fixed. By the time the agent has learned from 4,999 episodes, it has accumulated sufficient knowledge: at the initial condition of Episode 5000, 6 bus voltage violations are observed, highlighted in the 4th row of Table II. The agent took one action and corrected all voltage issues, using the policy that the DQN memorizes.
In Case II, the same number of episodes are used, but random N-1 contingencies are considered to represent emergency conditions in real grid operation. Several line outages are considered, including lines 1-5, 2-3, 4-5, and 7-9. Each episode picks one outage randomly, before feeding into the learning process. Shown in
In Case III, the definition of the final reward for an episode is revised so that a higher reward, with a value of 200, is issued when the agent fixes the voltage profile using only one control iteration; if any voltage violation remains in the states, no reward is given. The updated reward definition and the procedures of Case II are used to train an agent considering N-1 contingencies. Once trained, the agent is tested on a new set of 10,000 episodes randomly generated with contingencies, with the exploration rate reduced to a very small value. The test performance is shown in
In this case study, the combination of 4 generator voltage setpoints (excluding the swing generator) is used to form an action space of 5^4 = 625, where each generator can choose one of five discrete values from a pre-determined list within [0.95, 1.05] p.u. With the above procedures, a wide range of load fluctuations between 60% and 140% of original values is applied, and a total of 50,000 power flow cases are successfully created. One DQN agent with both an evaluation network and a target network is trained and properly tuned, using normalization and dropout techniques to improve its performance.
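The discrete action space described above can be enumerated directly; the sketch below uses the five-setpoint list given earlier in the action-space definition and confirms the 5^4 = 625 and 5^5 = 3,125 counts.

```python
from itertools import product

SETPOINTS = [0.95, 0.975, 1.0, 1.025, 1.05]   # p.u., per the earlier example

# Joint actions for 4 controllable generators (swing generator excluded).
action_space_4 = list(product(SETPOINTS, repeat=4))
# Including the swing generator grows the space to 5**5.
action_space_5 = list(product(SETPOINTS, repeat=5))

assert len(action_space_4) == 625
assert len(action_space_5) == 3125
```

Each element of the enumerated list is one joint setpoint assignment, which maps naturally to a single discrete DQN action index.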
Another test is performed including the swing generator as well for regulating system bus voltages, so that the dimension of the action space becomes 5^5 = 3,125. The corresponding DQN agent performance is shown in
Furthermore, a larger power network, the Illinois 200-bus system, is used to test the performance of DRL agents. A heavy-load area of the Illinois 200-bus system is tested, using 5 generators to control 30 adjacent buses, shown in
The performance of the DRL agent is shown in
To effectively mitigate voltage issues under growing uncertainties in a power grid, this embodiment presents a novel control framework, Grid Mind, that uses deep reinforcement learning to provide coordinated, autonomous voltage control for grid operation. The architecture design, computational flow and implementation details are provided, and the training procedures of DRL agents are discussed in detail. Properly trained agents can achieve the goal of autonomous voltage control with satisfactory performance. It is important to carefully tune the parameters of the agent and properly set the tradeoff between learning and real-world application.
Although embodiments have been described with reference to specific example embodiments, it will be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
Number | Date | Country
---|---|---
62744217 | Oct 2018 | US