The present invention relates to the field of power adjustment, and in particular, to a method for intelligently adjusting a power flow based on a Q-learning algorithm.
At present, to adjust a power flow, an operator usually adjusts the adjustable load power output based on experience so as to adjust a non-convergent power flow to a convergent power flow. In this method, the adjustment is conducted manually and depends entirely on the operator's experience. Therefore, the method is haphazard, inefficient, and ineffective. In addition, the method imposes extremely high theoretical and practical requirements on the operator.
In view of this, the present invention provides a strategy for intelligently adjusting a power flow based on a Q-learning algorithm. Through continuous trial and rule learning, an optimal unit combination is selected, and a non-convergent power flow is adjusted to a convergent power flow. This minimizes the loss of the power grid, overcomes the blindness of the conventional method that relies on human experience, and improves the efficiency and accuracy of power flow adjustment.
The present invention provides a method for intelligently adjusting a power flow based on a Q-learning algorithm. The method includes:
Optionally, the intelligent adjustment method includes:
Optionally, the intelligent adjustment method includes:
The beneficial effects of the present invention include:
To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings required to describe the embodiments are briefly described below. Apparently, the accompanying drawings described below are only some embodiments of the present invention. A person of ordinary skill in the art may further obtain other accompanying drawings based on these accompanying drawings without creative effort.
The present invention is further described below with reference to the accompanying drawings.
A method for adjusting a power flow based on a Q-learning algorithm mainly includes the following seven steps:
When a certain state is reached, selecting a certain action may not change the state. For example, if unit 1 is already on and the action "power on unit 1" is selected, the state remains unchanged and the action is meaningless. To reduce such unnecessary repetition and save running time, each time a state is reached, the actions corresponding to the current unit states are removed from the action space. The remaining actions form an action subspace, and an action is selected only from the action subspace, as illustrated in the sketch below.
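The following minimal sketch (Python; the on/off state encoding, function name, and three-unit example are illustrative assumptions rather than part of the embodiments) shows one way such an action subspace could be formed: any action that would merely reproduce a unit's current state is excluded.

```python
# Illustrative sketch only: one possible encoding of the action-subspace idea.
# The state is assumed to be a tuple of 0/1 flags (1 = unit powered on), and an
# action (i, flag) sets unit i to that flag.

def action_subspace(state):
    """Return only the actions that would actually change the current state."""
    subspace = []
    for i, on in enumerate(state):
        if on:
            subspace.append((i, 0))   # unit is on, so only "power off" is meaningful
        else:
            subspace.append((i, 1))   # unit is off, so only "power on" is meaningful
    return subspace

# With the first unit already on, the meaningless "power on" action for it is excluded.
print(action_subspace((1, 0, 0)))     # [(0, 0), (1, 1), (2, 1)]
```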
For a power system including N units, a procedure of adjusting a power flow based on a Q-learning algorithm is shown in the accompanying drawing.
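For reference, in the standard Q-learning algorithm on which the method is based, the state-action value is updated according to the general textbook form (this background formula is not necessarily the exact expression used in the embodiment):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

where $\alpha \in (0,1]$ is the learning rate, $\gamma \in [0,1)$ is the discount factor, $r_{t+1}$ is the reward obtained after taking action $a_t$ in state $s_t$, and $s_{t+1}$ is the resulting state.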
Through continuous trial and rule learning, the best unit combination may be selected, and a non-convergent power flow may be adjusted to a convergent power flow. This may minimize the loss of the power grid, overcome the blindness of the conventional method that relies on human experience, and improve the efficiency and accuracy of power flow adjustment.
In the strategy, the variable, action, and goal in the power grid are converted into the state, action, and reward in the algorithm by using the Q-learning method based on reinforcement learning theory. With the unit output remaining unchanged, the best unit combination is selected, and the non-convergent power flow is adjusted to a convergent power flow. In addition, the loss of the power grid is minimized, and the efficiency and accuracy of power flow adjustment are improved.
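As a rough illustration of this mapping (not the implementation of the embodiments), the sketch below treats the unit on/off combination as the state, powering a single unit on or off as the action, and a value derived from power flow convergence and grid loss as the reward. The function solve_power_flow is a fabricated stand-in for a real power flow solver, and the numeric values, the epsilon-greedy policy, and the hyperparameters are arbitrary assumptions.

```python
# Illustrative Q-learning sketch for selecting a unit combination. Everything
# numeric here (penalties, hyperparameters, the toy "solver") is an assumption
# made purely for demonstration.

import random

N_UNITS = 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

def solve_power_flow(state):
    """Placeholder: a real system would run an actual power flow calculation."""
    converged = sum(state) >= 2              # pretend: at least two units on -> convergence
    loss = 10.0 - 2.0 * sum(state)           # pretend: more units on -> lower grid loss
    return converged, loss

def reward(state):
    converged, loss = solve_power_flow(state)
    return -loss if converged else -100.0    # heavy penalty for a non-convergent flow

def action_subspace(state):
    # Only actions that change the current state (see the action-subspace discussion above).
    return [(i, 1 - on) for i, on in enumerate(state)]

def step(state, action):
    i, flag = action
    new_state = list(state)
    new_state[i] = flag
    return tuple(new_state)

Q = {}                                       # Q table: (state, action) -> value
initial_state = (0, 0, 0)                    # all units initially off

for episode in range(500):
    s = initial_state
    for _ in range(10):                      # a few adjustment steps per episode
        subspace = action_subspace(s)
        if random.random() < EPSILON:        # epsilon-greedy exploration
            a = random.choice(subspace)
        else:
            a = max(subspace, key=lambda x: Q.get((s, x), 0.0))
        s_next = step(s, a)
        r = reward(s_next)
        best_next = max(Q.get((s_next, x), 0.0) for x in action_subspace(s_next))
        q_sa = Q.get((s, a), 0.0)
        Q[(s, a)] = q_sa + ALPHA * (r + GAMMA * best_next - q_sa)
        s = s_next

# Greedy rollout after training to read off the learned unit combination.
s = initial_state
for _ in range(N_UNITS):
    a = max(action_subspace(s), key=lambda x: Q.get((s, x), 0.0))
    s = step(s, a)
print("selected unit combination:", s)
```

In an actual system, the placeholder solver and toy reward would be replaced by the grid's power flow calculation and a reward that reflects convergence and grid loss, as described above.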
First, in the strategy, the variable, action, and goal in the power grid are converted into the state, action, and reward in the algorithm by using the Q-learning algorithm based on reinforcement learning theory, which provides great flexibility.
Second, the unit power output is usually fixed in actual projects, and only the combination of powered-on units is adjusted. In this specification, the best combination of powered-on units is selected by powering units on or off without adjusting the unit power output, which provides strong practicability.
Finally, power flow adjustment is conducted for the IEEE 39-bus standard test system and an actual operating power system.
The content described in the embodiments of this specification is merely an enumeration of the implementations of the inventive concept, and the claimed scope of the present invention should not be construed as being limited to the specific forms stated in the embodiments. Equivalent technical means that occur to a person skilled in the art in accordance with the inventive concept also fall within the claimed scope of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
201911123269.8 | Nov 2019 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2020/120259 | 10/11/2020 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2021/093493 | 5/20/2021 | WO | A |
Number | Date | Country | Kind
---|---|---|---
20210367426 | Nov 2021 | US | A1