The present disclosure relates to protection circuits for power distribution networks.
A switching voltage surge or transient in a power distribution system is the result of energization or de-energization of transmission or distribution lines and large electrical apparatuses, such as reactors and capacitor banks. These actions can occur in the system due to system configuration changes or faults. Under these conditions, inductive or capacitive loads release or absorb energy suddenly and generate voltage or current transients. Consequently, voltage surges may occur and, therefore, jeopardize equipment and personnel safety. Specifically, switching surges usually occur upon the energization of lines, cables, transformers, reactors, or capacitor banks.
Long high-voltage lines store a large amount of energy, which generates many voltage transients in the systems. Capacitance in a transmission line causes current to flow even when no load is connected to the line; this is referred to as line charging current. The line capacitance of underground power cables is far higher than that of their overhead counterparts due to the closeness of the cables to one another and their proximity to earth. As a result, underground lines have 20-75 times the line charging current. Thus, cables can trap a high amount of charge. The trapped charge is a residual charge that remains in the line or cable after de-energization. If the trapped charge has the same polarity as the system voltage, switching overvoltage may be observed.
Reinforcement learning (RL)-based recloser control for distribution cables with degraded insulation level is provided. Utilities continuously observe cable failures on aged cables that have an unknown degraded basic insulation level (BIL). One of the root causes is the transient overvoltage (TOV) associated with circuit breaker reclosing. To solve this problem, researchers have proposed a series of controlled switching methods, most of which use deterministic control schemes. However, in power systems, especially in distribution networks, the switching transient is buffeted by stochasticity. Since it is hard to model TOV due to its complexity, embodiments described herein provide a model-free stochastic control method for reclosers in the presence of uncertainty and noise.
Concretely, to capture high-dimensional dynamic patterns, the recloser control problem is formulated herein by incorporating a temporal sequence reward mechanism into a deep Q-network (DQN). Meanwhile, physical understanding of the problem is embedded into the action probability allocation to develop an infeasible-action-space-elimination algorithm. Through power system computer-aided design (PSCAD) simulation, the impact of load types on cables' TOVs is revealed. Then, to reduce the training burden for the proposed RL control method in different applications, a post-learning knowledge transfer method is established. After validation, several learning curves are presented to show the enhanced performance. The learning efficiency is shown to be outstanding due to the proposed temporal sequence reward mechanism and infeasible action elimination method. Moreover, the results on knowledge transfer demonstrate the capability for method generalization. Finally, a comparison with conventional methods is conducted, which illustrates that the proposed method is the most effective of the three methods in mitigating the TOV phenomenon.
An exemplary embodiment provides a method for recloser control in a power distribution system. The method includes developing an RL-based framework for recloser control in a stochastic environment and controlling a recloser using the developed RL-based framework.
Another exemplary embodiment provides a recloser controller. The recloser controller includes a processing device and a memory comprising a set of instructions which, when executed by the processing device, cause the recloser controller to develop a state, action, and reward of an RL-based framework to mitigate reclosing TOV in a recloser.
Those skilled in the art will appreciate the scope of the present disclosure and realize additional aspects thereof after reading the following detailed description of the preferred embodiments in association with the accompanying drawing figures.
The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the disclosure, and together with the description serve to explain the principles of the disclosure.
The embodiments set forth below represent the necessary information to enable those skilled in the art to practice the embodiments and illustrate the best mode of practicing the embodiments. Upon reading the following description in light of the accompanying drawing figures, those skilled in the art will understand the concepts of the disclosure and will recognize applications of these concepts not particularly addressed herein. It should be understood that these concepts and applications fall within the scope of the disclosure and the accompanying claims.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
It will be understood that when an element such as a layer, region, or substrate is referred to as being “on” or extending “onto” another element, it can be directly on or extend directly onto the other element or intervening elements may also be present. In contrast, when an element is referred to as being “directly on” or extending “directly onto” another element, there are no intervening elements present. Likewise, it will be understood that when an element such as a layer, region, or substrate is referred to as being “over” or extending “over” another element, it can be directly over or extend directly over the other element or intervening elements may also be present. In contrast, when an element is referred to as being “directly over” or extending “directly over” another element, there are no intervening elements present. It will also be understood that when an element is referred to as being “connected” or “coupled” to another element, it can be directly connected or coupled to the other element or intervening elements may be present. In contrast, when an element is referred to as being “directly connected” or “directly coupled” to another element, there are no intervening elements present.
Relative terms such as “below” or “above” or “upper” or “lower” or “horizontal” or “vertical” may be used herein to describe a relationship of one element, layer, or region to another element, layer, or region as illustrated in the Figures. It will be understood that these terms and those discussed above are intended to encompass different orientations of the device in addition to the orientation depicted in the Figures.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including” when used herein specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs. It will be further understood that terms used herein should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
I. Introduction
As described above, most aged cables in power distribution systems have unknown and degraded BIL, causing frequent cable failures in modern smart grids. Most utilities probably do not reclose into faults on underground systems, as faults in underground systems are considered permanent. One aspect of the present disclosure investigates what damaging effects reclosing into underground faults may produce and provides arguments to change this practice. Therefore, the effects of reclosing (in particular, the resulting overvoltage phenomenon in distribution systems) are investigated for the practical consideration of eliminating the occurrence of cable failure.
To achieve the above target, a test on a real feeder (e.g., distribution line) is not a viable solution, since the customers downstream would experience a power outage. Therefore, computer simulation of the field tests is developed to study the transient electromagnetic phenomena. Real-time system parameters and measurements are required to prepare system models and perform an exact transient study. This is very useful to identify possible voltage surges, determine the equipment insulation coordination, and select protective equipment operating characteristics. However, it is essential to consider the peak overvoltage discrepancy between the frequency-based simulation model results and real-time field measurements.
The power industry has witnessed the evolution of surge arresters from air gap and silicon carbide types to metal oxide varistors (MOV). In extra-high voltage applications, MOVs and breakers with closing resistors are two basic methods to restrict switching surges. In high-voltage transmission systems, switching surges are destructive to electrical equipment, so surge arresters are typically installed near large transformers and on line terminals to suppress surges. In medium- and low-voltage networks, as the penetration of distributed energy resources deepens, it is still not clear whether arresters are a viable solution. One thing is clear: it is not economical to place surge arresters all over distribution networks due to their vast reach. Besides surge arresters, other devices used to limit switching overvoltage include pre-insertion resistors and magnetic voltage transformers.
In addition to device-based methods, controlled switching constitutes a second category of overvoltage mitigation methods. The core of controlled switching is statistical switching, in which the worst-case scenarios are determined by sweeping several dimensions of overvoltage scenarios. Statistical switching has been adopted for decades. Investigated dimensions include switching speed, actual operating capacity, load and line length, etc.
Unlike conventional controlled switching methods that rely on deterministic control, embodiments described herein view controlled switching as a stochastic control problem. In a deterministic model, the future state is theoretically predictable. Thus, most researchers investigate the statistical switching overvoltage distributions for different switching operations, and then design the control according to the observations. However, in power systems, especially in distribution networks, the switching transient is buffeted by stochasticity. A stochastic model is needed to capture this inherent randomness and uncertainty.
Unfortunately, relatively little has been done to develop a stochastic control mechanism that views the complexity of the control task as a Markov decision process (MDP). Since it is hard to assume knowledge of the overvoltage dynamics or their cost function, it is desirable to combine the advantages of off-policy control and value function approximation. Meanwhile, given the high-dimensional dynamic complexity of power systems, a deep RL method is redesigned to improve the control performance. Therefore, after validation, a recloser control method using DQNs is proposed.
Some features of embodiments described herein are summarized below:
Section II provides a discussion of the reclosing impact on underground cables. The proposed recloser control method using RL is elaborated in Section III. Section IV shows numerical results. Section V compares the proposed method with other methods, followed by discussions in Section VI. A process for recloser control is described in Section VII, and a computer system for implementing at least some aspects of the present disclosure is described in Section VIII.
II. Reclosing Impact on Cables Via PSCAD
As mentioned earlier, one of the causes of cable failure is TOVs. TOVs can arise from the supply or from switching inductive loads, harmonic currents, DC feedback, mutual inductance, high-frequency oscillations, large starting currents, and large fluctuating loads. TOVs, or surges, are temporary high-magnitude voltage peaks of short duration (e.g., lightning). Switching transients in electrical networks occur often. Although their voltage magnitude is lower than that of a lightning surge, the frequency at which they occur ages the cable insulation, which eventually breaks down, resulting in flashover. To observe the TOVs in computer programs, a 750 MCM-AL cable is used, which is widely implemented in many systems. This section focuses on the modeling of switching and power systems.
A. Switching Modeling
For the switching modeling, statistical breakers in PSCAD are used to account for the physical metal contacts and the issue of pole span. Pole span is the time span between the closing instants of the first and the last pole. Single-pole operation of the three-phase breaker is applied to incorporate the angle difference in the operation of different poles caused by mechanical inconsistencies. The resulting TOVs over 100 simulations of different sets of circuit breaker closing times, with a standard deviation of 4 in the half interval, are shown in Table I. This table gives a sense of how the pole span contributes to the maximum TOVs. One can refer to Section IV-A for the system parameters.
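For illustration only, the statistical-breaker behavior can be sketched as a simple Monte Carlo draw of per-pole closing times; the Gaussian model, the millisecond units, and the commanded closing time are assumptions for this sketch, not the exact PSCAD statistical-breaker implementation.

```python
import random

def pole_closing_times(t_cmd_ms, sigma_ms=4.0, n_poles=3):
    """Draw independent closing instants for each pole around the commanded
    closing time; mechanical inconsistencies spread the three poles apart."""
    return [random.gauss(t_cmd_ms, sigma_ms) for _ in range(n_poles)]

def pole_span_ms(times):
    """Pole span: time between the closing of the first and the last pole."""
    return max(times) - min(times)

# 100 runs, mirroring the simulation count used to produce Table I.
spans = [pole_span_ms(pole_closing_times(50.0)) for _ in range(100)]
```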
When the switching occurs at other angles, different TOVs are obtained. Although this does not demonstrate all the cases with higher TOVs, it shows that the optimal controlled switching time is crucial to TOV mitigation under the current switching modeling. It is noteworthy that the adopted switch modeling has limitations, the details of which can be found in Section VI. Meanwhile, it is evident that overvoltages frequently occur on cables; therefore, it is imperative to provide a solution that lowers the probability of cable failure.
B. Power System Modeling
First, two different line models, namely the distributed line model and the frequency-dependent π model, are employed to capture different aspects of cable characteristics.
With reference to
III. RL-Based Recloser Control Method
It is important to select an RL method that is suitable for the particular problem under study. In general, RL is classified into model-based (MB) and model-free (MF) methods. For MB RL, the classical World Models approach is taken as an example. Since it is MB, an environmental model is needed during learning. However, given the complexity of the TOV problem under study, it is hard to construct an internal model of the transitions and immediate outcomes for recloser control. For this reason, some embodiments do not use MB RL. In MF RL algorithms, the agent relies on trial-and-error experience to reach the optimal policy. Typical methods include policy optimization and Q-learning. Under the policy optimization approach, the popular policy gradient (PG) method is selected as a comparison.
In contrast, for Q-learning methods, the basic version and the advanced version, DQN, are considered. Note that the present disclosure utilizes the DQN method for RL control. The main advantage of DQN over PG is that DQN operates on a discrete action space, while PG is designed for continuous action spaces, and it is desirable here to reduce the action space. The PG method would consider 0, 1, and anything in between, whereas a breaker has precisely two discrete actions (off and on). Therefore, owing to the discrete nature of the action space involved in Q-learning-based RL, it is perhaps the best choice to reduce the computational burden. For the selection between Q-learning and DQN, certain embodiments use DQN due to its powerful value function approximation capability in multiple power system scenarios. The above comparison of RL methods is summarized in Table II, where bold and underlined text indicates the main reason why a method has not been selected.
The remainder of this section starts with the impetus for choosing the DQN algorithm, which is capable of dealing with the continuous state space of the recloser observation. To control the reclosers, the designs of the temporal sequence reward mechanism, the infeasible action space elimination algorithm, and the post-learning knowledge transfer method are then elaborated.
A. The Deep Q-Network (DQN) for Better Value Approximation
The task of TOV mitigation requires a model-free control algorithm that finds an optimal strategy for solving a dynamical control problem. RL is clearly a suitable solution. Among the various types of RL algorithms, off-policy control, in which the agent usually uses a greedy policy to select actions, can be incorporated with the action value estimation design. Therefore, Q-learning is chosen to satisfy this requirement. Given the complexity of electric grids, the value-based DQN method needs to involve intensive use of simulation for the parametric approximation. To enable self-learning of the recloser control, an actor-critic system is adopted to estimate the rewards. The critic in this system evaluates the value function, and the actor is the algorithm that improves the obtained value. DQN agents use the following training algorithm, in which they update their critic model at each time step. First, the critic Q(s, a) is initialized with random parameter values θQ, and the target critic is initialized using the target update smoothing method. Then, at each time step:
$\theta_{Q'} = \tau\,\theta_{Q} + (1-\tau)\,\theta_{Q'}$ (Equation 4)
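As a concrete illustration, a minimal sketch of a DQN critic update with the Equation 4 soft target update is given below. The network sizes, optimizer, learning rate, and discount factor are illustrative assumptions; the 12-dimensional input matches the three-phase state of Equation 5 below.

```python
import torch
import torch.nn as nn

# Hypothetical critic: maps the recloser state s to Q-values for the
# binary action space {0: stay open, 1: reclose}.
critic = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 2))
target_critic = nn.Sequential(nn.Linear(12, 64), nn.ReLU(), nn.Linear(64, 2))
target_critic.load_state_dict(critic.state_dict())  # target starts as a copy
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-3)
gamma, tau = 0.99, 0.005  # assumed discount and smoothing factors

def dqn_step(s, a, r, s_next, done):
    """One temporal-difference update of the critic, followed by the
    Equation 4 soft update of the target critic."""
    with torch.no_grad():
        q_next = target_critic(s_next).max(dim=1).values
        y = r + gamma * (1.0 - done) * q_next               # TD target
    q = critic(s).gather(1, a.unsqueeze(1)).squeeze(1)       # Q(s, a)
    loss = nn.functional.mse_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with torch.no_grad():  # Equation 4: θQ′ = τθQ + (1 − τ)θQ′
        for p, p_t in zip(critic.parameters(), target_critic.parameters()):
            p_t.mul_(1.0 - tau).add_(tau * p)
```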
B. Temporal Sequence Reward to Guarantee Learning Quality
To develop a DQN to mitigate TOVs, its state design is first considered. For each phase p ∈ {A, B, C}, there are voltage and current measurements from the bus located downstream of the breaker under study. Similar to a conventional recloser, the magnitudes of voltage $|V_p|$ and current $|I_p|$, along with the voltage and current phase angles $\theta_{V_p}$ and $\theta_{I_p}$, constitute the state:

$s = [\,|V_p|, \theta_{V_p}, |I_p|, \theta_{I_p}\,], \quad p \in \{A, B, C\}$ (Equation 5)
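As an example of how such a state might be assembled from measurements, the sketch below packs the per-phase phasor quantities into a single vector; the measurement dictionary is a hypothetical interface, not part of this disclosure.

```python
import numpy as np

def build_state(measurements):
    """Assemble the Equation 5 state from downstream-bus measurements.

    `measurements` maps each phase to (|V|, theta_V, |I|, theta_I);
    this layout is an assumed interface for illustration.
    """
    return np.concatenate([
        np.asarray(measurements[phase], dtype=float)
        for phase in ("A", "B", "C")
    ])

# Example: three phases x four quantities -> a 12-dimensional state.
s = build_state({
    "A": (1.00, 0.00, 0.95, -0.52),
    "B": (0.99, -2.09, 0.94, -2.61),
    "C": (1.01, 2.09, 0.96, 1.57),
})
```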
After defining the state, the action space of the controlling system is defined so that it suits the system and can deliver the best results. Practically, the opening of the recloser is usually triggered by faults and follows a pre-defined sequence. Electronically controlled reclosers are usually set to trip two to three times, using a combination of fast and slow time-current curves. It is assumed that the opening of reclosers is taken care of by the conventional fault detection method and the pre-defined sequence. Thus, due to the simplicity of the control task, a binary action space a ∈ {0, 1} is selected, where 0 indicates that no reclosing is required and 1 indicates a reclosing action. It is necessary to remind the reader that there is an essential dimension of the action, namely time, which is the key to a successful reclosing.
Since the RL control agent learns through its special feedback, the reward, to improve its performance, it is important to design a reward mechanism that captures the key task sequence and maximizes the accumulated reward from the initial state to the terminal state (one episode). Therefore, a reward function is designed that makes the agent learn the optimal time to reclose in the continuous state space. To achieve this, the reward function should evaluate the voltage deviation upon reclosing and consider the reclosing dead time. Consequently, for each time step t and the jth agent:
$R^{j}_{tov,t} = \alpha - \beta \cdot B_{RisingEdge} \cdot [\,|V_{p,t}| - V_{ref,t}\,]^{+} - \zeta \cdot [\,t_{S_j} - t_{dead}\,]^{+}$ (Equation 6)

where α, β, and ζ are adjustable scaling factors whose values can be tuned for a specific case, and $[\,\cdot\,]^{+}$ denotes $\max(\cdot, 0)$. The value of α determines the highest attainable reward. $B_{RisingEdge}$ is the signal bit that becomes high only when it captures the rising edge of recloser j's status (a change from open (0) to closed (1)), while $t_{S_j}$ is the switching (reclosing) instant of recloser j, which is compared against the pre-defined reclosing dead time $t_{dead}$.
Furthermore, it is beneficial to have a reward that evaluates the overall performance at the end of the episode. Thereby, the end-of-episode reward $R^{j}_{ee}$ is designed:
$R^{j}_{ee} = -\theta \cdot [\,N_{Reclose} - N_{pre\text{-}defined}\,]^{+}$ (Equation 7)
where θ is a scaling factor, and $N_{Reclose}$ and $N_{pre\text{-}defined}$ are the number of reclosing operations over one episode and the pre-defined number of trips programmed in the recloser, respectively. Thus, the reward function over one episode becomes:
$R^{j} = \sum_{t=1}^{T} R^{j}_{tov,t} - R^{j}_{ee}$ (Equation 8)
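A minimal sketch of the reward terms of Equations 6-8 follows. The scaling factor values and the dead-time comparison are illustrative assumptions; only the structure of the terms is taken from the equations above.

```python
def step_reward(v_mag, v_ref, rising_edge, t_switch, t_dead,
                alpha=1.0, beta=0.5, zeta=0.1):
    """Equation 6: reward alpha, minus the overvoltage observed on the
    reclosing edge, minus a penalty for reclosing past the dead time.
    The alpha/beta/zeta values here are assumed, not tuned."""
    overvoltage = max(v_mag - v_ref, 0.0)        # the [.]^+ operator
    late = max(t_switch - t_dead, 0.0)
    return alpha - beta * float(rising_edge) * overvoltage - zeta * late

def end_of_episode_reward(n_reclose, n_predefined, theta=0.2):
    """Equation 7: penalize reclosing more often than programmed."""
    return -theta * max(n_reclose - n_predefined, 0)

def episode_reward(step_rewards, r_ee):
    """Equation 8, as written above: accumulate the per-step rewards and
    subtract the end-of-episode term."""
    return sum(step_rewards) - r_ee
```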
C. Infeasible Action Space Elimination for Fast Learning
With the time dimension considered, the action space is immense. To have a working algorithm, it is necessary to remove most of the infeasible action space to ensure performance and efficiency. A generalized DQN algorithm usually solves problems or games that do not contain the time dimension. In this particular problem, however, building on the DQN algorithm of Section III-A, the time dimension is introduced to embed the physical law into the algorithm: the physically infeasible region is eliminated and exploitation is enhanced in the physically feasible region. This ensures that the probability of an action, according to the time sequence, can be pushed up from a state if this action is better than the value of what would otherwise occur from that state. The probability ε of Section III-A is now redefined as follows:
where $\varepsilon_0(t)$ denotes the base exploration rate, which is a function of time, $t_{r_i}$ denotes the pre-defined opening times of the sequences, f is the grid frequency, and n/f confines the exploration within n cycles. The agent's timer runs as long as a fault is detected.
Traditionally, the agent explores the action space from the first time step to the last one. However, this is not necessary most of the time if the agent wants to achieve a reduced resulting TOV. For instance, the actions taken before the fault or in between two pre-defined trips are dispensable. Therefore, the notion of restricting the exploration to the time sequences where action is required is conceived. Such prior domain knowledge helps to gain higher rewards even in the initial few episodes. Hence, the temporal reward design is aligned with the temporal action likelihood. Taking $P(a_t)$ as the prior distribution over the possible actions,
where $P^{*}(a_t)$ is the probability distribution of taking the possible actions in the appropriate time sequences where exploration is needed. Such a formulation incorporates a physically feasible interpretation into the model's MDP probability change. For a breaker control problem, the probabilities of specific control actions impact performance mainly through restricting the exploration to a suitable temporal region and selecting appropriate probabilities of on or off actions for the breaker. Accordingly, an extensive analysis can be performed to show which probability distributions are reasonable. This begins by selecting the off and on statuses completely at random, i.e., both with 0.5 probability. The probability of the on status is then increased, since the breaker is expected to remain on for a greater number of steps once it is reclosed. The pseudo-code is shown in Algorithm 1, and a simplified sketch is given below.
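The sketch below illustrates the core of this idea in simplified form: exploration is enabled only within n cycles after each pre-defined opening time, and the random draw is biased toward the on action. The window logic and the probabilities are assumptions for illustration, not Algorithm 1 itself.

```python
import random

def exploration_rate(t, trip_times, eps0, n_cycles=3, f=60.0):
    """Confine exploration to n/f seconds after each pre-defined opening
    time t_ri; outside these windows the actions are treated as
    infeasible and exploration is suppressed."""
    for t_ri in trip_times:
        if t_ri <= t <= t_ri + n_cycles / f:
            return eps0(t)
    return 0.0

def select_action(t, q_values, trip_times, eps0, p_on=0.5):
    """Epsilon-greedy selection restricted to the feasible time windows.
    p_on biases exploration toward the on action (1); starting at 0.5
    and increasing it over training is one plausible schedule."""
    if random.random() < exploration_rate(t, trip_times, eps0):
        return 1 if random.random() < p_on else 0   # biased exploration
    return int(q_values[1] >= q_values[0])          # greedy exploitation
```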
D. Post-Learning Knowledge Transfer
The transferability of RL and other machine learning control methods is sometimes questioned by researchers since, unlike deterministic control, machine learning control needs to tune its parameters based on case-specific training. This is not efficient. To overcome this issue, an adopted approach involves fitting a polynomial function $R_f : \mathbb{R}^n \to \mathbb{R}$, where n is the degree of the polynomial, with reward parameters using an evaluation reward $R(S_i, A_i)$. The degree of the polynomial is a hyperparameter that affects the speed of training:
$R_f = \theta_0 + \theta_1 R(S_1, A_1) + \theta_2 R^2(S_2, A_2) + \cdots + \theta_n R^n(S_n, A_n)$ (Equation 11)
where $\theta_i$ is the coefficient of the ith polynomial term. Such a polynomial function can be fitted through least-squares regression, as sketched below.
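Under the assumption that samples of the evaluation reward $R(S_i, A_i)$ and the target rewards are available, the coefficients of Equation 11 can be obtained with an ordinary least-squares fit over a Vandermonde matrix, as in the sketch below; the sample values and degree are placeholders.

```python
import numpy as np

def fit_reward_polynomial(evaluation_rewards, targets, degree):
    """Fit theta_0..theta_n of Equation 11 by least squares.

    evaluation_rewards holds samples of R(S_i, A_i); targets holds the
    rewards to be transferred. Both are case-specific placeholders."""
    r = np.asarray(evaluation_rewards, dtype=float)
    # Columns [1, R, R^2, ..., R^n] form the regression matrix.
    X = np.vander(r, N=degree + 1, increasing=True)
    theta, *_ = np.linalg.lstsq(X, np.asarray(targets, dtype=float), rcond=None)
    return theta

# Example with an assumed degree-3 polynomial and placeholder samples.
theta = fit_reward_polynomial([0.1, 0.4, 0.7, 0.9], [0.2, 0.5, 0.65, 0.8], 3)
```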
IV. Numerical Results
A. Benchmark System
With the benchmark model, the impact of different load types on TOVs is first evaluated. As shown in Table V, the load types C and LC are two significant causes of cable TOVs. They are, in reality, the capacitor bank 24 and the inductive loads 26, including a transformer, connected to the cable 22. With a decreased L or an increased C, the maximum TOVs tend to increase, since the load becomes more and more capacitive in nature. Furthermore, for load type C alone, the highest maximum TOVs are observed, because upon reclosure the voltages are held at high values by the charged capacitor and there is no alternate route to discharge. The results also indicate that a resistive load 28 serves as a drain for the trapped charge in the cable; therefore, TOVs are hardly observed.
Additionally, the TOVs have large deviations when switching off the loads due to possible restrikes. Therefore, this can also be one of the causes of detrimental TOVs. To study this phenomenon of load switching with restrikes and develop a deeper insight into the matter, the switching scenarios are expanded with rigorous experimentation to identify the highest TOV values upon multiple restrikes. Results are presented in Table VI. This analysis shows that there is a high TOV when the load is shed without losing the capacitor banks. The controller can also be designed to mitigate such TOVs.
B. Overall Learning Curve by Using the Temporal Sequence Reward Mechanism and Hyper-Parameter Selection
1. Discounting Factor (γ)
2. Epsilon (ε)
3. Decay Rate
4. Smoothing Factor (τ)
5. Experience Buffer with Capacity N
6. Minimum Batch Size (M)
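For concreteness, the hyperparameters enumerated above might be grouped as in the sketch below; every value shown is an illustrative assumption rather than a setting prescribed by this disclosure.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class DQNHyperparameters:
    gamma: float = 0.99            # 1. discounting factor
    epsilon: float = 1.0           # 2. initial exploration rate
    decay_rate: float = 0.995      # 3. epsilon decay per episode
    tau: float = 0.005             # 4. smoothing factor (Equation 4)
    buffer_capacity: int = 50_000  # 5. experience buffer capacity N
    min_batch_size: int = 64       # 6. minimum batch size M

hp = DQNHyperparameters()
replay_buffer = deque(maxlen=hp.buffer_capacity)  # experience buffer
```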
The proposed learning agent is trained for different X/R ratios of the source, which impact the cumulative reward obtained by the learning agent. The X/R ratio is varied from 8 to 20 with a step size of 4, taking into consideration the realistic X/R ratios in a distribution network. The maximum peak TOV can reach up to 2.2 pu when the X/R ratio equals 12. The mean, maximum, and standard deviation (Std.) of the reward vectors upon complete training for each sample are tabulated in Table VII. The results indicate a high mean and maximum reward in all cases. Interestingly, an X/R ratio of around 12 for the system under discussion gives the highest mean and maximum reward values with the least standard deviation.
C. Fast Learning Curve with Infeasible Region Eliminated
It is proposed to eliminate from the exploration the region where a particular action is infeasible, by implementing a carefully designed varying-probability approach.
D. Efficient Knowledge Transfer for Method Generalization
Some embodiments aim to boost the learning process further to make the proposed method adaptive and general. There are multiple time sequences that need to be learned by the model. Table VIII illustrates that a flat-start model without knowledge transfer requires many episodes to gain an average reward higher than 0.7. To ameliorate this situation, the knowledge transfer method described herein helps to reduce the number of episodes, since it retains the reward information from past time sequences. With transfer learning, 261 episodes are required to gain a reward above the normalized reward of 0.7, as compared to 394 episodes with the flat-start approach, when the model is learning on the first two time sequences. For the first three time sequences, 289 episodes are required in comparison to 682 episodes. Hence, such a method of transferring reward knowledge reduces the training time significantly. Moreover, the generalization of reward parameters also enables reward knowledge transfer in systems with other configurations.
V. Performance Comparison with Other Methods
The temporal-sequence-based RL technique provides a framework to learn the optimal breaker reclosure time that helps ameliorate the TOV. There have been efforts in the past to accomplish such a task. One traditional method is to reclose whenever the source-side voltage crosses zero. This zero-crossing method is easy to implement in a recloser but not effective. Therefore, the proposed method is also compared with the controlled switching scheme described in H. Seyedi and S. Tanhaeidilmaghani, "New Controlled Switching Approach for Limitation of Transmission Line Switching Overvoltages," IET Generation, Transmission & Distribution, vol. 7, no. 3, pp. 218-225, 2013. This scheme is referred to herein as the half-of-the-peak-voltage method, because its closing operation is performed at the instant of +Vmax/2 of the source-side voltage if the polarity of the trapped charge is positive, and at the instant of −Vmax/2 if the polarity of the trapped charge is negative. Interested readers can refer to the cited paper for the mathematical formulation and the advantages of this approach. Simplified sketches of both baselines are given below.
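These sketches assume an ideal source waveform $v(t) = V_{max}\sin(2\pi f t)$ with no phase offset; the phase tracking of a real implementation is omitted, and the offsets used are assumptions following that waveform.

```python
import math

def zero_crossing_instant(t, f=60.0):
    """Zero-crossing method: reclose at the next upward zero crossing
    of v(t) = Vmax * sin(2*pi*f*t)."""
    period = 1.0 / f
    return math.ceil(t / period) * period

def half_peak_instant(t, trapped_charge_positive, f=60.0):
    """Half-of-the-peak-voltage method: reclose at +Vmax/2 for positive
    trapped charge and at -Vmax/2 for negative trapped charge.
    sin reaches +1/2 at period/12 and -1/2 at 7*period/12 in each cycle."""
    period = 1.0 / f
    offset = period / 12.0 if trapped_charge_positive else 7.0 * period / 12.0
    n = math.ceil((t - offset) / period)
    return n * period + offset
```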
VI. Discussions
There are very challenging issues in TOV modeling. Realistic TOV modeling should consider restrikes/prestrikes, capacitive current, inductive current switching, the structure of the network, system parameters, whether or not virtual chopping takes place, chopping current, the instant of opening, resonance phenomena, etc.
Although it relates to transmission, TOV calculations are used to determine the minimum approach distance (MAD) for the work rules required by the National Electrical Safety Code (NESC) and OSHA (1910.269(I)(3)(ii)). TOV is also dependent on line design and operation. The work in "Switching Surges: Part IV-Control and Reduction on AC Transmission Lines," IEEE Transactions on Power Apparatus and Systems, no. 8, pp. 2694-2702, 1982, provides guidance on TOV factors and methods for control. OSHA 1926, Table 5 in Appendix A to Subpart V, provides TOV values based on various causes.
Restrikes can influence TOV, but the industry generally believes that proper periodic breaker maintenance limits the likelihood of restrikes. Periodic maintenance of distribution system breakers is assumed to have a similar effect. Meanwhile, capacitor switching may involve restrikes, but embodiments described herein focus on feeder breaker reclosing while attempting to clear a fault. During this time, the state of a switched capacitor bank remains unchanged, as does that of any other device connected to this circuit. It is also assumed that a circuit under a fault condition is not lightly loaded.
Limitations exist in TOV modeling, but this disclosure has demonstrated an innovative learning method that controls the reclosing under a spectrum of system complexity. It relies on reinforcement learning to explore the complicated state space in a model-free way, regardless of the restrike/prestrike model, the structure of the network, the system parameters, and whether an additional preventive device is added. Promising results are shown in the numerical section.
VII. Process for Recloser Control in a Power Distribution System
Operation 1200 optionally continues at operation 1206, with developing an action of the RL-based framework. Operation 1200 optionally continues at operation 1208, with eliminating an exploration region where the action is infeasible. Operation 1200 optionally continues at operation 1210, with transferring reward knowledge from a first power system configuration to a second power system configuration.
After operation 1200, the process continues at operation 1212, with controlling a recloser using the developed RL-based framework.
Although the operations of
VIII. Computer System
The exemplary computer system 1300 in this embodiment includes a processing device 1302 or processor, a system memory 1304, and a system bus 1306. The system memory 1304 may include non-volatile memory 1308 and volatile memory 1310. The non-volatile memory 1308 may include read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and the like. The volatile memory 1310 generally includes random-access memory (RAM) (e.g., dynamic random-access memory (DRAM), such as synchronous DRAM (SDRAM)). A basic input/output system (BIOS) 1312 may be stored in the non-volatile memory 1308 and can include the basic routines that help to transfer information between elements within the computer system 1300.
The system bus 1306 provides an interface for system components including, but not limited to, the system memory 1304 and the processing device 1302. The system bus 1306 may be any of several types of bus structures that may further interconnect to a memory bus (with or without a memory controller), a peripheral bus, and/or a local bus using any of a variety of commercially available bus architectures.
The processing device 1302 represents one or more commercially available or proprietary general-purpose processing devices, such as a microprocessor, central processing unit (CPU), or the like. More particularly, the processing device 1302 may be a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, a processor implementing other instruction sets, or other processors implementing a combination of instruction sets. The processing device 1302 is configured to execute processing logic instructions for performing the operations and steps discussed herein.
In this regard, the various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with the processing device 1302, which may be a microprocessor, field programmable gate array (FPGA), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or other programmable logic device, a discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. Furthermore, the processing device 1302 may be a microprocessor, or may be any conventional processor, controller, microcontroller, or state machine. The processing device 1302 may also be implemented as a combination of computing devices (e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration).
The computer system 1300 may further include or be coupled to a non-transitory computer-readable storage medium, such as a storage device 1314, which may represent an internal or external hard disk drive (HDD), flash memory, or the like. The storage device 1314 and other drives associated with computer-readable media and computer-usable media may provide non-volatile storage of data, data structures, computer-executable instructions, and the like. Although the description of computer-readable media above refers to an HDD, it should be appreciated that other types of media that are readable by a computer, such as optical disks, magnetic cassettes, flash memory cards, cartridges, and the like, may also be used in the operating environment, and, further, that any such media may contain computer-executable instructions for performing novel methods of the disclosed embodiments.
An operating system 1316 and any number of program modules 1318 or other applications can be stored in the volatile memory 1310, wherein the program modules 1318 represent a wide array of computer-executable instructions corresponding to programs, applications, functions, and the like that may implement the functionality described herein in whole or in part, such as through instructions 1320 on the processing device 1302. The program modules 1318 may also reside on the storage mechanism provided by the storage device 1314. As such, all or a portion of the functionality described herein may be implemented as a computer program product stored on a transitory or non-transitory computer-usable or computer-readable storage medium, such as the storage device 1314, volatile memory 1310, non-volatile memory 1308, instructions 1320, and the like. The computer program product includes complex programming instructions, such as complex computer-readable program code, to cause the processing device 1302 to carry out the steps necessary to implement the functions described herein.
An operator, such as the user, may also be able to enter one or more configuration commands to the computer system 1300 through a keyboard, a pointing device such as a mouse, or a touch-sensitive surface, such as the display device, via an input device interface 1322 or remotely through a web interface, terminal program, or the like via a communication interface 1324. The communication interface 1324 may be wired or wireless and facilitate communications with any number of devices via a communications network in a direct or indirect fashion. An output device, such as a display device, can be coupled to the system bus 1306 and driven by a video port 1326. Additional inputs and outputs to the computer system 1300 may be provided through the system bus 1306 as appropriate to implement embodiments described herein.
The operational steps described in any of the exemplary embodiments herein are described to provide examples and discussion. The operations described may be performed in numerous different sequences other than the illustrated sequences. Furthermore, operations described in a single operational step may actually be performed in a number of different steps. Additionally, one or more operational steps discussed in the exemplary embodiments may be combined.
Those skilled in the art will recognize improvements and modifications to the preferred embodiments of the present disclosure. All such improvements and modifications are considered within the scope of the concepts disclosed herein and the claims that follow.
This application claims the benefit of provisional patent application Ser. No. 63/105,629, filed Oct. 26, 2020, the disclosure of which is hereby incorporated herein by reference in its entirety.
This invention was made with government funds under 1810537 awarded by the National Science Foundation. The U.S. Government may have rights in this invention.