The present invention relates to management of energy storage systems.
More and more PV resources are being connected to conventional weak distribution systems. However, renewable power generation depends heavily on weather conditions, which are unpredictable and variable. High ramp-rate variations in PV power output cause significant voltage fluctuations, and severe ramp rates may even cause system stability issues. Ramp rate control strategies that reduce fluctuations in PV output are therefore necessary to increase the PV penetration level in these networks.
Integrating energy storage devices, such as battery energy storage, with a PV system is an effective way to smooth PV output: the ramp rate of the PV generation output can be limited by charging and discharging the battery storage system. Given the limited power and energy capacity and the limited cycle life of battery storage devices, together with unpredictable power generation and a dynamic operating environment, an effective battery control method is required to limit the PV ramp rate.
There are basically two approaches to PV ramp rate control: one is aided by energy storage, and the other does not integrate energy storage with PV. Approaches without energy storage include inverter-based control, which curtails the PV output during ramp-up events. It has been shown in the literature that such curtailment approaches lead to direct energy or profit loss; moreover, inverter-based power curtailment works only for ramp-up events. For ramp-down events, storage devices or other reserve services are still needed to supply supporting power.
Among energy storage-based approaches, most studies use a moving average filter or another low-pass filter to control battery operation. Filter-based methods can reduce PV output fluctuations, but they do not necessarily hold the PV output to a desired ramp rate. The choice of filter time window also affects battery operation: short windows may be insufficient to counteract high ramp rates, while long windows may cause excessive battery utilization. In some studies the battery state of charge (SoC) is fed back into the control loop to keep the stored energy within range. However, battery operation is still not optimized so as to minimize the required battery capacity and possibly extend battery life.
In one aspect, systems and methods are disclosed for storing photovoltaic (PV) generation by applying reinforcement learning (RL)-based control to battery storages for PV ramp rate control; and exchanging energy dynamically to limit a ramp rate of the PV power output and maintaining a battery state of charge level at a predefined level to minimize required battery size and extend the battery life cycles.
In another aspect, a ramp rate control method includes a reinforcement learning (RL)-based control framework for a battery storage system for PV ramp rate control. In RL, the problem is modeled as the interaction between an objective-oriented controller and an environment with uncertainty, so this approach does not require known PV power profiles. Through predetermined control objectives, the PV ramp rate can be directly constrained within the limit while excessive battery utilization is avoided. A multi-objective control target is therefore constructed that combines successful suppression of the PV power ramp rate with minimization of the deviation of the battery capacity from a predefined set point.
As one type of RL, Q-learning is applied to find the optimal control of the battery storage, i.e. the control that maximizes the total reward during system operation. The reward function is constructed so that maximizing the reward minimizes the above control objectives.
Advantages of the system may include improved battery operation, in that the required battery capacity is minimized and battery life is extended. The control framework optimizes battery energy storage for PV ramp rate control and manages the battery SoC level optimally to minimize the required battery capacity and extend battery life cycles. Other advantages may include one or more of the following:
The target system (PV integrated with Battery storage system) is shown in
$$P_{dc} = P_{pv} + P_{be} \qquad (1)$$
As shown in
The desired ramp-rate of Pdc is defined as the maximum allowable ramp rate (MARR). The MARR could be defined in different units, e.g. W/sec, kW/min.
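Since the MARR may be quoted in W/s or kW/min, limits often need converting before they are compared. The helpers below are a minimal sketch (the function names are illustrative, not from the source); 1 kW/min equals 1000 W per 60 s, about 16.7 W/s.

```python
def kw_per_min_to_w_per_s(marr_kw_per_min: float) -> float:
    """Convert a ramp-rate limit from kW/min to W/s (1 kW/min = 1000 W / 60 s)."""
    return marr_kw_per_min * 1000.0 / 60.0


def w_per_s_to_kw_per_min(marr_w_per_s: float) -> float:
    """Convert a ramp-rate limit from W/s to kW/min."""
    return marr_w_per_s * 60.0 / 1000.0


if __name__ == "__main__":
    # Arbitrary example value: a 60 kW/min limit is equivalent to 1000 W/s.
    print(kw_per_min_to_w_per_s(60.0))  # 1000.0
```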
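The ramp rate of Pdc can be described as:
$$RR_{dc}(t) = \frac{dP_{dc}(t)}{dt} \qquad (2)$$
Assuming a sampling time interval Δt, Eq. (2) can be written as
$$RR_{dc}(t) = \frac{P_{dc}(t) - P_{dc}(t-1)}{\Delta t}$$
so that the ramp rate should satisfy
$$|RR_{dc}(t)| \le \mathrm{MARR}$$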
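The discrete ramp-rate check can be sketched as follows, assuming Pdc is sampled at a fixed interval Δt and formed from Eq. (1). The function names, sample values and MARR in the example are arbitrary illustrations, not data from the source.

```python
from typing import List


def ramp_rates(p_dc: List[float], dt: float) -> List[float]:
    """Discrete ramp rate RR_dc(t) = (P_dc(t) - P_dc(t-1)) / dt for t = 1..N-1."""
    if dt <= 0:
        raise ValueError("sampling interval dt must be positive")
    return [(p_dc[t] - p_dc[t - 1]) / dt for t in range(1, len(p_dc))]


def ramp_violations(p_dc: List[float], dt: float, marr: float) -> List[int]:
    """Sample indices t (into p_dc) at which |RR_dc(t)| exceeds MARR."""
    rates = ramp_rates(p_dc, dt)
    return [t + 1 for t, rr in enumerate(rates) if abs(rr) > marr]


if __name__ == "__main__":
    # Hypothetical samples in kW taken every 1 s; P_dc = P_pv + P_be as in Eq. (1).
    p_pv = [100.0, 100.0, 80.0, 79.0]
    p_be = [0.0, 0.0, 0.0, 0.0]  # battery idle, so P_dc simply follows P_pv
    p_dc = [pv + be for pv, be in zip(p_pv, p_be)]
    print(ramp_rates(p_dc, dt=1.0))                  # [0.0, -20.0, -1.0]
    print(ramp_violations(p_dc, dt=1.0, marr=5.0))   # [2]
```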
To illustrate the instant ramp rate control method, an illustrative PV power ramp down and the corresponding compensating battery power is shown in
There are several variables which need to be defined or optimized during the ramping control process:
Among these variables, the ramp rate (or power change) of the battery power Pbe determines the ramp rate of the integrated DC power output.
The battery operation policy can be optimized considering the following two objectives:
The multi-objective functions are described as:
where α1 and α2 are the weight coefficients.
The following operating constraints need to be satisfied:
$$|RR_{dc}(t)| \le \mathrm{MARR}$$
$$E_{be,\min} \le E_{be}(t) \le E_{be,\max}$$
$$P_{be,\min} \le P_{be}(t) \le P_{be,\max}$$
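A per-step check of these three constraints might look like the sketch below. The `BatteryLimits` container and `feasible` function are hypothetical names, and the sign convention (positive P_be discharges into the DC bus, as Eq. (1) suggests) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BatteryLimits:
    """Operating limits from the constraints above (any consistent units)."""
    e_min: float  # E_be,min
    e_max: float  # E_be,max
    p_min: float  # P_be,min (assumed negative: maximum charging power)
    p_max: float  # P_be,max (assumed positive: maximum discharging power)


def feasible(rr_dc: float, e_be: float, p_be: float, marr: float, lim: BatteryLimits) -> bool:
    """True if one time step satisfies the ramp, energy and power constraints."""
    return (
        abs(rr_dc) <= marr
        and lim.e_min <= e_be <= lim.e_max
        and lim.p_min <= p_be <= lim.p_max
    )


if __name__ == "__main__":
    lim = BatteryLimits(e_min=10.0, e_max=90.0, p_min=-50.0, p_max=50.0)
    print(feasible(3.0, 50.0, 20.0, marr=5.0, lim=lim))  # True
    print(feasible(6.0, 50.0, 20.0, marr=5.0, lim=lim))  # False: ramp limit exceeded
```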
At each time instant t, when the PV power output fluctuates by ΔPpv(t), the exported DC power changes by ΔP′dc(t) if the battery power output P′be(t) is kept the same as at the previous time step (t−1). Based on these known conditions, the battery power is adjusted to minimize the objectives in (4) subject to the above constraints. The online management flowchart at each time instant t is shown in
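To make the online step concrete, here is a rule-based sketch of the hold-then-correct logic: the battery setpoint is first held at its value from t−1, and if the resulting ramp exceeds the MARR, P_be is moved just enough to bring the ramp back to the limit, then clipped to the power and energy limits. This is a simple greedy baseline under stated assumptions (lossless battery, E_be(t) = E_be(t−1) − P_be(t)·Δt, positive P_be meaning discharge, consistent units), not the RL-based optimization described in this section, and it ignores the SoC set-point objective. The function name is hypothetical.

```python
from typing import Tuple


def greedy_battery_step(p_dc_prev: float, p_pv_now: float, p_be_prev: float,
                        e_be: float, dt: float, marr: float,
                        p_min: float, p_max: float,
                        e_min: float, e_max: float) -> Tuple[float, float]:
    """Return (P_be(t), P_dc(t)) for one control step (hypothetical baseline)."""
    # Step 1: hold the battery at its previous setpoint -> P'_dc(t).
    p_dc_held = p_pv_now + p_be_prev
    # Step 2: nearest DC output whose change from P_dc(t-1) respects MARR.
    max_step = marr * dt
    target = min(max(p_dc_held, p_dc_prev - max_step), p_dc_prev + max_step)
    p_be = target - p_pv_now
    # Step 3: clip to power limits and to what the stored energy allows this step.
    upper = min(p_max, (e_be - e_min) / dt)  # discharge must keep E_be >= E_min
    lower = max(p_min, (e_be - e_max) / dt)  # charge must keep E_be <= E_max
    p_be = min(max(p_be, lower), upper)
    return p_be, p_pv_now + p_be


if __name__ == "__main__":
    # PV drops from 100 kW to 80 kW within one 1-s step; MARR = 5 kW/s.
    print(greedy_battery_step(100.0, 80.0, 0.0, 50.0, 1.0, 5.0,
                              -50.0, 50.0, 0.0, 100.0))  # (15.0, 95.0)
```

When the energy or power limits bind, the returned P_dc may still violate the MARR. Keeping the SoC near its set point, as the second control objective does, leaves headroom in both directions and makes such cases less likely.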
As shown in
Next, the Reinforcement Learning-based optimization approach is detailed.
There are three elements in RL techniques: the state space S, the action set A, and the reward function R, where the reward R is a function of S and A. They are defined as follows:
The reward function is defined in a similar way to the objectives in (4), so that maximizing the reward R minimizes the energy drawn from the battery and the ramp rate of the exported DC power.
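Because the exact form of the objectives in (4) is not reproduced here, the reward below is only a hypothetical stand-in: a negative weighted sum of the ramp-rate magnitude and the deviation of stored energy from a set point, plus an assumed extra penalty when |RR_dc| exceeds the MARR. The weights `w_rr` and `w_e` play the role of the weight coefficients α1, α2, but their values and the penalty term are assumptions.

```python
def reward(rr_dc: float, e_be: float, e_ref: float, marr: float,
           w_rr: float = 1.0, w_e: float = 0.1,
           violation_penalty: float = 10.0) -> float:
    """Hypothetical reward: higher when ramp rate and energy deviation are small."""
    # Negative weighted objectives: maximizing the reward minimizes them.
    r = -(w_rr * abs(rr_dc) + w_e * abs(e_be - e_ref))
    if abs(rr_dc) > marr:
        r -= violation_penalty  # assumed extra penalty for breaking |RR_dc| <= MARR
    return r


if __name__ == "__main__":
    print(reward(rr_dc=4.0, e_be=50.0, e_ref=50.0, marr=5.0))  # -4.0
    print(reward(rr_dc=6.0, e_be=50.0, e_ref=50.0, marr=5.0))  # -16.0
```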
As one of the RL techniques, Q-learning is used to find the optimal battery operation sequence, i.e. the one that maximizes the total reward. Q-learning uses temporal differences to estimate the value Q*(s,a) of each state-action pair. Q*(s,a) is the expected value of taking action a in state s and following the optimal policy thereafter, where the value is the cumulative discounted reward:
$$Q^*(s,a) = \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, s_t = s,\ a_t = a\right]$$
where γ is the discount factor between 0 and 1; it determines how much future rewards count toward the total value relative to immediate rewards. One advantage of Q-learning is that it does not require a model of the environment.
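For a finite reward sequence, the cumulative discounted reward can be computed as below; a minimal sketch that folds the sum backwards (G = R + γ·G_next), with a hypothetical function name.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Cumulative discounted reward sum_k gamma**k * R_{t+k+1} over a finite sequence."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last reward backwards: G = R + gamma * G_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1.75
```

A small γ makes the controller favor immediate ramp-rate suppression; a γ close to 1 gives more weight to the future consequences of each battery action.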
The action-value function Q(s,a) is learned and updated during system operation; the optimal action in each state is the action with the highest Q value. Q(s,a) is updated by the value-iteration update:
$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha_t(s_t,a_t)\left[R_{t+1} + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right]$$
where R_{t+1} is the reward received after performing a_t in state s_t, and α_t(s_t, a_t) is the learning rate, which may be constant for all state-action pairs or vary with the state-action pair. γ is the discount factor defined above.
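The update rule above maps directly onto a tabular implementation. The sketch below stores Q in a plain dictionary keyed by (state, action) and treats unvisited pairs as 0.0, one arbitrary choice of initial value; the function name and the states and actions in the example are hypothetical placeholders.

```python
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, s: Hashable, a: Hashable, r_next: float, s_next: Hashable,
             actions: Sequence[Hashable], alpha: float, gamma: float) -> float:
    """Apply one tabular Q-learning update in place and return the new Q(s, a).

    Q(s,a) <- Q(s,a) + alpha * (R_{t+1} + gamma * max_a' Q(s',a') - Q(s,a))
    Unvisited pairs are treated as 0.0 (an arbitrary initial value).
    """
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    current = q.get((s, a), 0.0)
    td_error = r_next + gamma * best_next - current  # temporal-difference error
    q[(s, a)] = current + alpha * td_error
    return q[(s, a)]


if __name__ == "__main__":
    q: QTable = {(1, 0): 2.0}  # hypothetical: one previously learned value
    actions = [-1, 0, 1]       # hypothetical discrete battery actions
    print(q_update(q, s=0, a=1, r_next=1.0, s_next=1,
                   actions=actions, alpha=0.5, gamma=0.9))
```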
At the beginning of Q-learning, the initial Q values for all state-action pairs can be set arbitrarily; they are then updated iteratively. The Q-learning procedure is illustrated in the flowchart in
There are different policies for action selection, and the choice of policy sets the trade-off between exploitation and exploration during system operation. For example, an ε-greedy policy can be used during the exploration phase: the action with the highest Q value is selected with probability 1−ε, and otherwise an action is chosen uniformly at random.
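An ε-greedy selector can be sketched as follows. Passing a seeded `random.Random` instance makes runs reproducible; ties in Q value are broken by the order of `actions`, which is an implementation choice rather than something the source specifies. Unvisited pairs are again read as 0.0, and the function name is hypothetical.

```python
import random
from typing import Hashable, Mapping, Sequence, Tuple


def epsilon_greedy(q: Mapping[Tuple[Hashable, Hashable], float], s: Hashable,
                   actions: Sequence[Hashable], epsilon: float,
                   rng: random.Random) -> Hashable:
    """With probability 1 - epsilon pick argmax_a Q(s,a); otherwise pick uniformly."""
    if rng.random() < epsilon:
        return rng.choice(list(actions))  # explore: uniform over all actions
    # Exploit: highest Q value; the first action in `actions` wins ties.
    return max(actions, key=lambda a: q.get((s, a), 0.0))


if __name__ == "__main__":
    q = {("s0", 1): 0.7}  # hypothetical learned value
    rng = random.Random(0)
    print(epsilon_greedy(q, "s0", [-1, 0, 1], epsilon=0.0, rng=rng))  # 1
```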
Mode definition is discussed next. The state-action pairs (s_t, a_t) are defined discretely, with the discrete modes defined as follows.
The number of state-action pair modes can be chosen based on the system's computational capability and the required control rate.
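Since the discrete mode definitions are not reproduced in this section, the sketch below only illustrates the general idea under assumed features: the exported DC power change and the battery SoC are each binned with hypothetical edges, and the two bin indices are combined into one state index. The number of states is the product of the bin counts, which is where the computational-capability trade-off mentioned above shows up. Function names and bin edges are illustrative assumptions.

```python
import bisect
from typing import Sequence


def to_mode(value: float, edges: Sequence[float]) -> int:
    """Map a continuous value to a bin index 0..len(edges) using sorted bin edges."""
    return bisect.bisect_right(edges, value)


def state_index(delta_p_dc: float, soc: float,
                dp_edges: Sequence[float], soc_edges: Sequence[float]) -> int:
    """Combine two discretized (hypothetical) features into one integer state index."""
    n_soc = len(soc_edges) + 1  # number of SoC bins
    return to_mode(delta_p_dc, dp_edges) * n_soc + to_mode(soc, soc_edges)


if __name__ == "__main__":
    dp_edges = [-10.0, -2.0, 2.0, 10.0]  # 5 power-change bins (assumed, kW)
    soc_edges = [0.3, 0.5, 0.7]          # 4 SoC bins (assumed)
    print(state_index(-20.0, 0.55, dp_edges, soc_edges))  # 2
    print(state_index(0.0, 0.9, dp_edges, soc_edges))     # 11
```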
Unlike prior-art techniques such as low-pass filter-based approaches and power curtailment, the system applies a novel reinforcement learning-based control approach to battery storage for PV ramp rate control. Storage operation is decided dynamically to limit the ramp rate of the PV power output, while the battery SoC level is maintained around a predefined level to minimize the required battery size and extend battery life cycles. This optimization-based approach does not require the PV power profile to be known in advance and can adapt battery operation to different PV generation profiles.
A first storage device 122 and a second storage device 124 are operatively coupled to system bus 102 by the I/O adapter 120. The storage devices 122 and 124 can be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid state magnetic device, and so forth. The storage devices 122 and 124 can be the same type of storage device or different types of storage devices. A speaker 132 is operatively coupled to system bus 102 by the sound adapter 130. A transceiver 142 is operatively coupled to system bus 102 by network adapter 140. A display device 162 is operatively coupled to system bus 102 by display adapter 160. A first user input device 152, a second user input device 154, and a third user input device 156 are operatively coupled to system bus 102 by user interface adapter 150. The user input devices 152, 154, and 156 can be any of a keyboard, a mouse, a keypad, an image capture device, a motion sensing device, a microphone, a device incorporating the functionality of at least two of the preceding devices, and so forth. Of course, other types of input devices can also be used, while maintaining the spirit of the present principles. The user input devices 152, 154, and 156 can be the same type of user input device or different types of user input devices. The user input devices 152, 154, and 156 are used to input and output information to and from system 100.
Of course, the processing system 100 may also include other elements (not shown), as readily contemplated by one of skill in the art, as well as omit certain elements. For example, various other input devices and/or output devices can be included in processing system 100, depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art. For example, various types of wireless and/or wired input and/or output devices can be used. Moreover, additional processors, controllers, memories, and so forth, in various configurations can also be utilized as readily appreciated by one of ordinary skill in the art. These and other variations of the processing system 100 are readily contemplated by one of ordinary skill in the art given the teachings of the present principles provided herein.
It should be understood that embodiments described herein may be entirely hardware, or may include both hardware and software elements, which include, but are not limited to, firmware, resident software, microcode, etc.
Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. A computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. The medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.
A data processing system suitable for storing and/or executing program code may include at least one processor, e.g., a hardware processor, coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers.
The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the principles of the present invention and that those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention.
The present application claims priority to Provisional Application 62/246,801, the content of which is incorporated by reference.
Number | Date | Country
---|---|---
62246801 | Oct 2015 | US