The present invention relates to a yaw misalignment control method for maximizing wind turbine power production through a yaw misalignment calibration value prediction model based on the Kalman filter and deep reinforcement learning. Specifically, the method predicts flow deflection angles caused by wake from the free stream wind speeds and rotor rotation speeds received from a wind turbine, using a Kalman filter and recurrent neural network based sequence flow deflection angle prediction model. It then estimates and calibrates the yaw misalignment from relative wind direction values obtained from wind turbine operation data, assembly angles calculated via the Kalman filter, and flow deflection angles predicted by an actor-critic flow deflection angle prediction deep reinforcement learning model, which is a self-learning model that maximizes the power production of a wind turbine by using produced power as a reward value during normal operation.
In the case of a horizontal axis wind turbine generating output power by rotating the rotor to a direction facing the wind direction as shown in [
As shown in [
[1] and [2] relate to methods that reduce the cost of installing and operating a nacelle-based lidar on every wind turbine: a relationship model between the variables that affect the flow deflection angle and a relative wind direction variable is built by installing a lidar on the nacelle or the ground before operation, and the yaw misalignment is then calibrated during operation without the nacelle-based lidar. [1] calibrates the yaw misalignment by estimating a relative wind direction corrected for the flow deflection angle using a machine learning model as the relationship model, and [2] calibrates the yaw misalignment by estimating the flow deflection angle using a statistical analysis-based relationship model and correcting the measured relative wind direction for the flow deflection angle.
These methods assume that the assembly angle and the flow deflection angle are stationary over time. However, an actual assembly angle varies over time. The relationship between the flow deflection angle and the factors that affect it, namely the rotor rotation speed and the free wind speed, also changes over time because of the wake effect generated by other wind turbines in a wind farm. The assembly angle and the flow deflection angle are therefore non-stationary.
For this reason, conventional methods such as [1] and [2] can precisely calibrate the yaw misalignment using a machine learning-based yaw misalignment calibration model and a statistics-based yaw misalignment calibration model without utilizing a lidar. Still, because the assembly angle and the flow deflection angle are non-stationary, the assembly angle must be re-corrected each time, and a flow deflection angle correction relationship model must be re-developed and applied.
In addition, since the methods [1] and [2] are yaw misalignment calibration models based on the stationarity of the flow deflection angle, the yaw misalignment problem caused by the wake effect, which is a leading cause that affects the non-stationarity of the flow deflection angle, could not be solved.
Among the components of the yaw misalignment of a horizontal axis wind turbine, the assembly angle and the flow deflection angle are non-stationary. When a yaw misalignment calibration model is developed without accounting for this non-stationarity, a new calibration model must be periodically developed and applied to the wind turbines in a wind farm, and the wake effect, which drives the non-stationarity of the flow deflection angle, cannot be corrected effectively. To solve these problems, a yaw misalignment calibration model should be developed under the premise that the assembly angle and the flow deflection angle are non-stationary.
A wind vane's measured relative wind direction values reflect the assembly angle and the flow deflection angle. Assuming that measured relative wind direction values follow the hidden Markov model, the Kalman filter can be applied to a series of measured relative wind direction values to calculate the non-stationary assembly angle and flow deflection angle. The main factors influencing the flow deflection angle are the free wind speed and rotor rotation speed when not affected by the wake effect. In this hidden Markov model, the relative wind direction values treated as observations are hereafter referred to as observed relative wind direction values, and the relative wind direction values actually measured by a wind vane are hereafter referred to as really measured relative wind direction values.
Here, via training a recurrent neural network sequence model where a current flow deflection angle calculated from a series of really measured relative wind direction values through the Kalman filter is used as a target feature, and a previous flow deflection angle, a current free wind speed, and a current rotor rotation speed are used as input features, it is possible to obtain a nonlinear relationship model between the free wind speed and rotor rotation speed and the flow deflection angle.
To train a Kalman filter and recurrent neural network based sequence flow deflection angle prediction model, when wind turbines in a wind farm are operated normally for a certain term, the average values of really measured relative wind direction values, free wind speeds, output power values, and rotor speeds and measurement times for a certain period are received from wind turbines and stored. Raw data is generated for training the sequence flow deflection angle prediction model.
The Kalman filter is applied to a series of really measured relative wind direction values for each wind turbine to obtain a series of the summed values of assembly angles and flow deflection angles, and the Kalman filter is applied again to a series of really measured relative wind direction values to obtain a series of assembly angles. The series of assembly angles is subtracted from the series of summed values to obtain a series of flow deflection angles for each wind turbine. Training data consisting of a certain number of sequences is generated using previous flow deflection angles, current free wind speeds, and current rotor rotation speeds as input features, and current flow deflection angles as the target feature. Then, the recurrent neural network based sequence flow deflection angle prediction model is trained with the training data.
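By way of non-limiting illustration, the subtraction and the construction of training sequences described above may be sketched in Python as follows. The function names, the parameter `ratio` (the number of summed-value samples covered by one assembly angle sample), and the choice to hold each assembly angle constant over those samples are assumptions made only for this sketch.

```python
def flow_deflection_series(sum_series, assembly_series, ratio):
    """Subtract assembly angles from the summed (assembly + deflection) values.

    Assumption: the assembly angle series is coarser in time, so each
    assembly angle is held for `ratio` consecutive summed-value samples.
    """
    deltas = []
    for k, total in enumerate(sum_series):
        idx = min(k // ratio, len(assembly_series) - 1)
        deltas.append(total - assembly_series[idx])
    return deltas


def make_sequences(deltas, wind_speeds, rotor_speeds, n_seq):
    """Build (inputs, targets) windows of length n_seq.

    Input at step k:  (previous deflection, current wind speed, current rotor speed).
    Target at step k: current deflection.
    """
    samples = []
    for end in range(n_seq, len(deltas)):
        steps = range(end - n_seq + 1, end + 1)
        inputs = [(deltas[k - 1], wind_speeds[k], rotor_speeds[k]) for k in steps]
        targets = [deltas[k] for k in steps]
        samples.append((inputs, targets))
    return samples
```

Each returned sample pairs one input sequence with one target sequence of the same length, which is the shape a sequence model would typically consume.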
The recurrent neural network based sequence flow deflection angle prediction model is a relationship model under the premise that a sequence relationship between previous flow deflection angles, current free wind speeds, and current rotor rotation speeds and current flow deflection angles has stationarity. Therefore, it is impossible to predict flow deflection angles by reflecting the non-stationarity of the flow deflection angle using this model.
The recurrent neural network based sequence flow deflection angle prediction model is used as an actor in an actor-critic deep reinforcement learning model, and a recurrent neural network based sequence relationship model of free wind speeds, rotor rotation speeds, flow deflection angles, and differential action values is used as a critic. With an actor-critic flow deflection angle prediction deep reinforcement learning model using output power as a reward value, a relation model reflecting the non-stationarity of the sequence relationship of previous flow deflection angles, current free wind speeds, and current rotor rotation speeds and current flow deflection angles can be trained. Then it is possible to more accurately predict the flow deflection angle with the non-stationarity due to turbulence and the wake effect.
The weight data of a pre-trained recurrent neural network based sequence flow deflection angle prediction model is loaded as the actor's weight data of the actor-critic flow deflection angle prediction deep reinforcement learning model, so the actor-critic flow deflection angle prediction deep reinforcement learning model learns by itself and predicts the flow deflection angle with the non-stationarity while reliably predicting the flow deflection angle without initial reinforcement learning.
Yaw misalignment values are estimated by adding flow deflection angles predicted by the actor-critic flow deflection angle prediction deep reinforcement learning model to really measured relative wind direction values and to assembly angles obtained by applying the Kalman filter to really measured relative wind direction values. The yaw misalignment is calibrated using the estimated yaw misalignment values.
When calibrating the yaw misalignment of a wind turbine, assembly angles must be corrected at regular intervals, and a yaw misalignment estimation relationship model including the flow deflection angle must be re-analyzed or re-trained because of the non-stationarity of the assembly angle and the flow deflection angle. In particular, periodic correction for the non-stationarity of the assembly angle and the flow deflection angle of many wind turbines in a wind farm incurs enormous costs. In the present invention, however, during the normal operation of a wind turbine, non-stationary assembly angles are calculated by applying the Kalman filter to a series of really measured relative wind direction values, and non-stationary flow deflection angles are calculated using the actor-critic flow deflection angle prediction deep reinforcement learning model. With the calculated assembly and flow deflection angles, it is possible to estimate and calibrate the yaw misalignment in real-time. That is, calibration of the non-stationary yaw misalignment is fully automated, which minimizes the operating cost otherwise incurred by manual calibration.
The actor-critic flow deflection angle prediction deep reinforcement learning model, which applies the recurrent neural network based sequence flow deflection angle prediction model as the actor, automatically learns the flow deflection angle prediction model by itself using output power as a reward value during the normal operation of a wind turbine and then, predicts flow deflection angles. Through the yaw misalignment calibration that reflects the flow deflection angle's non-stationarity affected by turbulence and the wake effect that change in real-time, it is possible not only to calibrate the yaw misalignment more accurately than a calibration method that does not reflect the non-stationarity of the yaw misalignment but also to correct the wake effect being a significant factor reducing power generation for wind farm power generation and then, maximize the output power of a wind turbine itself and the power production of a wind farm.
The present invention will be described using the accompanying drawings through the following contents by way of specific examples.
The present invention relates to a yaw misalignment calibration method of a horizontal axis wind turbine. As shown in [
However, shown in
$$\mu_o = g(-\varphi - \delta + \mu_r) \tag{1}$$
where μo are observed relative wind direction values in the hidden Markov model, and g is a transformation function of a sensor. To calibrate the yaw misalignment from distorted, observed relative wind direction values (μo), assembly angles (φ) and flow deflection angles (δ) are calculated, and then, as in Equation 2, the yaw misalignment (γ) of the real relative wind direction is estimated.
$$\gamma = \mu_r = \varphi + \delta + g^{-1}(\mu_o) \tag{2}$$
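By way of non-limiting illustration, the relationship between Equations (1) and (2) may be checked numerically with the following Python sketch. It assumes a linear sensor transformation g(x) = a1·x + a2; the function names and the coefficient names `a1` and `a2` are hypothetical and chosen only for this example.

```python
def sensor(x, a1=1.0, a2=0.0):
    """Assumed linear sensor transformation g(x) = a1 * x + a2."""
    return a1 * x + a2


def sensor_inverse(y, a1=1.0, a2=0.0):
    """Inverse of the assumed linear sensor transformation."""
    return (y - a2) / a1


def observed_direction(mu_r, phi, delta, a1=1.0, a2=0.0):
    """Equation (1): observed value from the real relative wind direction."""
    return sensor(-phi - delta + mu_r, a1, a2)


def yaw_misalignment(mu_o, phi, delta, a1=1.0, a2=0.0):
    """Equation (2): yaw misalignment recovered from the observed value."""
    return phi + delta + sensor_inverse(mu_o, a1, a2)
```

Feeding the output of `observed_direction` back into `yaw_misalignment` with the same angles and coefficients returns the original real relative wind direction, which is the round trip that Equation (2) expresses.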
As shown in
The yaw misalignment method through the Kalman filter and actor-critic based flow deflection angle prediction deep reinforcement learning, by itself, learns a yaw misalignment method to maximize power production in response to changes in the characteristics of factors that affect wind power during the operation of a wind turbine. It is an intelligent and automated method to calibrate the yaw misalignment. The system in which this method is implemented is a self-learning yaw misalignment control intelligent entity (100) equipped with intelligent software on a computer, as shown in [
In step S3, after the self-learning yaw misalignment control intelligent entity (100) is installed in a wind turbine and before the first wind turbine operation, the actor-critic flow deflection angle prediction deep reinforcement learning module (19) is initialized by loading the pre-trained recurrent neural network based sequence flow deflection angle prediction model weight data (20) as actor weight data. Before each subsequent operation of the wind turbine, the module can be initialized, according to the operator's choice, either by loading the pre-trained recurrent neural network based sequence flow deflection angle prediction model weight data (20) or by loading the actor-critic model weight data trained while operating the wind turbine as the actor weight data.
After the operation of a wind turbine starts in step S4, the self-learning yaw misalignment control intelligent entity (100) receives the really measured relative wind direction, free wind speed, output power, and rotor rotation speed information from the wind vane, anemometer, output power sensor, and rotor rotation speed sensor (16) as an input signal. The Kalman filter module (17) calculates assembly angles through the Kalman filter from a series of really measured relative wind direction values and averages the really measured relative wind direction values, free wind speeds, rotor rotation speeds, and output power values over a certain period. The averaged values of the free wind speed, rotor rotation speed, and output power are sent to and stored in the experience replay buffer (18). The really measured relative wind direction average values and assembly angles are sent to the yaw misalignment calculation and calibration module (22). The actor-critic flow deflection angle prediction deep reinforcement learning module (19) uses a previous flow deflection angle, a current free wind speed, a rotor rotation speed, a flow deflection angle, and a next output power value obtained from the experience replay buffer (18) as a unit experience feature. A series of experience feature sequences is randomly sampled to generate training data, the actor and critic models are trained, and the trained weight data (21) of the actor and critic models is stored. After training, current flow deflection angles are predicted by using the sequence data of previous flow deflection angles, current free wind speeds, and rotor rotation speeds for a certain period. Current flow deflection angle values are sent to and stored in the experience replay buffer (18) and are also sent to the yaw misalignment calculation and calibration module (22).
In step S5, a yaw misalignment calibration value is calculated using the really measured relative wind direction average values, assembly angles, and current flow deflection angles received by the yaw misalignment calculation and calibration module (22), and the yaw misalignment calibration information is used as an output signal to calibrate the yaw misalignment in real-time.
Next, the S1 and S2 steps to train the Kalman filter and recurrent neural network based flow deflection angle prediction model and the S3, S4, S5 steps to learn the yaw misalignment calibration model by itself according to the situation of a wind turbine in real-time through the self-learning yaw misalignment control intelligent entity (100) will be described in more detail.
In step S1, during normal operation, a series of measurement time values and average values of really measured relative wind direction values (μzraw), free wind speeds (U∞raw), output power values (Praw), and rotor rotation speeds (Ωraw) over a certain period (Traw) within a certain term is obtained from each wind turbine of a wind farm and stored. Hereafter, the average value and the variance value over Traw are called the raw average value and the raw variance value. Here, Traw is a positive integer multiple of a unit time, Tunit, and satisfies the following equation.
$$T_{raw} = R_{raw}\,T_{unit}, \quad R_{raw} \in \mathbb{N}$$
where N is the set of integers.
In step S2, to create and train the recurrent neural network based sequence flow deflection angle prediction model, a series of raw average values (μzraw) of really measured relative wind direction values is used to calculate really measured relative wind direction values (μz1st) averaged over Tφ+δ, which are necessary to calculate the summed values of assembly angles and flow deflection angles, and to calculate really measured relative wind direction values (μz2nd) averaged over Tφ, which are necessary to calculate assembly angles. Hereafter, the average value and the variance value over Tφ+δ are called the first average value and the first variance value. The average value and the variance value over Tφ are called the second average value and the second variance value. A series of flow deflection angles is calculated by applying a series of the first and second average values of really measured relative wind direction values to Kalman filtering. Here, Tφ+δ and Tφ are positive integer multiples of Traw and satisfy the following equation.
$$T_{\varphi+\delta} = R_{\varphi+\delta}\,T_{raw}, \quad T_{\varphi} = R_{\varphi}\,T_{raw}, \quad T_{\varphi} = R\,T_{\varphi+\delta}, \quad R_{\varphi+\delta},\, R_{\varphi},\, R \in \mathbb{N}$$
where Rφ+δ is the positive integer multiple of Traw that gives Tφ+δ, Rφ is the positive integer multiple of Traw that gives Tφ, and R is the positive integer ratio of Tφ to Tφ+δ.
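By way of non-limiting illustration, the first and second average and variance values may be derived from a series of raw values by block-wise aggregation, as in the following Python sketch. The function name `block_stats`, the use of the population variance, the aggregation of raw average values as if they were individual samples, and the dropping of an incomplete trailing block are all assumptions of this sketch.

```python
from statistics import fmean, pvariance


def block_stats(values, multiple):
    """Average and variance of consecutive blocks of `multiple` samples.

    `values` holds one sample per raw period; each block therefore spans
    `multiple` raw periods. An incomplete trailing block is dropped.
    """
    if multiple <= 0:
        raise ValueError("multiple must be a positive integer")
    stats = []
    for start in range(0, len(values) - multiple + 1, multiple):
        block = values[start:start + multiple]
        stats.append((fmean(block), pvariance(block)))
    return stats
```

For example, with an integer multiple of 2 for the shorter period and a ratio of 3 between the two periods, `block_stats(raw, 2)` would give the first values and `block_stats(raw, 6)` the second values.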
As shown in
μr
μr
where μr
are the variance values of μr
Current observed relative wind direction values (μo
μo
Current assembly angles (φk) are defined as Equation (6) under the premise that the axis direction of a wind vane changes rapidly at random due to other causes or gradually changes (aging) over time, and current assembly angles satisfy the hidden Markov model.
$$\varphi_k = f_k^{\varphi}(\varphi_{k-1}) + \varphi_k^{e} + \varepsilon_{\varphi} \tag{6}$$
where fkφ(φk−1) is a current internal transformation function for the assembly angle and is approximated as a function for a monotonic function interval to change over time, and φke is an externally affected, current random variable of the assembly angle, and εφ
Current flow deflection angles (δk) also have non-stationarity due to the wake effect of other wind turbines in a wind farm and the turbulence (9) due to other causes and is defined as in Equation (7) on the premise that current flow deflection angles satisfy the hidden Markov model.
δk=fkδ(Ωk−1)+βδke(U∞
where fkδ(δk−1) is a current internal transformation function for the flow deflection angle, and β is an externally affected coefficient of the flow deflection angle, and δk0(U∞
So, current distorted relative wind direction values (μm
μm
μo
εg
To estimate real relative wind direction values more accurately using the Kalman filter under the premise that observed relative wind direction values satisfy the hidden Markov model as above, the internal transformation function, the external factor function, and the internal noise of Equations (5), (6), and (7) and the sensor conversion function and sensor noise of Equation (9) must be defined. However, it is very difficult to define each of these internal conversion functions, external factor functions and internal noise, sensor conversion functions, and sensor noise.
However, each internal transformation function is approximated as 1 of a continuous function, and the external factor functions satisfying each transformation of Equations (5), (6), and (7) are approximated using a series of really measured relative wind direction values (μz) as follows, so that a Kalman filter model can be developed. A current external factor function (ξk) is the sum of a current external factor assembly angle function (φke) and a current external factor flow deflection angle function (δke), as in Equation (10). This current external factor function (ξk) is approximated as the multiplication of the difference value of near average values (
ξk=φz
where Tnear is greater than 0, an integer multiple of Tφ+δ or Tφ and is defined by the following equation.
$$T_{near} = R_{near}\,T_{\varphi+\delta} \quad \text{or} \quad T_{near} = R_{near}\,T_{\varphi}, \quad R_{near} \in \mathbb{N}$$
Since each internal noise term is very small, it is approximated as a value close to 0. A sensor transformation function is defined, as in Equation (11), as a linear function whose coefficients (α1, α2) are obtained as optimal values via Kalman filter tuning. Since the internal noise is close to 0, the average values of the near variance values of really measured relative wind direction values (μz) are used as the approximate values of the sensor noise.
h(μm
In step S11, a buffer for raw average values of really measured relative wind direction values is initialized. In step S12, current raw average values (μz
μm
$$p_{k|k-1} = p_{k-1} + q \tag{13}$$
where μm
In step S17, the first or second average and variance values of current, observed relative wind direction values are calculated as Equations (14) and (15).
μo
$$s_k = \alpha_1^{2}\,p_{k|k-1} + r_k \tag{15}$$
rk is a variance value of current sensor noise and is approximated by the first or second variance value of current, really measured relative wind direction values.
In step S18, a current Kalman gain is calculated as Equation (16) using Equations (13) and (15).
$$k_k = p_{k|k-1}\,\alpha_1\,s_k^{-1} \tag{16}$$
In step S19, to calculate and store the first or second value of current, distorted relative wind direction values as Equation (17), the first or second average value (μm
μm
$$p_k = (1 - k_k\,\alpha_1)\,p_{k|k-1} \tag{18}$$
In step S20, it is determined whether the Kalman filtering is to continue, and the filtering is stopped or continued accordingly.
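By way of non-limiting illustration, one iteration of a scalar Kalman filter using the variance prediction of Equation (13), the innovation variance of Equation (15), the gain of Equation (16), and the variance update of Equation (18) may be sketched in Python as follows. The state prediction, the predicted observation, and the state update are written in the standard scalar Kalman form with a linear sensor function h(x) = a1·x + a2; these three lines, together with all function and parameter names, are assumptions of this sketch.

```python
def kalman_step(mu_prev, p_prev, z, r, xi=0.0, q=1e-6, a1=1.0, a2=0.0):
    """One scalar Kalman filter iteration.

    mu_prev, p_prev: previous state estimate and its variance
    z, r: current measured average and its variance (used as sensor noise)
    xi: external-factor term added in the prediction (assumed additive)
    q: internal noise variance, close to 0
    a1, a2: coefficients of the assumed linear sensor function
    """
    mu_pred = mu_prev + xi                   # state prediction (assumed form)
    p_pred = p_prev + q                      # Equation (13)
    z_pred = a1 * mu_pred + a2               # predicted observation (assumed form)
    s = a1 ** 2 * p_pred + r                 # Equation (15)
    gain = p_pred * a1 / s                   # Equation (16)
    mu_new = mu_pred + gain * (z - z_pred)   # state update (assumed standard form)
    p_new = (1.0 - gain * a1) * p_pred       # Equation (18)
    return mu_new, p_new


def kalman_filter(measurements, variances, mu0=0.0, p0=1.0, **kwargs):
    """Run kalman_step over paired measurement and variance series."""
    mu, p = mu0, p0
    estimates = []
    for z, r in zip(measurements, variances):
        mu, p = kalman_step(mu, p, z, r, **kwargs)
        estimates.append(mu)
    return estimates
```

With a constant measurement series, the estimates move monotonically from the initial value toward the measured value while the state variance shrinks, which is the behavior expected when the internal noise is close to 0.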
Thus, using the first average value of really measured relative wind direction values over Tφ+δ, during which real relative wind directions are approximated as 0, a series of the summed values of assembly angles and flow deflection angles is calculated with the Kalman filter through steps S11 to S20.
Likewise, using the second average value of really measured relative wind direction values over Tφ, during which the average values of both real relative wind directions and flow deflection angles are approximated as 0, only assembly angles are calculated with the Kalman filter through steps S11 to S20.
The procedure to calculate flow deflection angles via the Kalman filter is summarized as follows. As shown in
As shown in
Flow deflection angles are first estimated using the Kalman filter, which is a rule-based method. The series of flow deflection angles estimated in this way is then used as the target and input features of the recurrent neural network based sequence flow deflection angle prediction model, which is a deep learning based method, to obtain a nonlinear flow deflection angle estimation relationship model that is more accurate than the rule-based method.
As shown in [
{circumflex over (δ)}i,k−N
where {circumflex over (δ)}i,k−N
To train the parameters of each function of Equation (19) using training data, a loss function is defined as Equation (20).
where NbatchRNN is the batch size, and δi,j is the jth ground truth value of the target feature sequence.
By optimizing the loss function of Equation (20) by gradient descent, optimized weight data of the recurrent neural network based sequence flow deflection angle prediction model is obtained.
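By way of non-limiting illustration, the optimization of a sequence loss by gradient descent may be sketched in Python as follows. Equation (20) is not reproduced here; a mean squared error over the batch and sequence positions is assumed. The one-parameter model (predicted angle = w × previous angle) is a toy stand-in chosen only to keep the gradient explicit and is not the recurrent neural network of the present description.

```python
def sequence_mse(predictions, targets):
    """Mean squared error over a batch of equal-length sequences (assumed loss)."""
    total, count = 0.0, 0
    for pred_seq, true_seq in zip(predictions, targets):
        if len(pred_seq) != len(true_seq):
            raise ValueError("sequence lengths must match")
        for p, t in zip(pred_seq, true_seq):
            total += (p - t) ** 2
            count += 1
    return total / count


def gradient_descent_step(w, prev_angles, targets, lr=0.1):
    """One gradient descent step for the toy model: prediction = w * previous angle."""
    n = len(targets)
    grad = sum(2.0 * (w * x - t) * x for x, t in zip(prev_angles, targets)) / n
    return w - lr * grad
```

Repeating `gradient_descent_step` drives the toy weight toward the value that minimizes the assumed loss; a recurrent model would apply the same update rule to every weight, with gradients obtained by backpropagation through time.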
The optimized weight data is stored to be used as the actor weight data of the actor-critic based flow deflection angle prediction deep reinforcement learning model.
In step S3, after the self-learning yaw misalignment control intelligent entity (100) is installed in a wind turbine and before the first wind turbine operation, the actor-critic based flow deflection angle prediction reinforcement learning module (19) is initialized by loading the pre-trained weight data (20, Wrnn) of the recurrent neural network based sequence flow deflection angle prediction model as the actor weight data (Wactor). Before each subsequent operation, the module is initialized, according to the operator's choice, either by loading the pre-trained Wrnn into the actor or by loading the actor weight data (Wactor) and critic weight data (Wcritic) trained during operation.
In step S4, after starting a wind turbine's operation, in the self-learning yaw misalignment control intelligent entity, the average values of current, really measured relative wind direction values (μz
The second average values of current free wind speeds (U∞
The experience replay buffer (18) is a circular buffer that saves Nexp unit experience features, where a unit experience feature is composed of a free wind speed (U∞), a rotor rotation speed (Ω), a flow deflection angle (δ), and an output power value (P).
To predict current flow deflection angles (δk), in the initialized actor-critic based flow deflection angle prediction reinforcement learning module (19), a sequence of Nseq input unit features from previous input unit features to a current input unit feature, as in Equation (22), is sampled from the experience replay buffer (18), where the current input unit feature is defined, as in Equation (21), as a current flow deflection angle and the second average values of current free wind speeds and rotor rotation speeds. Then, as in Equation (23), current flow deflection angles (δk) are predicted via the actor (π), sent to and stored in the experience replay buffer (18), and sent to the yaw misalignment calculation and calibration module (22).
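By way of non-limiting illustration, a circular experience replay buffer of this kind may be sketched in Python with a bounded double-ended queue. The class name, method names, and field names are hypothetical, and the sampling of contiguous sequences at uniformly random start positions is an assumption of this sketch.

```python
import random
from collections import deque, namedtuple

# One unit experience feature: wind speed, rotor speed, deflection angle, power.
Experience = namedtuple(
    "Experience", ["wind_speed", "rotor_speed", "deflection", "power"]
)


class ExperienceReplayBuffer:
    def __init__(self, capacity):
        # A deque with maxlen discards the oldest entry once it is full.
        self._items = deque(maxlen=capacity)

    def __len__(self):
        return len(self._items)

    def push(self, wind_speed, rotor_speed, deflection, power):
        self._items.append(Experience(wind_speed, rotor_speed, deflection, power))

    def latest_sequence(self, n_seq):
        """Return the n_seq most recent experiences, oldest first."""
        if not 0 < n_seq <= len(self._items):
            raise ValueError("n_seq must be between 1 and the number stored")
        return list(self._items)[-n_seq:]

    def sample_sequences(self, n_batch, n_seq, rng=random):
        """Sample n_batch contiguous sequences of length n_seq at random starts."""
        if not 0 < n_seq <= len(self._items):
            raise ValueError("n_seq must be between 1 and the number stored")
        items = list(self._items)
        starts = [rng.randrange(len(items) - n_seq + 1) for _ in range(n_batch)]
        return [items[s:s + n_seq] for s in starts]
```

In this sketch `latest_sequence` would serve prediction, which needs the most recent window, while `sample_sequences` would serve training, which draws windows from anywhere in the stored history.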
The actor-critic based flow deflection angle prediction reinforcement learning module (19) obtains batch-sequence (NbatchRL×Nseq) training data with the period of Tlearn from the experience replay buffer (18) via sampling current reinforcement learning unit feature NbatchRL sequences as Equation (25) based on the reinforcement learning unit feature sequence of Equation (24).
As shown in
and a current action (ak) as a factor affecting the output feature of the recurrent neural network, via using the state sequence
from previous states to a current state and the current action (ak) as the input. The current state (sk=<U∞
Q(ak,sk−N
where h0 is the 0th hidden internal feature vector of the sequence, and c0 is the 0th cell internal feature vector of the sequence. When training the actor and critic models of the actor-critic based flow deflection angle prediction reinforcement learning module (19) using batch-sequence training data, the batch-sequence training data is first randomly shuffled to obtain independence between the sequence data in the batch, which yields randomly shuffled batch-sequence training data.
When training, to increase sample efficiency, the mini-batch number (Nmini-batchRL) is used as a batch unit in the shuffled batch-sequence training data, and step-based iterative training is carried out with the integer step number (RstepRL=NbatchRL/Nmini-batchRL, RstepRLϵN) greater than 0. This step-based iterative training is nested within epoch-based iterative training, which is carried out for the epoch number (NepochRL). In the epoch-based iterative training, differently shuffled batch-sequence training data is obtained for each epoch and used as the batch-sequence training data.
When the step-based iterative training of the actor-critic model within the epoch-based iterative training is carried out using shuffled batch-sequence training data, shuffled mini-batch-sequences are sampled to train the critic and actor models iteratively.
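By way of non-limiting illustration, the nesting of step-based training within epoch-based training may be sketched in Python as an index schedule. The function name `minibatch_schedule` and its parameter names are hypothetical, and the fixed random seed is used only to make the sketch reproducible.

```python
import random


def minibatch_schedule(n_batch, n_minibatch, n_epoch, seed=0):
    """Yield (epoch, step, indices) for mini-batch training.

    The number of steps per epoch is n_batch / n_minibatch and must be an
    integer. The batch order is reshuffled at the start of every epoch.
    """
    if n_minibatch <= 0 or n_batch % n_minibatch != 0:
        raise ValueError("n_batch must be a positive integer multiple of n_minibatch")
    n_step = n_batch // n_minibatch
    rng = random.Random(seed)
    for epoch in range(n_epoch):
        order = list(range(n_batch))
        rng.shuffle(order)  # a different shuffle for each epoch
        for step in range(n_step):
            yield epoch, step, order[step * n_minibatch:(step + 1) * n_minibatch]
```

Each yielded index list selects the sequences of one mini-batch; within an epoch every sequence of the batch is visited exactly once.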
The actor-critic based flow deflection angle prediction deep reinforcement learning model is a model to predict flow deflection angles for optimized yaw control adapting to a wind turbine's environment in real-time, receiving free wind speed and rotor rotation speed states relevant to a wind turbine's environment, and output power as a reward value, in real-time. The current return of this model is the differential return defined as Equation (27).
$$G_k = R_{k+1} - \bar{R} + R_{k+2} - \bar{R} + R_{k+3} - \bar{R} + \cdots \tag{27}$$
and the current action (ak) is the action-value function (Q) of Equation (26).
Here, with a temporal difference training method, the action-value function (Q) of the critic model is trained by the gradient descent method via defining the critic model's loss function as the expected value of a squared advantage as Equation (31) where the advantage is defined as Equation (30) using the shuffled mini-batch training data for critic model training as Equation (29).
i is the index of the shuffled mini-batch, and the advantage of Equation (30) is the estimated error of the reward value, and the average reward value (
η is an average reward value update coefficient, which is a real number greater than 0.
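By way of non-limiting illustration, the quantities used in average-reward temporal difference training may be sketched in Python as follows. The critic loss as the mean of squared advantages follows the description above. The specific forms of the advantage (reward minus average reward, plus the next action value, minus the current action value) and of the average reward update (average reward plus the update coefficient times the advantage) are the commonly used differential temporal difference forms and are assumptions of this sketch, as are all function names.

```python
def differential_return(rewards, avg_reward):
    """Finite truncation of the differential return: sum of (reward - average reward)."""
    return sum(r - avg_reward for r in rewards)


def differential_advantage(reward, avg_reward, q_next, q_current):
    """Assumed advantage: reward - average reward + Q(next) - Q(current)."""
    return reward - avg_reward + q_next - q_current


def critic_loss(advantages):
    """Mean of squared advantages over a mini-batch."""
    return sum(a * a for a in advantages) / len(advantages)


def update_average_reward(avg_reward, advantage, eta):
    """Assumed update: move the average reward by eta times the advantage (eta > 0)."""
    return avg_reward + eta * advantage
```

Under these assumptions, a reward above the running average raises both the advantage and, through the update coefficient, the average reward itself, so the reference level tracks slow changes in the produced power.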
Also, using the shuffled mini-batch-sequence training data for actor model training as in Equation (33), the actor model is trained with the proximal policy optimization method by gradient ascent. The probability ratio (ri(Wactor)) of the current actor model (π) to the previous actor model (πold) is defined as in Equation (34), and the actor model's loss function is defined as in Equation (35) on the product (ri(Wactor)Ai) of the probability ratio and the advantage (Ai). The loss function becomes a fixed value, independent of the probability ratio, when the advantage (Ai) is greater than 0 and the probability ratio (ri(Wactor)) is greater than or equal to 1+ε, or when the advantage (Ai) is less than 0 and the probability ratio (ri(Wactor)) is less than or equal to 1−ε. ε is a value greater than 0 and less than 1.
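By way of non-limiting illustration, the clipping behavior described above corresponds to the commonly used clipped surrogate objective of proximal policy optimization, which may be sketched in Python as follows. The function names and the default clipping value of 0.2 are assumptions of this sketch.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Clipped surrogate term for one sample.

    Returns min(ratio * advantage, clip(ratio, 1 - eps, 1 + eps) * advantage).
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def ppo_objective(ratios, advantages, eps=0.2):
    """Average clipped surrogate over a mini-batch (to be maximized)."""
    terms = [clipped_surrogate(r, a, eps) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)
```

With a positive advantage the term stops growing once the ratio reaches 1+ε, and with a negative advantage it stops changing once the ratio falls to 1−ε, so in both cases the gradient with respect to the actor weights vanishes and the update is limited.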
The actor and critic models' weight data of the actor-critic flow deflection angle prediction deep reinforcement learning model are stored as actor-critic model weight data (21).
Finally, the yaw misalignment calculation and calibration module (22) calculates current yaw misalignment values (γk) as Equation (36) using current assembly angles (φk) and the second average values (μz
γk=μz
Number | Date | Country | Kind |
---|---|---|---|
10-2020-0186882 | Dec 2020 | KR | national |

Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/KR2021/020228 | 12/29/2021 | WO |