KALMAN FILTER AND DEEP REINFORCEMENT LEARNING BASED WIND TURBINE YAW MISALIGNMENT CONTROL METHOD

Information

  • Patent Application
  • 20240052804
  • Publication Number
    20240052804
  • Date Filed
    December 29, 2021
  • Date Published
    February 15, 2024
  • Inventors
    • Chung; Inwoo
Abstract
A Kalman filter and deep reinforcement learning based yaw misalignment control method for wind turbines is disclosed. During the normal operation of a wind turbine, the method calculates non-stationary assembly angles by applying a Kalman filter to a series of really measured relative wind direction values and predicts non-stationary flow deflection angles with an actor-critic flow deflection angle prediction deep reinforcement learning model. By estimating and calibrating the yaw misalignment from these values, the calibration of the non-stationary yaw misalignment is completely automated, maximizing the reduction in the operating cost of manually calibrating the non-stationary yaw misalignment.
Description
TECHNICAL FIELD

The present invention relates to a yaw misalignment control method that maximizes wind turbine power production through a yaw misalignment calibration value prediction model based on the Kalman filter and deep reinforcement learning. Specifically, the method predicts flow deflection angles caused by wake from free stream wind speeds and rotor rotation speeds received from a wind turbine, using a Kalman filter and recurrent neural network based sequence flow deflection angle prediction model. It then estimates and calibrates the yaw misalignment from relative wind direction values obtained from wind turbine operation data, assembly angles calculated via the Kalman filter, and flow deflection angles predicted by an actor-critic flow deflection angle prediction deep reinforcement learning model. This model is a self-learning model that maximizes the power production of a wind turbine by using produced power as a reward value during the normal operation of the wind turbine.


BACKGROUND ART

In the case of a horizontal axis wind turbine generating output power by rotating the rotor to a direction facing the wind direction as shown in [FIG. 1], the nacelle must be rotated to the direction facing the wind direction to maximize power production. To rotate the nacelle in the direction facing the wind direction, these wind turbines have a yawing system that rotates the nacelle, a meteorological mast composed of a wind vane and an anemometer installed on the nacelle, and a yaw controller that controls the yawing system to calculate and calibrate yaw misalignment.


As shown in [FIG. 2], the yaw misalignment can be defined with the relative wind direction for the nacelle direction, the assembly angle of a wind vane or lidar, and the flow deflection angle caused by wake behind the rotor. Here, the relative wind direction can be measured directly from a wind vane, but a wind vane's assembly angle must be measured and corrected before a wind turbine is operated. The flow deflection angle must be calculated and corrected during the operation. To accurately calculate and correct a wind vane's assembly angle and the flow deflection angle in an existing method, a lidar that precisely measures a free wind direction and a free stream wind speed is installed on the ground. The assembly angle is measured and corrected before operation. A lidar is always installed on the nacelle to measure the relative wind direction without the flow deflection angle to calibrate the yaw misalignment during the operation.


[1] and [2] relate to methods that reduce the cost of installing and operating a nacelle-based lidar in every wind turbine. A lidar is installed on the nacelle or on the ground before operation to build a relationship model between the variables that affect the flow deflection angle and a relative wind direction variable, and the yaw misalignment is then calibrated during operation without the nacelle-based lidar. [1] calibrates the yaw misalignment by estimating a relative wind direction corrected for the flow deflection angle using a machine learning model as the relationship model, and [2] calibrates the yaw misalignment by estimating the flow deflection angle using a statistical analysis-based relationship model and correcting the measured relative wind direction for the flow deflection angle.


These methods assume that the assembly angle and the flow deflection angle are stationary over time. However, an actual assembly angle varies over time. The relationship between the flow deflection angle and the factors that affect it, namely the rotor rotation speed and the free wind speed, also changes over time because of the wake effect generated by other wind turbines in a wind farm. The assembly angle and the flow deflection angle are therefore non-stationary.


For this reason, conventional methods such as [1] and [2] can precisely calibrate the yaw misalignment using a machine learning-based yaw misalignment calibration model and a statistics-based yaw misalignment calibration model without utilizing a lidar. Still, there arises a problem that the assembly angle must be re-corrected each time due to the non-stationarity of the assembly angle and the flow deflection angle. A flow deflection angle correction relationship model must be re-developed and applied.


In addition, since the methods [1] and [2] are yaw misalignment calibration models based on the stationarity of the flow deflection angle, the yaw misalignment problem caused by the wake effect, which is a leading cause that affects the non-stationarity of the flow deflection angle, could not be solved.


Technical Problem

Among the components of the yaw misalignment of a horizontal axis wind turbine, the assembly angle and the flow deflection angle are non-stationary. When a yaw misalignment calibration model is developed without accounting for this non-stationarity, a new calibration model must be periodically developed and applied to the wind turbines in a wind farm, and the wake effect that drives the non-stationarity of the flow deflection angle cannot be corrected effectively. To solve these problems, a yaw misalignment calibration model should be developed under the premise that the assembly angle and the flow deflection angle are non-stationary.


Solution to Problem

A wind vane's measured relative wind direction values reflect the assembly angle and the flow deflection angle. Assuming that measured relative wind direction values follow the hidden Markov model, the Kalman filter can be applied to a series of measured relative wind direction values to calculate the non-stationary assembly angle and flow deflection angle. The main factors influencing the flow deflection angle are the free wind speed and rotor rotation speed when not affected by the wake effect. Hereafter, measured relative wind direction values in this hidden Markov model are referred to as observed relative wind direction values, and relative wind direction values actually measured by a wind vane are referred to as really measured relative wind direction values.


Here, via training a recurrent neural network sequence model where a current flow deflection angle calculated from a series of really measured relative wind direction values through the Kalman filter is used as a target feature, and a previous flow deflection angle, a current free wind speed, and a current rotor rotation speed are used as input features, it is possible to obtain a nonlinear relationship model between the free wind speed and rotor rotation speed and the flow deflection angle.


To train a Kalman filter and recurrent neural network based sequence flow deflection angle prediction model, when wind turbines in a wind farm are operated normally for a certain term, the average values of really measured relative wind direction values, free wind speeds, output power values, and rotor speeds and measurement times for a certain period are received from wind turbines and stored. Raw data is generated for training the sequence flow deflection angle prediction model.


The Kalman filter is applied to a series of really measured relative wind direction values for each wind turbine to obtain a series of the summed values of assembly angles and flow deflection angles, and the Kalman filter is applied to a series of really measured relative wind direction values to obtain a series of assembly angles. A series of the summed values of assembly angles and flow deflection angles is subtracted by a series of assembly angles to obtain a series of flow deflection angles for each wind turbine. Training data of a certain number of sequences is generated using previous flow deflection angles, current free wind speeds, and current rotor rotation speeds as input features, and current flow deflection angles as a target feature. Then, the recurrent neural network based sequence flow deflection angle prediction model is trained with the training data.
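The construction of the training data described above can be sketched as follows. This is a minimal illustration in plain Python, not the implementation of the invention: the function name `make_sequences`, the sequence length, and the sample numbers are assumptions chosen only to show how previous flow deflection angles, current free wind speeds, and current rotor rotation speeds are paired with current flow deflection angles.

```python
def make_sequences(delta, wind_speed, rotor_speed, seq_len):
    """Build input and target sequences for a sequence model.

    At each step k the input features are
    (delta[k-1], wind_speed[k], rotor_speed[k]) and the target is delta[k].
    """
    if not (len(delta) == len(wind_speed) == len(rotor_speed)):
        raise ValueError("all series must have the same length")
    # Step k = 0 has no previous flow deflection angle, so start at k = 1.
    features = [
        (delta[k - 1], wind_speed[k], rotor_speed[k])
        for k in range(1, len(delta))
    ]
    targets = list(delta[1:])
    n_windows = len(targets) - seq_len + 1
    if n_windows <= 0:
        raise ValueError("series too short for the requested sequence length")
    x = [features[i:i + seq_len] for i in range(n_windows)]
    y = [targets[i:i + seq_len] for i in range(n_windows)]
    return x, y


# Hypothetical Kalman filter outputs for one wind turbine (degrees).
sum_series = [3.0, 3.5, 4.0, 4.2, 4.1]       # assembly angle + flow deflection angle
assembly_series = [1.0, 1.0, 1.1, 1.1, 1.2]  # assembly angle only

# Flow deflection angles: summed series minus assembly angle series.
delta_series = [s - a for s, a in zip(sum_series, assembly_series)]

x, y = make_sequences(delta_series, [8, 9, 10, 9, 8], [12, 13, 14, 13, 12], seq_len=2)
```

Each element of `x` is one sequence of feature tuples and the matching element of `y` holds the target flow deflection angles for the same steps, which is the shape a recurrent model would typically consume.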


The recurrent neural network based sequence flow deflection angle prediction model is a relationship model under the premise that a sequence relationship between previous flow deflection angles, current free wind speeds, and current rotor rotation speeds and current flow deflection angles has stationarity. Therefore, it is impossible to predict flow deflection angles by reflecting the non-stationarity of the flow deflection angle using this model.


The recurrent neural network based sequence flow deflection angle prediction model is used as an actor in an actor-critic deep reinforcement learning model, and a recurrent neural network based sequence relationship model of free wind speeds, rotor rotation speeds, flow deflection angles, and differential action values is used as a critic. With an actor-critic flow deflection angle prediction deep reinforcement learning model using output power as a reward value, a relation model reflecting the non-stationarity of the sequence relationship of previous flow deflection angles, current free wind speeds, and current rotor rotation speeds and current flow deflection angles can be trained. Then it is possible to more accurately predict the flow deflection angle with the non-stationarity due to turbulence and the wake effect.


The weight data of a pre-trained recurrent neural network based sequence flow deflection angle prediction model is loaded as the actor's weight data of the actor-critic flow deflection angle prediction deep reinforcement learning model, so the actor-critic flow deflection angle prediction deep reinforcement learning model learns by itself and predicts the flow deflection angle with the non-stationarity while reliably predicting the flow deflection angle without initial reinforcement learning.


Yaw misalignment values are estimated by adding the flow deflection angles predicted by the actor-critic flow deflection angle prediction deep reinforcement learning model to the really measured relative wind direction values and the assembly angles obtained by applying the Kalman filter to the really measured relative wind direction values. The yaw misalignment is calibrated using the estimated yaw misalignment values.


Advantageous Effects of Invention

When calibrating the yaw misalignment of a wind turbine, assembly angles must be corrected at regular intervals, and a yaw misalignment estimation relationship model including the flow deflection angle must be re-analyzed or re-trained because of the non-stationarity of the assembly angle and the flow deflection angle. In particular, periodic correction for this non-stationarity across the many wind turbines in a wind farm incurs enormous costs. In the present invention, however, non-stationary assembly angles are calculated during the normal operation of a wind turbine by applying the Kalman filter to a series of really measured relative wind direction values, and non-stationary flow deflection angles are calculated using the actor-critic flow deflection angle prediction deep reinforcement learning model. With the calculated assembly and flow deflection angles, it is possible to estimate and calibrate the yaw misalignment in real-time. That is, the calibration of the non-stationary yaw misalignment is fully automated, maximizing the reduction of the operating cost of manually calibrating the non-stationary yaw misalignment.


The actor-critic flow deflection angle prediction deep reinforcement learning model, which applies the recurrent neural network based sequence flow deflection angle prediction model as the actor, automatically learns the flow deflection angle prediction model by itself using output power as a reward value during the normal operation of a wind turbine and then, predicts flow deflection angles. Through the yaw misalignment calibration that reflects the flow deflection angle's non-stationarity affected by turbulence and the wake effect that change in real-time, it is possible not only to calibrate the yaw misalignment more accurately than a calibration method that does not reflect the non-stationarity of the yaw misalignment but also to correct the wake effect being a significant factor reducing power generation for wind farm power generation and then, maximize the output power of a wind turbine itself and the power production of a wind farm.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 is a block diagram of components of a yaw control system of a horizontal axis wind turbine.



FIG. 2 is a schematic diagram of a yaw misalignment component.



FIG. 3 is a flowchart illustrating a method for controlling a yaw misalignment based on a Kalman filter and deep reinforcement learning according to an embodiment of the present invention.



FIG. 4 is a schematic diagram of a hidden Markov model of an observed relative wind direction.



FIG. 5 is a flowchart to calculate summed values of assembly angles and flow deflection angles or assembly angles from a series of really measured relative wind direction values using a Kalman filter according to an embodiment of the present invention.



FIG. 6 is a flowchart for calculating flow deflection angles using a Kalman filter according to an embodiment of the present invention.



FIG. 7 is a schematic diagram of a recurrent neural network based sequence flow deflection angle prediction model according to an embodiment of the present invention.



FIG. 8 is a block diagram of a self-learning yaw misalignment control intelligent entity according to an embodiment of the present invention.



FIG. 9 is a schematic diagram of an actor model according to an embodiment of the present invention.





DESCRIPTION OF EMBODIMENTS

The present invention will be described using the accompanying drawings through the following contents by way of specific examples.


The present invention relates to a yaw misalignment calibration method of a horizontal axis wind turbine. As shown in [FIG. 1], the horizontal axis wind turbine is mainly composed of the blade (1), rotor (2), and nacelle (3), and since the horizontal axis wind turbine's output power efficiency decreases due to the misalignment of the nacelle direction (8) for the wind direction (7), the output power efficiency is increased by rotating the nacelle to align the nacelle direction with the wind direction as yaw controlling. Via yaw controlling, a relative wind direction is measured in the meteorological tower (6) installed in the nacelle. The yaw misalignment is estimated using relative wind direction values really measured in the yaw controller (5). The yawing system (4) is controlled to calibrate the yaw misalignment.


However, as shown in FIG. 3, the relative wind direction really measured by a wind vane differs from the real relative wind direction (μr): it is distorted not only by the flow deflection angle (δ) caused by the wind passing the blades and by the assembly angle (φ) of the wind vane, but also by the measurement process of the sensor.





$$\mu_o = g(-\varphi - \delta + \mu_r) \tag{1}$$


where μo denotes the observed relative wind direction values in the hidden Markov model and g is the transformation function of the sensor. To calibrate the yaw misalignment from the distorted, observed relative wind direction values (μo), assembly angles (φ) and flow deflection angles (δ) are calculated, and then the yaw misalignment (γ) of the real relative wind direction is estimated as in Equation (2).





$$\gamma = \mu_r = \varphi + \delta + g^{-1}(\mu_o) \tag{2}$$
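Equations (1) and (2) can be checked against each other with a short numerical sketch. The sketch assumes, purely for illustration, that the sensor transformation function g is linear, g(x) = a1·x + a2, so that it can be inverted in closed form; the function names and the numbers are hypothetical.

```python
def observe(mu_r, phi, delta, a1=1.0, a2=0.0):
    """Equation (1) with an assumed linear sensor function g(x) = a1 * x + a2."""
    return a1 * (-phi - delta + mu_r) + a2


def inverse_sensor(mu_o, a1=1.0, a2=0.0):
    """Inverse of the assumed linear sensor function."""
    if a1 == 0:
        raise ValueError("a1 must be non-zero for g to be invertible")
    return (mu_o - a2) / a1


def estimate_yaw_misalignment(mu_o, phi, delta, a1=1.0, a2=0.0):
    """Equation (2): gamma = phi + delta + g^-1(mu_o)."""
    return phi + delta + inverse_sensor(mu_o, a1, a2)


# Hypothetical values in degrees.
mu_r, phi, delta = 5.0, 1.5, 2.0
mu_o = observe(mu_r, phi, delta, a1=0.9, a2=0.2)
gamma = estimate_yaw_misalignment(mu_o, phi, delta, a1=0.9, a2=0.2)
```

Under these assumptions the estimated `gamma` recovers the real relative wind direction `mu_r` up to floating-point rounding, which is what Equation (2) states.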


As shown in FIG. 3, to estimate and calibrate the yaw misalignment (γ) of the real relative wind direction (μr), in step S1, a series of really measured relative wind direction values, free wind speeds, output power values, and rotor rotation speeds during the normal operation of each wind turbine in a wind farm are received and saved as operation data. In step S2, using the operation data as the training data of the Kalman filter and recurrent neural network based flow deflection angle prediction model, the recurrent neural network based sequence flow deflection angle prediction model is created and trained, where current flow deflection angles calculated through the Kalman filter are used as a target feature, and previous flow deflection angles, free wind speeds, and rotor rotation speeds are used as input features. The weight data of the trained recurrent neural network based sequence flow deflection angle prediction model is stored because it is used as the pre-trained weight of the actor model of the actor-critic based flow deflection angle prediction deep reinforcement learning model.


The yaw misalignment method through the Kalman filter and actor-critic based flow deflection angle prediction deep reinforcement learning, by itself, learns a yaw misalignment method to maximize power production in response to changes in the characteristics of factors that affect wind power during the operation of a wind turbine. It is an intelligent and automated method to calibrate the yaw misalignment. The system in which this method is implemented is a self-learning yaw misalignment control intelligent entity (100) equipped with intelligent software on a computer, as shown in [FIG. 8], and is installed to interlock with a control system in a wind turbine.


In step S3, after the self-learning yaw misalignment control intelligent entity (100) is installed in a wind turbine, the actor-critic flow deflection angle prediction deep reinforcement learning module (19) is initialized before the first wind turbine operation by loading the pre-trained recurrent neural network based sequence flow deflection angle prediction model weight data (20) as the actor weight data. Before subsequent operations, according to the operator's choice, the module can be initialized either by loading the pre-trained weight data (20) or by loading actor-critic model weight data trained while operating the wind turbine as the actor weight data.


After the operation of a wind turbine starts in step S4, the self-learning yaw misalignment control intelligent entity (100) receives the really measured relative wind direction, free wind speed, output power, and rotor rotation speed information from the wind vane, anemometer, output power sensor, and rotor rotation speed sensor (16) as an input signal. The Kalman filter module (17) calculates assembly angles through the Kalman filter from a series of really measured relative wind direction values and averages the really measured relative wind direction values, free wind speeds, rotor rotation speeds, and output power values over a certain period. The averaged values of the free wind speed, rotor rotation speed, and output power are sent to and stored in the experience replay buffer (18). The really measured relative wind direction average values and assembly angles are sent to the yaw misalignment calculation and calibration module (22). The actor-critic flow deflection angle prediction deep reinforcement learning module (19) uses a previous flow deflection angle, a current free wind speed, a rotor rotation speed, a flow deflection angle, and a next output power value obtained from the experience replay buffer (18) as a unit experience feature. Experience feature sequences are randomly sampled to generate training data, the actor and critic models are trained, and the trained weight data (21) of the actor and critic models is stored. After training, current flow deflection angles are predicted by using the sequence data of previous flow deflection angles, current free wind speeds, and rotor rotation speeds for a certain period. Current flow deflection angle values are sent to and stored in the experience replay buffer (18) and are also sent to the yaw misalignment calculation and calibration module (22).
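The role of the experience replay buffer (18) in step S4 can be illustrated with a small sketch. It is an assumption-laden example rather than the module of the invention: the class name `ReplayBuffer`, the field names of `Experience`, the capacity, and the sampling of contiguous sequences at random start positions are all hypothetical choices made for illustration.

```python
import random
from collections import deque, namedtuple

# One unit experience feature, following the fields listed for step S4.
Experience = namedtuple(
    "Experience",
    ["prev_delta", "wind_speed", "rotor_speed", "delta", "next_power"],
)


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest experiences are dropped first."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)

    def add(self, experience):
        self._items.append(experience)

    def __len__(self):
        return len(self._items)

    def sample_sequences(self, seq_len, batch_size, rng=random):
        """Return batch_size contiguous sequences starting at random positions."""
        n_starts = len(self._items) - seq_len + 1
        if n_starts <= 0:
            raise ValueError("not enough experiences for one sequence")
        items = list(self._items)
        starts = [rng.randrange(n_starts) for _ in range(batch_size)]
        return [items[s:s + seq_len] for s in starts]


buffer = ReplayBuffer(capacity=10)
for k in range(1, 13):  # 12 experiences; only the latest 10 are kept
    buffer.add(Experience(prev_delta=k - 1, wind_speed=8.0,
                          rotor_speed=12.0, delta=k, next_power=100.0 + k))

batch = buffer.sample_sequences(seq_len=3, batch_size=4, rng=random.Random(0))
```

Sampling whole sequences rather than single experiences keeps the time ordering that a recurrent actor or critic would need, while the random start positions decorrelate the training batches.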


In step S5, a yaw misalignment calibration value is calculated using the really measured relative wind direction average values, assembly angles, and current flow deflection angles received by the yaw misalignment calculation and calibration module (22), and the yaw misalignment calibration information is used as an output signal to calibrate the yaw misalignment in real-time.


Next, the S1 and S2 steps to train the Kalman filter and recurrent neural network based flow deflection angle prediction model and the S3, S4, S5 steps to learn the yaw misalignment calibration model by itself according to the situation of a wind turbine in real-time through the self-learning yaw misalignment control intelligent entity (100) will be described in more detail.


In step S1, a series of measurement time values and average values of really measured relative wind direction values ($\mu_z^{raw}$), free wind speeds ($U^{raw}$), output power values ($P^{raw}$), and rotor rotation speeds ($\Omega^{raw}$) over a certain period ($T_{raw}$) within a certain term is obtained and stored during the normal operation of each wind turbine in a wind farm. Hereafter, the average value and the variance value over $T_{raw}$ are called the raw average value and the raw variance value. Here, $T_{raw}$ is greater than 0 and is an integer multiple of a unit time $T_{unit}$, satisfying the following equation.






$$T_{raw} = R_{raw}\,T_{unit}, \quad R_{raw} \in \mathbb{N}$$


where $\mathbb{N}$ is the set of integers. In step S2, to create and train the recurrent neural network based sequence flow deflection angle prediction model, a series of raw average values ($\mu_z^{raw}$) of really measured relative wind direction values is used to calculate the really measured relative wind direction values ($\mu_z^{1st}$) averaged over $T_{\varphi+\delta}$, which are needed to calculate the summed values of assembly angles and flow deflection angles, and the really measured relative wind direction values ($\mu_z^{2nd}$) averaged over $T_{\varphi}$, which are needed to calculate assembly angles. Hereafter, the average value and the variance value over $T_{\varphi+\delta}$ are called the first average value and the first variance value, and the average value and the variance value over $T_{\varphi}$ are called the second average value and the second variance value. A series of flow deflection angles is calculated by applying Kalman filtering to the series of first and second average values of really measured relative wind direction values. Here, $T_{\varphi+\delta}$ and $T_{\varphi}$ are greater than 0, are integer multiples of $T_{raw}$, and satisfy the following equations.








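$$T_{\varphi+\delta} = R_{\varphi+\delta}\,T_{raw}, \quad R_{\varphi+\delta} \in \mathbb{N}$$

$$T_{\varphi} = R_{\varphi}\,T_{raw}, \quad R_{\varphi} \in \mathbb{N}$$

$$\frac{T_{\varphi}}{T_{\varphi+\delta}} = \frac{R_{\varphi}\,T_{raw}}{R_{\varphi+\delta}\,T_{raw}} = \frac{R_{\varphi}}{R_{\varphi+\delta}} = R \in \mathbb{N}$$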







where $R_{\varphi+\delta}$ is the positive integer multiple that relates $T_{raw}$ to $T_{\varphi+\delta}$, $R_{\varphi}$ is the positive integer multiple that relates $T_{raw}$ to $T_{\varphi}$, and $R$ is the positive integer ratio of $T_{\varphi}$ to $T_{\varphi+\delta}$.
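The nesting of the averaging periods can be made concrete with a short sketch. It assumes non-overlapping block averages and uses hypothetical multipliers (an averaging factor of 2 for the first average and a ratio R of 3); neither the function name `block_average` nor these numbers come from the invention.

```python
def block_average(values, factor):
    """Average consecutive, non-overlapping blocks of `factor` samples."""
    if factor <= 0:
        raise ValueError("factor must be a positive integer")
    n_blocks = len(values) // factor
    return [
        sum(values[i * factor:(i + 1) * factor]) / factor
        for i in range(n_blocks)
    ]


R_SUM = 2            # hypothetical multiplier: first period = 2 * raw period
R = 3                # hypothetical integer ratio of second period to first period
R_PHI = R * R_SUM    # second period = 6 * raw period

raw = [float(v) for v in range(12)]        # raw average values, one per raw period
first_avg = block_average(raw, R_SUM)      # first average values
second_avg = block_average(raw, R_PHI)     # second average values
```

Because the second period is an integer multiple of the first, averaging `first_avg` again in blocks of `R` gives the same result as averaging the raw values directly over the second period.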


As shown in FIG. 4, a series of observed relative wind direction values (μo) can be defined with the hidden Markov model. Assembly angles and flow deflection angles are approximately calculated using the Kalman filter based on this hidden Markov model as follows. First, the summed values of current assembly angles and current flow deflection angles are calculated using the Kalman filter with the first average values of really measured relative wind direction values over Tφ+δ, so that the average and variance values of the current real relative wind direction defined in the hidden Markov model approximate 0 and a certain value, respectively, as in Equation (3). Second, current assembly angles are calculated using the Kalman filter with the second average values of really measured relative wind direction values over Tφ, so that the average and variance values of the difference between the current real relative wind direction and the current flow deflection angle defined in the hidden Markov model approximate 0 and a certain value, respectively, as in Equation (4). Finally, flow deflection angles are approximately calculated by subtracting the calculated assembly angles from the calculated summed values of assembly angles and flow deflection angles.





$$\mu_{r_k} \sim \mathcal{N}(0, \sigma_{\mu_{r_k}}^2) \tag{3}$$





$$\mu_{r_k} - \delta_k \sim \mathcal{N}(0, \sigma_{\mu_{r_k}-\delta_k}^2) \tag{4}$$


where $\mu_{r_k}$ is the current real relative wind direction, $\mu_{r_k}-\delta_k$ is the difference between the current real relative wind direction and the current flow deflection angle, and $\sigma_{\mu_{r_k}}^2$ and $\sigma_{\mu_{r_k}-\delta_k}^2$ are the variance values of $\mu_{r_k}$ and $\mu_{r_k}-\delta_k$ over $T_{\varphi+\delta}$ and $T_{\varphi}$, respectively.


Current observed relative wind direction values ($\mu_{o_k}$) are defined as in Equation (5) on the premise that the previous free wind speed ($U_{k-1}$), the previous rotor rotation speed ($\Omega_{k-1}$), the external effect of the previous wake ($W_{k-1}$), and the previous real relative wind direction ($\mu_{r_{k-1}}$) satisfy the hidden Markov model.





$$\mu_{o_k} = f_k^{\mu_r}(\mu_{r_{k-1}}, U_{k-1}, \Omega_{k-1}, W_{k-1}) \tag{5}$$


Current assembly angles (φk) are defined as Equation (6) under the premise that the axis direction of a wind vane changes rapidly at random due to other causes or gradually changes (aging) over time, and current assembly angles satisfy the hidden Markov model.





$$\varphi_k = f_k^{\varphi}(\varphi_{k-1}) + \varphi_k^{e} + \varepsilon_{\varphi_k} \tag{6}$$


where $f_k^{\varphi}(\varphi_{k-1})$ is the current internal transformation function for the assembly angle, approximated as a function that changes monotonically over a time interval, $\varphi_k^{e}$ is the externally affected, current random variable of the assembly angle, and $\varepsilon_{\varphi_k}$ is a random variable with an average value of 0 and a certain variance value representing the noise of the current assembly angle.


Current flow deflection angles (δk) also have non-stationarity due to the wake effect of other wind turbines in a wind farm and the turbulence (9) due to other causes and is defined as in Equation (7) on the premise that current flow deflection angles satisfy the hidden Markov model.





$$\delta_k = f_k^{\delta}(\delta_{k-1}) + \beta\,\delta_k^{e}(U_k, \Omega_k, W_k) + \varepsilon_{\delta_k} \tag{7}$$


where $f_k^{\delta}(\delta_{k-1})$ is the current internal transformation function for the flow deflection angle, $\beta$ is the externally affected coefficient of the flow deflection angle, $\delta_k^{e}(U_k, \Omega_k, W_k)$ is the externally affected, current flow deflection angle as a function of the current free wind speed ($U_k$), the current rotor rotation speed ($\Omega_k$), and the current wake effect ($W_k$), and $\varepsilon_{\delta_k}$ is a random variable with an average value of 0 and a certain variance value representing the noise of the current flow deflection angle.


So, current distorted relative wind direction values (μmk) before being measured by a sensor are defined as Equation (8), and relative wind directions after being measured by a sensor are defined as observed relative wind direction values, as in Equation (9).





$$\mu_{m_k} = -\varphi_k - \delta_k + \mu_{r_k} \tag{8}$$





$$\mu_{o_k} = g(\mu_{m_k}) + \varepsilon_{g_k} \tag{9}$$


where $\varepsilon_{g_k}$ is a random variable with an average value of 0 and a certain variance value representing the noise generated by the current sensor measurement.
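How Equations (6) to (9) chain together can be seen in a one-step simulation sketch. The sketch rests on simplifying assumptions that are not stated at this point of the description: the internal transformation functions are taken as the identity, the sensor function as g(x) = x, and all three noise terms as Gaussian with the same standard deviation. The function name and parameter values are hypothetical.

```python
import random


def simulate_step(phi_prev, delta_prev, mu_r, phi_ext=0.0, delta_ext=0.0,
                  beta=1.0, noise_std=0.0, rng=random):
    """One step of Equations (6)-(9) under identity transformations and g(x) = x."""
    # Equation (6): assembly angle = previous value + external term + noise.
    phi = phi_prev + phi_ext + rng.gauss(0.0, noise_std)
    # Equation (7): flow deflection angle with external term scaled by beta.
    delta = delta_prev + beta * delta_ext + rng.gauss(0.0, noise_std)
    # Equation (8): distorted relative wind direction before the sensor.
    mu_m = -phi - delta + mu_r
    # Equation (9): observed value = sensor output + sensor noise.
    mu_o = mu_m + rng.gauss(0.0, noise_std)
    return phi, delta, mu_m, mu_o


# Noise-free step with hypothetical values (degrees).
phi, delta, mu_m, mu_o = simulate_step(1.0, 2.0, 5.0,
                                       phi_ext=0.1, delta_ext=0.2, beta=0.5)

# A short noisy trajectory with a fixed seed for repeatability.
rng = random.Random(0)
state = (1.0, 2.0)
observations = []
for _ in range(5):
    p, d, _, o = simulate_step(state[0], state[1], 5.0, noise_std=0.1, rng=rng)
    state = (p, d)
    observations.append(o)
```

In the noise-free case the observed value equals the distorted value, which isolates the effect of the assembly angle and the flow deflection angle on what the wind vane reports.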


To estimate real relative wind direction values more accurately using the Kalman filter under the premise that observed relative wind direction values satisfy the hidden Markov model as above, the internal transformation function, the external factor function, and the internal noise of Equations (5), (6), and (7) and the sensor conversion function and sensor noise of Equation (9) must be defined. However, it is very difficult to define each of these internal conversion functions, external factor functions and internal noise, sensor conversion functions, and sensor noise.


However, a Kalman filter model can be developed by approximating each internal transformation function as a continuous function with a coefficient of 1 and by approximating the external factor functions of Equations (5), (6), and (7) using a series of really measured relative wind direction values ($\mu_z$) as follows. The current external factor function ($\xi_k$) is the sum of the current external factor assembly angle function ($\varphi_k^{e}$) and the current external factor flow deflection angle function ($\delta_k^{e}$), as in Equation (10). It is approximated as the product of an external factor coefficient ($c_\xi$) and the difference between consecutive near average values ($\mu_{z_k}$), which are averages of really measured relative wind direction values ($\mu_z$) over $T_{near}$. The external factor coefficient is determined experimentally as an optimal value separately for the Kalman filter ($K_{\varphi+\delta}$) that calculates the summed values of assembly angles and flow deflection angles and for the Kalman filter ($K_{\varphi}$) that calculates assembly angles, and is applied to the calculation of the external factor function.





$$\xi_k = \varphi_k^{e} + \delta_k^{e} = c_\xi \times (\mu_{z_k} - \mu_{z_{k-1}}) \tag{10}$$


where Tnear is greater than 0, an integer multiple of Tφ+δ or Tφ and is defined by the following equation.






$$T_{near} = R_{near}\,T_{\varphi+\delta} \quad \text{or} \quad T_{near} = R_{near}\,T_{\varphi}, \quad R_{near} \in \mathbb{N}$$


Since each internal noise is very small, they are approximated as values close to 0. A sensor transformation function is defined as a linear function with coefficients (α1, α2) as Equation (11) obtained as optimal values via Kalman filter tuning. Since the internal noise is close to 0, the average values of the near variance values of really measured relative wind direction values (μz) are used as the approximate values of the sensor noise.






$$h(\mu_{m_k}) = \alpha_1 \times \mu_{m_k} + \alpha_2 \tag{11}$$
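Equations (10) and (11) can be sketched numerically as below. The sketch assumes that a near average is the mean of the most recent R_near averaged values, which is one reading of the averaging over T_near; the function names, the coefficient values, and the sample series are hypothetical.

```python
def near_average(values, r_near):
    """Mean of the most recent r_near values (assumed form of the near average)."""
    window = values[-r_near:]
    return sum(window) / len(window)


def external_factor(averages, r_near, c_xi):
    """Equation (10): c_xi times the change between consecutive near averages."""
    if len(averages) < r_near + 1:
        raise ValueError("need at least r_near + 1 averaged values")
    current = near_average(averages, r_near)
    previous = near_average(averages[:-1], r_near)
    return c_xi * (current - previous)


def sensor_function(mu_m, a1, a2):
    """Equation (11): linear sensor transformation h."""
    return a1 * mu_m + a2


# Hypothetical first (or second) average values of measured wind direction.
averages = [1.0, 2.0, 4.0, 7.0]
xi_k = external_factor(averages, r_near=2, c_xi=0.4)  # 0.4 * (5.5 - 3.0)
h_value = sensor_function(2.0, a1=0.5, a2=1.0)
```

The external factor term acts as a trend estimate taken from the measurements themselves, so no explicit model of the wake or of the wind vane drift is needed in this approximation.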



FIG. 5 is a flow chart for a specific method to calculate the summed values of assembly angles and flow deflection angles or assembly angles from a series of really measured relative wind direction values (μz) via the Kalman filter.


In step S11, a buffer for raw average values of really measured relative wind direction values is initialized. In step S12, the current raw average value ($\mu_{z_i}^{raw}$) of really measured relative wind direction values is acquired and saved to the buffer. In step S13, if the total integration time of the buffered raw average samples reaches $T_{\varphi+\delta}$ or $T_{\varphi}$, step S14 is executed; otherwise, step S12 is executed again. In step S14, the first or second average value of the most recent $R_{\varphi+\delta}$ or $R_{\varphi}$ raw average values, covering $T_{\varphi+\delta}$ or $T_{\varphi}$, is calculated as the current really measured relative wind direction value ($\mu_{z_k}$), and the first or second variance value is calculated. In step S15, the current near average value is calculated and stored using the most recent $R_{near}$ first or second average values of really measured relative wind direction values. The current external factor function ($\xi_k$) is calculated from the difference between the current near average value ($\mu_{z_k}$) and the previous near average value ($\mu_{z_{k-1}}$) of really measured relative wind direction values, as in Equation (10). In step S16, the first or second previous average value of distorted relative wind direction values and the current external factor function ($\xi_k$) are used to calculate the first or second average value of intermediate, distorted relative wind direction values, as in Equation (12). The variance value of the sum of the internal noises of the assembly angle and the flow deflection angle is approximated as a value close to 0 to calculate the first or second variance value of intermediate, distorted relative wind direction values, as in Equation (13).





$$\mu_{m_{k|k-1}} = \mu_{m_{k-1}} + \xi_k \qquad (12)$$






$$p_{k|k-1} = p_{k-1} + q \qquad (13)$$


where μmk−1 is the first or second average value of previous, distorted relative wind direction values, and μmk|k−1 is the first or second average value of intermediate, distorted relative wind direction values, and pk−1 is the first or second variance value of previous, distorted relative wind direction values, and pk|k−1 is the first or second variance value of intermediate, distorted relative wind direction values, and q is a variance value of the sum of internal noises of the assembly angle and the flow deflection angle and is obtained via Kalman filter tuning for the Kalman filter (Kφ+δ) to calculate the summed values of assembly angles and flow deflection angles and for the Kalman filter (Kφ) to calculate assembly angles, respectively. The initial average and variance values of the first or second distorted relative wind direction values use the first or second average and variance values of really measured relative wind direction values, respectively.


In step S17, the first or second average and variance values of current, observed relative wind direction values are calculated as Equations (14) and (15).





$$\mu_{o_k} = h(\mu_{m_{k|k-1}}) = \alpha_1 \times \mu_{m_{k|k-1}} + \alpha_2 \qquad (14)$$






$$s_k = \alpha_1^2 \, p_{k|k-1} + r_k \qquad (15)$$


where rk is the variance value of the current sensor noise and is approximated by the first or second variance value of current, really measured relative wind direction values.


In step S18, a current Kalman gain is calculated as Equation (16) using Equations (13) and (15).






$$k_k = p_{k|k-1} \, \alpha_1 \, s_k^{-1} \qquad (16)$$


In step S19, the first or second average value of current, distorted relative wind direction values is calculated and stored as in Equation (17), using the first or second average value (μmk|k−1) of intermediate, distorted relative wind direction values, the first or second average value (μzk) of current, really measured relative wind direction values, the current Kalman gain (kk), and the first or second average value (μok) of current, observed relative wind direction values. The first or second variance value of current, distorted relative wind direction values is calculated and stored as in Equation (18), using the current Kalman gain (kk) and the first or second variance value (pk|k−1) of intermediate, distorted relative wind direction values.





$$\mu_{m_k} = \mu_{m_{k|k-1}} + k_k \left( \mu_{z_k} - \mu_{o_k} \right) \qquad (17)$$






$$p_k = \left( 1 - k_k \alpha_1 \right) p_{k|k-1} \qquad (18)$$


In step S20, it is determined whether the Kalman filtering is to continue, and the filtering is continued or stopped accordingly.
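Steps S16 to S19 amount to one predict/update pass of a scalar Kalman filter. The following is a minimal sketch of that pass, not the filter implementation of the invention. The names `ScalarKalmanState` and `kalman_step` and the default coefficients α1 = 1 and α2 = 0 are illustrative assumptions; in the method itself, α1, α2, and q are obtained via Kalman filter tuning.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalmanState:
    mean: float      # average of the distorted relative wind direction values
    variance: float  # variance of the distorted relative wind direction values


def kalman_step(state, xi_k, mu_z_k, r_k, q, alpha1=1.0, alpha2=0.0):
    """One pass through steps S16 to S19 for a scalar state (hypothetical helper)."""
    # S16: intermediate (predicted) average and variance, Equations (12) and (13)
    mean_pred = state.mean + xi_k
    var_pred = state.variance + q
    # S17: observed average and its variance, Equations (14) and (15)
    mu_o_k = alpha1 * mean_pred + alpha2
    s_k = alpha1 ** 2 * var_pred + r_k
    # S18: Kalman gain, Equation (16)
    gain = var_pred * alpha1 / s_k
    # S19: corrected average and variance, Equations (17) and (18)
    mean_new = mean_pred + gain * (mu_z_k - mu_o_k)
    var_new = (1.0 - gain * alpha1) * var_pred
    return ScalarKalmanState(mean_new, var_new), gain
```

Calling `kalman_step` repeatedly with each new measured average (μzk), its variance as rk, and the external factor (ξk) yields the series of filtered values that the procedure stores.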


Thus, a series of the sum values of assembly angles and flow deflection angles is calculated through the Kalman filter through steps S11 to S20 using the first average value of really measured relative wind direction values over Tφ+δ during which real relative wind directions are approximated as 0.


Also, using the second average value of really measured relative wind direction values over Tφ during which all average values of real relative wind directions and flow deflection angles are approximated as 0, only assembly angles are calculated through the Kalman filter through steps S11 to S20.


The procedure to calculate flow deflection angles via the Kalman filter is summarized as follows. As shown in FIG. 6, in step S21, a series of raw average values of really measured relative wind direction values is obtained. In step S22, a series of the first average values of really measured relative wind direction values is obtained. In step S23, summed values of assembly angles and flow deflection angles are calculated through the Kalman filter. In step S24, a series of the second average values of really measured relative wind direction values is obtained. In step S25, only assembly angle values are calculated via the Kalman filter. Finally, in step S26, flow deflection angles are calculated by subtracting the assembly angles obtained in step S25 from the summed values of assembly angles and flow deflection angles obtained in step S23. In the case that raw average values (μztraw) of current, really measured relative wind direction values are used as an input factor of Kφ, current assembly angles (φk) and flow deflection angles (δk) with the period of Tφ are calculated, where raw average values of really measured relative wind direction values are used as an input factor of Kφ+δ at R-1 intervals.
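The averaging in steps S22 and S24 and the subtraction in step S26 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `block_average` and `flow_deflection` are hypothetical, the averages are taken over non-overlapping windows, and the two filtered series are assumed to be already aligned to the same time index.

```python
def block_average(raw, window):
    """Average consecutive, non-overlapping blocks of `window` raw samples.

    Illustrates how first or second average values could be formed from raw
    averages; an incomplete trailing block is dropped (an assumption).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    n_blocks = len(raw) // window
    return [sum(raw[i * window:(i + 1) * window]) / window for i in range(n_blocks)]


def flow_deflection(summed, assembly):
    """Step S26: subtract assembly angles from the summed values, element by element."""
    if len(summed) != len(assembly):
        raise ValueError("series must be aligned to the same time index")
    return [s - a for s, a in zip(summed, assembly)]
```

For example, `flow_deflection([3.0, 3.5], [1.0, 1.0])` gives `[2.0, 2.5]`.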


As shown in FIG. 7, the recurrent neural network based sequence flow deflection angle prediction model is created using current flow deflection angles (δk) as a target feature and using previous flow deflection angles (δk−1) and the second average values of current free wind speeds (Uk2nd) and current rotor rotation speeds (Ωk2nd) as input features. A series of flow deflection angle estimates obtained in step S26 and a series of the second average values of free wind speeds and rotor rotation speeds obtained from wind turbine operation data are used as training data to train the recurrent neural network based sequence flow deflection angle prediction model.


Flow deflection angles are first estimated using the Kalman filter, which is a rule-based method. The series of flow deflection angles estimated in this way is then used as the target and input features of the recurrent neural network based sequence flow deflection angle prediction model, which is a deep learning based method, so that a nonlinear relationship model that estimates flow deflection angles more accurately than the rule-based method is obtained.


As shown in FIG. 7, to create the recurrent neural network based sequence flow deflection prediction model, the LSTM (Long Short-Term Memory) or an LSTM variant is used as the recurrent unit layer of the recurrent neural network, and the layer length is determined so that the prediction model is optimized. Equation (19) is the function of the recurrent neural network based sequence flow deflection angle prediction model using the LSTM or LSTM variant as the unit layer of the recurrent neural network.





$$\hat{\delta}_{i,k-N_{seq}+1:k} = F\left( x_{i,k-N_{seq}+1:k},\ h_{i,0},\ c_{i,0};\ W_{rnn} \right) \qquad (19)$$


where δ̂i,k−Nseq+1:k is a sequence of target feature vectors composed of flow deflection angles from k−Nseq+1 to k for the ith batch, Nseq is the sequence length of the recurrent neural network based flow deflection prediction model, xi,k−Nseq+1:k is a sequence of input feature vectors composed of flow deflection angles from k−Nseq to k−1 and the second free wind speed and rotor rotation speed average values from k−Nseq+1 to k for the ith batch, hi,0 is the 0th hidden internal state of the sequence for the ith batch, ci,0 is the 0th cell internal state of the sequence for the ith batch, and Wrnn is the set of weight parameters for each function.


To train the parameters of each function of Equation (19) using training data, a loss function is defined as Equation (20).












$$L_{RNN} = \frac{1}{N^{RNN}_{batch} \times N_{seq}} \sum_{i=1}^{N^{RNN}_{batch}} \ \sum_{j=k-N_{seq}+1}^{k} \left( \hat{\delta}_{i,j} - \delta_{i,j} \right)^2 \qquad (20)$$







where NbatchRNN is the number of batches, and δi,j is the jth ground truth value of the target feature sequence for the ith batch.


By optimizing the loss function of Equation (20) by gradient descent, optimized weight data of the recurrent neural network based sequence flow deflection angle prediction model is obtained.
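The construction of training sequences and the loss of Equation (20) can be sketched without any deep learning library. The following is an illustrative sketch only: `make_sequences` and `sequence_mse` are hypothetical helper names, the series are plain Python lists with 0-based indices, and the LSTM itself is not modelled here.

```python
def make_sequences(delta, wind, rotor, n_seq):
    """Build (input sequence, target sequence) pairs of length n_seq.

    Each input step pairs the previous flow deflection angle with the current
    free wind speed and rotor rotation speed; the target is the current angle.
    """
    samples = []
    for k in range(n_seq, len(delta)):
        steps = range(k - n_seq + 1, k + 1)
        inputs = [(delta[j - 1], wind[j], rotor[j]) for j in steps]
        targets = [delta[j] for j in steps]
        samples.append((inputs, targets))
    return samples


def sequence_mse(predicted, target):
    """Mean squared error over all batch items and sequence steps."""
    n_batch = len(target)
    n_seq = len(target[0])
    total = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(predicted, target)
        for p, t in zip(pred_row, target_row)
    )
    return total / (n_batch * n_seq)
```

A gradient-based optimizer would minimize `sequence_mse` with respect to the network weights; that part depends on the chosen framework and is left out of this sketch.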


The optimized weight data is stored to be used as the actor weight data of the actor-critic based flow deflection angle prediction deep reinforcement learning model.


In step S3, after the self-learning yaw misalignment control intelligent entity (100) is installed in a wind turbine, the actor-critic based flow deflection angle prediction reinforcement learning module (19) is initialized before the wind turbine's operation. Before the first operation, the pre-trained weight (20, Wrnn) of the recurrent neural network based sequence flow deflection prediction model is loaded as the actor weight data (Wactor). Before subsequent operations, according to the operators' choice, the module is initialized either by loading the pre-trained Wrnn into the actor or by loading the actor weight data (Wactor) and critic weight data (Wcritic) trained during operation.


In step S4, after the wind turbine's operation starts, the self-learning yaw misalignment control intelligent entity receives the average values of current, really measured relative wind direction values (μztraw), current free wind speeds (Utraw), current output power values (Ptraw), and current rotor rotation speeds (Ωtraw), each averaged over Traw, from the wind vane, anemometer, output power sensor, and rotor rotation speed sensor (16). Then, following the flowchart of FIG. 5, the Kalman filter module (17) calculates current assembly angles (φk) with the Tφ period by using raw average values (μztraw) of current, really measured relative wind direction values as the input of Kφ, which is the Kalman filter to calculate assembly angles. With the same Tφ period, it also calculates the second average values of current, really measured relative wind direction values (μzk2nd), current free wind speeds (Uk2nd), current rotor rotation speeds (Ωk2nd), and current output power values (Pk2nd).


The second average values of current free wind speeds (Uk2nd), current rotor rotation speeds (Ωk2nd), and current output power values (Pk2nd) calculated in the Kalman filter module (17) are sent to and stored in the experience replay buffer (18), and current assembly angles (φk) and the second average values of current, really measured relative wind direction values are sent to the yaw misalignment calculation and calibration module (22).


The experience replay buffer (18) is a circular buffer that saves Nexp unit experience features, where a unit experience feature is composed of a free wind speed (U), a rotor rotation speed (Ω), a flow deflection angle (δ), and an output power value (P).
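A circular buffer of this kind can be sketched with a bounded double-ended queue. The class name `ReplayBuffer`, the record type `Experience`, and the method names below are illustrative assumptions rather than names used by the invention.

```python
from collections import deque, namedtuple

# One unit experience feature: (U, Omega, delta, P)
Experience = namedtuple(
    "Experience", ["wind_speed", "rotor_speed", "deflection", "power"]
)


class ReplayBuffer:
    """Fixed-capacity circular buffer; the oldest entry is dropped when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items = deque(maxlen=capacity)

    def push(self, wind_speed, rotor_speed, deflection, power):
        self._items.append(Experience(wind_speed, rotor_speed, deflection, power))

    def latest(self, n_seq):
        """Return up to the n_seq most recent unit experience features, oldest first."""
        if n_seq <= 0:
            raise ValueError("n_seq must be positive")
        return list(self._items)[-n_seq:]

    def __len__(self):
        return len(self._items)
```

Using `deque(maxlen=...)` keeps the overwrite-oldest behaviour in the standard library instead of managing head and tail indices by hand.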


To predict current flow deflection angles (δk), in the initialized actor-critic based flow deflection angle prediction reinforcement learning module (19), a sequence of Nseq input unit features from previous input unit features to the current input unit feature, as in Equation (22), is sampled from the experience replay buffer (18). The current input unit feature is defined as the previous flow deflection angle and the second average values of current free wind speeds and rotor rotation speeds, as in Equation (21). Then, as in Equation (23), current flow deflection angles (δk) are predicted via the actor (π), stored in the experience replay buffer (18), and sent to the yaw misalignment calculation and calibration module (22).










$$x_k = \left\langle \delta_{k-1},\ U^{2nd}_{k},\ \Omega^{2nd}_{k} \right\rangle \qquad (21)$$













$$x_{k-N_{seq}+1:k} = \left\langle \delta_{k-N_{seq}:k-1},\ U^{2nd}_{k-N_{seq}+1:k},\ \Omega^{2nd}_{k-N_{seq}+1:k} \right\rangle \qquad (22)$$













$$\delta_k = \pi\left( x_{k-N_{seq}+1:k},\ h_0,\ c_0;\ W_{actor} \right) \qquad (23)$$







The actor-critic based flow deflection angle prediction reinforcement learning module (19) obtains batch-sequence (NbatchRL×Nseq) training data from the experience replay buffer (18) with the period of Tlearn by sampling NbatchRL current reinforcement learning unit feature sequences, as in Equation (25), based on the reinforcement learning unit feature sequence of Equation (24).










$$\tau_{k-N_{seq}+1:k} = \left\langle U^{2nd}_{k-N_{seq}+1:k},\ \Omega^{2nd}_{k-N_{seq}+1:k},\ \delta_{k-N_{seq}:k-1},\ P^{2nd}_{k+1},\ U^{2nd}_{k-N_{seq}+2:k+1},\ \Omega^{2nd}_{k-N_{seq}+2:k+1},\ \delta_{k-N_{seq}+1:k} \right\rangle \qquad (24)$$













$$\tau_{1:N^{RL}_{batch},\,k-N_{seq}+1:k} = \left\langle U^{2nd}_{1:N^{RL}_{batch},\,k-N_{seq}+1:k},\ \Omega^{2nd}_{1:N^{RL}_{batch},\,k-N_{seq}+1:k},\ \delta_{1:N^{RL}_{batch},\,k-N_{seq}:k-1},\ P^{2nd}_{1:N^{RL}_{batch},\,k+1},\ U^{2nd}_{1:N^{RL}_{batch},\,k-N_{seq}+2:k+1},\ \Omega^{2nd}_{1:N^{RL}_{batch},\,k-N_{seq}+2:k+1},\ \delta_{1:N^{RL}_{batch},\,k-N_{seq}+1:k} \right\rangle \qquad (25)$$
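The index ranges of the unit feature sequence in Equation (24) and the batch sampling of Equation (25) can be made concrete with list slicing. This is a sketch under stated assumptions: the four series are plain Python lists with 0-based indices, the function names and dictionary keys are hypothetical, and the end index k of each sequence is drawn uniformly at random, which is one possible sampling rule.

```python
import random


def unit_sequence(wind, rotor, delta, power, k, n_seq):
    """Slice one reinforcement learning unit feature sequence ending at index k."""
    lo = k - n_seq + 1
    if lo - 1 < 0 or k + 1 >= len(wind):
        raise IndexError("k leaves no room for the previous or next step")
    return {
        "wind": wind[lo:k + 1],               # U, steps k-Nseq+1 .. k
        "rotor": rotor[lo:k + 1],             # Omega, steps k-Nseq+1 .. k
        "prev_deflection": delta[lo - 1:k],   # delta, steps k-Nseq .. k-1
        "next_power": power[k + 1],           # P at step k+1 (reward)
        "next_wind": wind[lo + 1:k + 2],      # U, steps k-Nseq+2 .. k+1
        "next_rotor": rotor[lo + 1:k + 2],    # Omega, steps k-Nseq+2 .. k+1
        "deflection": delta[lo:k + 1],        # delta, steps k-Nseq+1 .. k
    }


def sample_batch(wind, rotor, delta, power, n_seq, n_batch, rng):
    """Draw n_batch unit feature sequences at random end indices."""
    valid_k = range(n_seq, len(wind) - 1)
    ends = [rng.choice(valid_k) for _ in range(n_batch)]
    return [unit_sequence(wind, rotor, delta, power, k, n_seq) for k in ends]
```

Passing a seeded `random.Random` instance as `rng` makes the sampled batch reproducible, which helps when comparing training runs.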







As shown in FIG. 9, the critic model is a recurrent neural network model with the recurrent neural network unit layer (15) of the LSTM or LSTM variant. It is defined as in Equation (26) as an action-value function of the reinforcement learning model, where the input sequence of the recurrent neural network model is the current state sequence (sk−Nseq+1:k) and the current action (ak) is a factor affecting the output feature of the recurrent neural network. The state sequence (sk−Nseq+1:k) from previous states to the current state and the current action (ak) are used as the input. The current state (sk=<Uk, Ωk>) of the reinforcement learning model is defined with the current free wind speed (Uk) and the current rotor rotation speed (Ωk), and the current action (ak=δk) of the reinforcement learning model is defined as the current flow deflection angle (δk).






$$Q\left( a_k,\ s_{k-N_{seq}+1:k},\ h_0,\ c_0;\ W_{critic} \right) \qquad (26)$$


where h0 is the 0th hidden internal feature vector of the sequence, and c0 is the 0th cell internal feature vector of the sequence. When the actor and critic models of the actor-critic based flow deflection angle prediction reinforcement learning module (19) are trained using batch-sequence training data, the batch-sequence training data is first randomly shuffled to obtain independence between the sequence data in the batch, which yields randomly shuffled batch-sequence training data.


When training, to increase sample efficiency, the mini-batch number (Nmini-batchRL) is used as a batch unit in the shuffled batch-sequence training data, and step-based iterative training is carried out with the integer step number (RstepRL=NbatchRL/Nmini-batchRL, RstepRL∈ℕ) greater than 0. This step-based iterative training is nested within epoch-based iterative training, which is carried out for the epoch number (NepochRL). In the epoch-based iterative training, different shuffled batch-sequence training data is obtained for each epoch.
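The nesting of step-based training inside epoch-based training can be sketched as a generator that reshuffles the batch once per epoch and then yields mini-batches in order. The function name `minibatch_schedule` and the `seed` parameter are illustrative assumptions; the actual actor and critic updates would run inside the consuming loop.

```python
import random


def minibatch_schedule(batch, n_mini, n_epoch, seed=0):
    """Yield (epoch, step, mini_batch) with a fresh shuffle for every epoch."""
    if n_mini <= 0 or len(batch) % n_mini != 0:
        raise ValueError("batch size must be a positive multiple of n_mini")
    n_steps = len(batch) // n_mini  # R_step = N_batch / N_mini-batch
    rng = random.Random(seed)
    for epoch in range(n_epoch):
        shuffled = list(batch)
        rng.shuffle(shuffled)  # different shuffled data for each epoch
        for step in range(n_steps):
            yield epoch, step, shuffled[step * n_mini:(step + 1) * n_mini]
```

Every sequence in the batch is visited exactly once per epoch, so each epoch performs `len(batch) // n_mini` update steps.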


When the step-based iterative training of the actor-critic model within the epoch-based iterative training is carried out using shuffled batch-sequence training data, shuffled mini-batch-sequences are sampled to train the critic and actor models iteratively.


The actor-critic based flow deflection angle prediction deep reinforcement learning model is a model to predict flow deflection angles for optimized yaw control adapting to a wind turbine's environment in real-time, receiving free wind speed and rotor rotation speed states relevant to a wind turbine's environment, and output power as a reward value, in real-time. The current return of this model is the differential return defined as Equation (27).






$$G_k = R_{k+1} - \bar{R} + R_{k+2} - \bar{R} + R_{k+3} - \bar{R} + \dots \qquad (27)$$



where R̄ is the average reward value, and the current reward value (Rk) is the second average value (Pk2nd) of current output power values. The expectation value of the differential return of Equation (27) for the current state sequence (sk−Nseq+1:k) and the current action (ak) is the action-value function (Q) of Equation (26).
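A differential return sums each future reward minus the average reward, so it measures how much better or worse than average the coming rewards are. The sketch below truncates the infinite sum of Equation (27) to a finite list of rewards, which is an assumption made only for illustration; the function name `differential_return` is hypothetical.

```python
def differential_return(future_rewards, avg_reward):
    """Sum of (reward - average reward) over a finite window of future rewards."""
    return sum(r - avg_reward for r in future_rewards)
```

With output power as the reward, a positive value indicates that the upcoming power values lie above the running average, and a negative value indicates that they lie below it.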


Here, the action-value function (Q) of the critic model is trained with a temporal difference training method by gradient descent. Using the shuffled mini-batch training data for critic model training as in Equation (29), the advantage is defined as in Equation (30), and the critic model's loss function is defined as the expected value of the squared advantage as in Equation (31).










$$\tau^{shuffled\ critic}_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}+1:k} = \left\langle U^{2nd}_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}+1:k},\ \Omega^{2nd}_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}+1:k},\ P^{2nd}_{1:N^{RL}_{mini\text{-}batch},\,k+1},\ U^{2nd}_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}+2:k+1},\ \Omega^{2nd}_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}+2:k+1} \right\rangle \qquad (29)$$













$$A_i = R_{i,k} - \bar{R} + Q\left( s_{i,k-N_{seq}+2:k+1},\ a_{i,k+1},\ h_0,\ c_0;\ W_{critic} \right) - Q\left( s_{i,k-N_{seq}+1:k},\ a_{i,k},\ h_0,\ c_0;\ W_{critic} \right) \qquad (30)$$















$$L_{critic} = \frac{1}{N^{RL}_{mini\text{-}batch}} \sum_{i=1}^{N^{RL}_{mini\text{-}batch}} \left( A_i \right)^2 \qquad (31)$$







where i is the index within the shuffled mini-batch, the advantage of Equation (30) is the estimated error of the reward value, and the average reward value (R̄) is approximated by updating it iteratively with the temporal difference method as in Equation (32).










$$\bar{R} \leftarrow \bar{R} + \eta \, \frac{1}{N^{RL}_{mini\text{-}batch}} \sum_{i=1}^{N^{RL}_{mini\text{-}batch}} \left( A_i \right) \qquad (32)$$







η is an average reward value update coefficient, which is a real number greater than 0.
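The advantage, the critic loss, and the average reward update of Equations (30) to (32) reduce to simple arithmetic once the critic outputs are available. In the sketch below the critic values are passed in as plain numbers, and all function names are hypothetical. It illustrates the arithmetic only and does not compute gradients.

```python
def advantages(rewards, q_next, q_current, avg_reward):
    """A_i = R_i - R_bar + Q(next state sequence, next action) - Q(current)."""
    return [
        r - avg_reward + qn - qc
        for r, qn, qc in zip(rewards, q_next, q_current)
    ]


def critic_loss(adv):
    """Mean of the squared advantages over the mini-batch."""
    return sum(a * a for a in adv) / len(adv)


def update_avg_reward(avg_reward, adv, eta):
    """Move the average reward by eta times the mean advantage."""
    return avg_reward + eta * sum(adv) / len(adv)
```

For instance, rewards `[2.0, 4.0]`, next-step critic values `[1.0, 1.0]`, current critic values `[0.5, 2.0]`, and an average reward of `1.0` give advantages `[1.5, 2.0]` and a critic loss of `3.125`.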


Also, using the shuffled mini-batch-sequence training data for actor model training as in Equation (33), the actor model is trained with the proximal policy optimization method by gradient ascent. The probability ratio (ri(Wactor)) of the current actor model (π) over the previous actor model (πold) is defined as in Equation (34), and the actor model's loss function is defined as in Equation (35) in terms of the product (ri(Wactor) Ai) of the probability ratio and the advantage (Ai). The loss term is clipped to a constant value when the advantage (Ai) is greater than 0 and the probability ratio (ri(Wactor)) is greater than or equal to 1+ε, or when the advantage (Ai) is less than 0 and the probability ratio (ri(Wactor)) is less than or equal to 1−ε. ε is a value greater than 0 and less than 1.










$$\tau^{shuffled\ actor}_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}+1:k} = \left\langle U_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}+1:k},\ \Omega_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}+1:k},\ \delta_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}:k-1},\ P_{1:N^{RL}_{mini\text{-}batch},\,k+1},\ U_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}+2:k+1},\ \Omega_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}+2:k+1},\ \delta_{1:N^{RL}_{mini\text{-}batch},\,k-N_{seq}+1:k} \right\rangle \qquad (33)$$
















$$r_i(W_{actor}) = \frac{\pi\left( a_{i,k} \mid s_{i,k-N_{seq}+1:k},\ a_{i,k-N_{seq}:k-1};\ W_{actor} \right)}{\pi_{old}\left( a_{i,k} \mid s_{i,k-N_{seq}+1:k},\ a_{i,k-N_{seq}:k-1};\ W^{actor}_{old} \right)} \qquad (34)$$













$$L_{actor} = \frac{1}{N^{RL}_{mini\text{-}batch}} \sum_{i=1}^{N^{RL}_{mini\text{-}batch}} \min\left( r_i(W_{actor}) \, A_i,\ \mathrm{clip}\left( r_i(W_{actor}),\ 1-\varepsilon,\ 1+\varepsilon \right) A_i \right) \qquad (35)$$
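The clipped objective of Equation (35) can be evaluated directly from probability ratios and advantages. The following sketch takes both as plain lists of numbers; the function name `clipped_surrogate` is hypothetical, and computing the ratios from the actor networks is outside its scope.

```python
def clipped_surrogate(ratios, advantages, epsilon):
    """Mean over the mini-batch of min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie strictly between 0 and 1")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        total += min(r * a, clipped_r * a)
    return total / len(ratios)
```

With ε = 0.2, a ratio of 1.5 paired with a positive advantage of 1.0 contributes 1.2 rather than 1.5, and a ratio of 0.5 paired with an advantage of −1.0 contributes −0.8 rather than −0.5. In both cases the term no longer depends on the ratio, which limits how far a single update can move the actor away from the previous actor.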







The actor and critic models' weight data of the actor-critic flow deflection angle prediction deep reinforcement learning model are stored as actor-critic model weight data (21).


Finally, the yaw misalignment calculation and calibration module (22) calculates current yaw misalignment values (γk) as in Equation (36) using current assembly angles (φk) and the second average values (μzk2nd) of current, really measured relative wind direction values obtained from the Kalman filter module (17) and current flow deflection angles (δk) obtained from the actor-critic based flow deflection angle prediction reinforcement learning module (19). The estimated yaw misalignment values are sent to the yaw controller (23) to perform real-time yaw misalignment control.





$$\gamma_k = \mu^{2nd}_{z_k} - \varphi_k - \delta_k \qquad (36)$$


REFERENCE SIGNS LIST






    • 1: Blade


    • 2: Rotor


    • 3: Nacelle


    • 4: Yawing system


    • 5: Yaw controller


    • 6: Meteorological mast


    • 7: Wind direction


    • 8: Nacelle direction


    • 9: Turbulence


    • 10: North


    • 11: Real wind direction


    • 12: Nacelle direction


    • 13: Yawing system axis


    • 14, 15: Recurrent neural network unit layer of the LSTM or LSTM variant


    • 16: Wind vane, anemometer, output power sensor, and rotor rotation speed sensor


    • 17: Kalman filter module


    • 18: Experience replay buffer


    • 19: Actor-critic based flow deflection angle prediction reinforcement learning module


    • 20: Pre-trained weight of the recurrent neural network based sequence flow deflection prediction model


    • 21: Trained actor-critic model weight data


    • 22: Yaw misalignment calculation and calibration module


    • 23: Yaw controller





PATENT LITERATURE



  • [1] Korea registered patent 10-1800217, CORRECTION METHOD FOR YAW ALIGNMENT ERROR OF WIND TURBINE



NON PATENT LITERATURE



  • [2] Determination of optimal wind turbine alignment into the wind and detection of alignment changes with SCADA data, 2018, by Niko Mittelmeier and Martin Kühn


Claims
  • 1. The Kalman filter and deep reinforcement learning based yaw misalignment control method of a wind turbine comprises:
      The step to receive operation data including really measured relative wind direction values during the normal operation for each wind turbine in a wind farm, and generating raw data for training a flow deflection angle prediction model;
      The step to obtain assembly angles and flow deflection angles by applying a Kalman filter to really measured relative wind direction values for each wind turbine;
      Previous flow deflection angles, current free wind speeds, and current rotor rotation speeds are used as input features, and current flow deflection angles are used as a target feature, and training data with a certain number of sequences is generated. The step to generate weight data by training a recurrent neural network based sequence flow deflection angle prediction model using the training data;
      The recurrent neural network based sequence flow deflection angle prediction model is used as an actor in an actor-critic deep reinforcement learning model, and a recurrent neural network based sequence relationship model of free wind speeds, rotor rotation speed, and flow deflection angles and differential action values is used as a critic. The step to generate an actor-critic flow deflection angle prediction deep reinforcement learning model using output power as a reward value;
      The step to predict non-stationary flow deflection angles by loading the weight data of a pre-trained recurrent neural network based sequence flow deflection angle prediction model as the actor weight data of the actor-critic flow deflection angle prediction deep reinforcement learning model;
      The step of estimating and calibrating yaw misalignment values by adding flow deflection angles predicted by the actor-critic flow deflection angle prediction deep learning model and assembly angles obtained by the Kalman filter, using really measured relative wind direction values.
  • 2. The method of claim 1, wherein the operation data further includes free wind speed, output power, rotor rotation speed, and measurement time values.
  • 3. The method of claim 1, wherein the obtaining step of the flow deflection angle comprises:
      The step to obtain a series of the summed values of assembly angles and flow deflection angles by applying a Kalman filter to a series of really measured relative wind direction values for each wind turbine;
      The step to obtain a series of assembly angles by applying a Kalman filter to a series of really measured relative wind direction values; and
      The step to obtain a series of flow deflection angles for each wind turbine by subtracting a series of assembly angles from a series of the summed values of assembly angles and flow deflection angles.
  • 4. The method of claim 3, wherein
      the step of obtaining the series of the summed values of assembly angles and flow deflection angles includes the step to calculate the first average values by averaging a series of really measured relative wind direction values over the first average value time and apply the calculated first average values to a Kalman filter, and
      the step of obtaining the series of assembly angles includes the step to calculate the second average values by averaging a series of really measured relative wind direction values over the second average value time and apply the calculated second average values to a Kalman filter.
  • 5. The step to receive measured relative wind direction values, free wind speeds, output power values, and rotor rotation speeds information from the wind vane, the anemometer, the output power sensor, and the rotor rotation speed sensor as an input signal;
      The step to calculate assembly angles from measured relative wind direction values through a Kalman filter;
      The average values of each of the really measured relative wind direction, free wind speed, rotor rotation speed, and output power over a certain period of time are calculated, and the really measured relative wind direction average value, free wind speed average value, rotor rotation speed average value, and output power average value are stored in the experience replay buffer (18); and
      in the actor-critic based flow deflection angle prediction deep reinforcement learning module (19), a previous flow deflection angle, a current free wind speed, a current rotor rotation speed, a current flow deflection angle, and a next output power value stored to the experience replay buffer (18) are used as a unit experience value feature, and a series of experience value feature sequences is randomly sampled to generate training data, and actor and critic models are trained, and the trained weight data (21) of the actor and critic models is stored, and after training, current flow deflection angles are predicted using the sequence data of previous flow deflection angles, current free wind speeds, and current rotor rotation speeds for a certain period of time; and
      yaw misalignment calibration values are calculated using current, measured relative wind direction average values, assembly angles, and flow deflection angles in the yaw misalignment calibration module (22), and the yaw misalignment calibration information is transmitted to the yaw controller (23); and
      the yaw misalignment is calibrated in real-time by the yaw controller using the yaw misalignment calibration information.
Priority Claims (1)
Number            Date        Country   Kind
10-2020-0186882   Dec 2020    KR        national
PCT Information
Filing Document     Filing Date   Country   Kind
PCT/KR2021/020228   12/29/2021    WO