KALMAN FILTER AND DEEP REINFORCEMENT LEARNING BASED WIND TURBINE YAW MISALIGNMENT CONTROL METHOD

TECHNICAL FIELD

The present invention relates to a yaw misalignment control method that maximizes wind turbine power production through a yaw misalignment calibration value prediction model based on the Kalman filter and deep reinforcement learning. Specifically, the present invention relates to a yaw misalignment control method that predicts flow deflection angles caused by wake from free stream wind speeds and rotor rotation speeds received from a wind turbine, using a Kalman filter and recurrent neural network based sequence flow deflection angle prediction model, and that estimates and calibrates yaw misalignment with relative wind direction values obtained from wind turbine operation data, assembly angles calculated via the Kalman filter, and flow deflection angles predicted by an actor-critic flow deflection angle prediction deep reinforcement learning model. The latter is a self-learning model that maximizes the power production of a wind turbine by using produced power as a reward value during the normal operation of the wind turbine.

BACKGROUND ART

In the case of a horizontal axis wind turbine generating output power by rotating the rotor to a direction facing the wind direction as shown in [FIG. 1], the nacelle must be rotated to the direction facing the wind direction to maximize power production. To rotate the nacelle in the direction facing the wind direction, these wind turbines have a yawing system that rotates the nacelle, a meteorological mast composed of a wind vane and an anemometer installed on the nacelle, and a yaw controller that controls the yawing system to calculate and calibrate yaw misalignment.

As shown in [FIG. 2], the yaw misalignment can be defined with the relative wind direction for the nacelle direction, the assembly angle of a wind vane or lidar, and the flow deflection angle caused by wake behind the rotor. Here, the relative wind direction can be measured directly from a wind vane, but a wind vane's assembly angle must be measured and corrected before a wind turbine is operated. The flow deflection angle must be calculated and corrected during the operation. To accurately calculate and correct a wind vane's assembly angle and the flow deflection angle in an existing method, a lidar that precisely measures a free wind direction and a free stream wind speed is installed on the ground. The assembly angle is measured and corrected before operation. A lidar is always installed on the nacelle to measure the relative wind direction without the flow deflection angle to calibrate the yaw misalignment during the operation.

[1] and [2] relate to methods that reduce the cost of installing and operating a nacelle-based lidar in every wind turbine. A lidar is installed on the nacelle or the ground before operation to build a relationship model between the variables that affect the flow deflection angle and a relative wind direction variable, and the yaw misalignment is then calibrated during operation without the nacelle-based lidar. [1] calibrates the yaw misalignment by estimating a relative wind direction corrected for the flow deflection angle using a machine learning model as the relationship model, and [2] calibrates the yaw misalignment by estimating the flow deflection angle using a statistical analysis-based relationship model and correcting a measured relative wind direction for the flow deflection angle.

These methods assume that the assembly angle and the flow deflection angle are stationary over time. However, an actual assembly angle varies over time. The relationship between the flow deflection angle and the factors that affect it, namely the rotor rotation speed and the free wind speed, also changes over time because of the wake effect generated by other wind turbines in a wind farm. The assembly angle and the flow deflection angle are therefore non-stationary.

Conventional methods such as [1] and [2] can precisely calibrate the yaw misalignment using a machine learning-based yaw misalignment calibration model or a statistics-based yaw misalignment calibration model without utilizing a lidar during operation. Still, because of the non-stationarity of the assembly angle and the flow deflection angle, there arises a problem that the assembly angle must be re-corrected each time and a flow deflection angle correction relationship model must be re-developed and applied.

In addition, since the methods [1] and [2] are yaw misalignment calibration models based on the stationarity of the flow deflection angle, the yaw misalignment problem caused by the wake effect, which is a leading cause that affects the non-stationarity of the flow deflection angle, could not be solved.

Technical Problem

Among the components of the yaw misalignment of a horizontal axis wind turbine, the assembly angle and the flow deflection angle are non-stationary. When a yaw misalignment calibration model is developed without the premise of such non-stationarity, a new yaw misalignment calibration model must be periodically developed and applied for the wind turbines in a wind farm. In addition, the wake effect that causes the non-stationarity of the flow deflection angle cannot be corrected effectively. To solve these problems, a yaw misalignment calibration model should be developed under the premise that the assembly angle and the flow deflection angle are non-stationary.

Solution to Problem

A wind vane's measured relative wind direction values reflect the assembly angle and the flow deflection angle. Assuming that measured relative wind direction values follow the hidden Markov model, the Kalman filter can be applied to a series of measured relative wind direction values to calculate the assembly angle and the flow deflection angle having the non-stationarity. The main factors influencing the flow deflection angle are the free wind speed and rotor rotation speed when not affected by the wake effect. In this hidden Markov model, hereafter, measured relative wind direction values are referred to as observed relative wind direction values, and hereafter, relative wind direction values really measured in a wind vane are referred to as really measured relative wind direction values.

Here, via training a recurrent neural network sequence model where a current flow deflection angle calculated from a series of really measured relative wind direction values through the Kalman filter is used as a target feature, and a previous flow deflection angle, a current free wind speed, and a current rotor rotation speed are used as input features, it is possible to obtain a nonlinear relationship model between the free wind speed and rotor rotation speed and the flow deflection angle.

To train a Kalman filter and recurrent neural network based sequence flow deflection angle prediction model, when wind turbines in a wind farm are operated normally for a certain term, the average values of really measured relative wind direction values, free wind speeds, output power values, and rotor speeds and measurement times for a certain period are received from wind turbines and stored. Raw data is generated for training the sequence flow deflection angle prediction model.

For each wind turbine, the Kalman filter is applied to a series of really measured relative wind direction values once to obtain a series of the summed values of assembly angles and flow deflection angles and once more to obtain a series of assembly angles. The series of assembly angles is subtracted from the series of the summed values of assembly angles and flow deflection angles to obtain a series of flow deflection angles for each wind turbine. Training data of a certain number of sequences is generated using previous flow deflection angles, current free wind speeds, and current rotor rotation speeds as input features, and current flow deflection angles as a target feature. Then, the recurrent neural network based sequence flow deflection angle prediction model is trained with the training data.
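The generation of training sequences described above can be sketched as follows. This is a minimal illustration and not the disclosed implementation: the function name `make_sequences`, the window length `seq_len`, and the choice of the last step's flow deflection angle as the target of each window are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# One input step: (previous flow deflection angle, current free wind speed,
# current rotor rotation speed). The layout is an assumption for this sketch.
Step = Tuple[float, float, float]


def make_sequences(
    delta: Sequence[float],
    wind_speed: Sequence[float],
    rotor_speed: Sequence[float],
    seq_len: int,
) -> List[Tuple[List[Step], float]]:
    """Build (input sequence, target) pairs from aligned time series.

    Each input step k holds (delta[k-1], wind_speed[k], rotor_speed[k]);
    the target is delta at the last step of the window.
    """
    if not (len(delta) == len(wind_speed) == len(rotor_speed)):
        raise ValueError("all series must have the same length")
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")

    samples: List[Tuple[List[Step], float]] = []
    # The first usable step is k = 1, because step k needs delta[k-1].
    for end in range(seq_len, len(delta)):
        window = [
            (delta[k - 1], wind_speed[k], rotor_speed[k])
            for k in range(end - seq_len + 1, end + 1)
        ]
        samples.append((window, delta[end]))
    return samples
```

With five samples per series and `seq_len=2`, the sketch yields three training pairs, the first of which uses steps 1 and 2 as input and the flow deflection angle at step 2 as the target.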

The recurrent neural network based sequence flow deflection angle prediction model is a relationship model under the premise that a sequence relationship between previous flow deflection angles, current free wind speeds, and current rotor rotation speeds and current flow deflection angles has stationarity. Therefore, it is impossible to predict flow deflection angles by reflecting the non-stationarity of the flow deflection angle using this model.

The recurrent neural network based sequence flow deflection angle prediction model is used as an actor in an actor-critic deep reinforcement learning model, and a recurrent neural network based sequence relationship model of free wind speeds, rotor rotation speeds, flow deflection angles, and differential action values is used as a critic. With an actor-critic flow deflection angle prediction deep reinforcement learning model using output power as a reward value, a relation model reflecting the non-stationarity of the sequence relationship of previous flow deflection angles, current free wind speeds, and current rotor rotation speeds and current flow deflection angles can be trained. Then it is possible to more accurately predict the flow deflection angle with the non-stationarity due to turbulence and the wake effect.

The weight data of a pre-trained recurrent neural network based sequence flow deflection angle prediction model is loaded as the actor's weight data of the actor-critic flow deflection angle prediction deep reinforcement learning model, so the actor-critic flow deflection angle prediction deep reinforcement learning model learns by itself and predicts the flow deflection angle with the non-stationarity while reliably predicting the flow deflection angle without initial reinforcement learning.

Yaw misalignment values are estimated by adding the flow deflection angles predicted by the actor-critic flow deflection angle prediction deep reinforcement learning model to the really measured relative wind direction values and the assembly angles obtained by applying the Kalman filter to the really measured relative wind direction values. The yaw misalignment is calibrated using the estimated yaw misalignment values.

Advantageous Effects of Invention

When the yaw misalignment of a wind turbine is calibrated conventionally, assembly angles must be corrected at regular intervals, and a yaw misalignment estimation relationship model including the flow deflection angle must be re-analyzed or re-trained because of the non-stationarity of the assembly angle and the flow deflection angle. In particular, periodic correction for the non-stationarity of the assembly angle and the flow deflection angle of many wind turbines in a wind farm incurs enormous costs. In the present invention, however, non-stationary assembly angles are calculated during the normal operation of a wind turbine by applying the Kalman filter to a series of really measured relative wind direction values, and non-stationary flow deflection angles are calculated using the actor-critic flow deflection angle prediction deep reinforcement learning model. With the calculated assembly and flow deflection angles, it is possible to estimate and calibrate the yaw misalignment in real-time. That is, the calibration of the non-stationary yaw misalignment is fully automated, which minimizes the operating cost otherwise incurred by manual calibration.

The actor-critic flow deflection angle prediction deep reinforcement learning model, which applies the recurrent neural network based sequence flow deflection angle prediction model as the actor, automatically learns the flow deflection angle prediction model by itself using output power as a reward value during the normal operation of a wind turbine and then, predicts flow deflection angles. Through the yaw misalignment calibration that reflects the flow deflection angle's non-stationarity affected by turbulence and the wake effect that change in real-time, it is possible not only to calibrate the yaw misalignment more accurately than a calibration method that does not reflect the non-stationarity of the yaw misalignment but also to correct the wake effect being a significant factor reducing power generation for wind farm power generation and then, maximize the output power of a wind turbine itself and the power production of a wind farm.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram of components of a yaw control system of a horizontal axis wind turbine.

FIG. 2 is a schematic diagram of a yaw misalignment component.

FIG. 3 is a flowchart illustrating a method for controlling a yaw misalignment based on a Kalman filter and deep reinforcement learning according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of a hidden Markov model of an observed relative wind direction.

FIG. 5 is a flowchart to calculate summed values of assembly angles and flow deflection angles or assembly angles from a series of really measured relative wind direction values using a Kalman filter according to an embodiment of the present invention.

FIG. 6 is a flowchart for calculating flow deflection angles using a Kalman filter according to an embodiment of the present invention.

FIG. 7 is a schematic diagram of a recurrent neural network based sequence flow deflection angle prediction model according to an embodiment of the present invention.

FIG. 8 is a block diagram of a self-learning yaw misalignment control intelligent entity according to an embodiment of the present invention.

FIG. 9 is a schematic diagram of an actor model according to an embodiment of the present invention.

DESCRIPTION OF EMBODIMENTS

The present invention is described below by way of specific examples with reference to the accompanying drawings.

The present invention relates to a yaw misalignment calibration method of a horizontal axis wind turbine. As shown in [FIG. 1], the horizontal axis wind turbine is mainly composed of the blade (1), rotor (2), and nacelle (3). Since the horizontal axis wind turbine's output power efficiency decreases when the nacelle direction (8) is misaligned with the wind direction (7), the output power efficiency is increased through yaw control, that is, by rotating the nacelle to align the nacelle direction with the wind direction. For yaw control, a relative wind direction is measured at the meteorological tower (6) installed on the nacelle. The yaw controller (5) estimates the yaw misalignment using the really measured relative wind direction values. The yawing system (4) is controlled to calibrate the yaw misalignment.

However, as shown in FIG. 3, the relative wind direction really measured by a wind vane deviates from the real relative wind direction (μ_r): it is distorted not only by the flow deflection angle (δ) caused by the wind passing the blades and by the assembly angle (φ) of the wind vane, but also by the sensor's measurement process, as in Equation (1).

μ_o=g(−φ−δ+μ_r) (1)

where μ_o is the observed relative wind direction value in the hidden Markov model and g is a transformation function of a sensor. To calibrate the yaw misalignment from distorted, observed relative wind direction values (μ_o), assembly angles (φ) and flow deflection angles (δ) are calculated, and then the yaw misalignment (γ) of the real relative wind direction is estimated as in Equation (2).

γ=μ_r=φ+δ+g⁻¹(μ_o) (2)
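As a small worked illustration of Equations (1) and (2), the sketch below inverts a sensor transformation and adds the assembly and flow deflection angles. It assumes that the sensor transformation is linear with coefficients `a1` and `a2`, in line with the linear form the description later adopts in Equation (11). The function names and default coefficient values are hypothetical.

```python
def observe(mu_r: float, phi: float, delta: float,
            a1: float = 1.0, a2: float = 0.0) -> float:
    """Equation (1) with an assumed linear sensor g(x) = a1 * x + a2."""
    return a1 * (-phi - delta + mu_r) + a2


def estimate_yaw_misalignment(mu_o: float, phi: float, delta: float,
                              a1: float = 1.0, a2: float = 0.0) -> float:
    """Equation (2): gamma = phi + delta + g^-1(mu_o), linear g assumed."""
    if a1 == 0.0:
        raise ValueError("a1 must be non-zero for the sensor to be invertible")
    inverse = (mu_o - a2) / a1  # g^-1(mu_o)
    return phi + delta + inverse
```

For example, with a real relative wind direction of 4.0, an assembly angle of 1.5, and a flow deflection angle of 0.5 (all in the same angular unit), `observe` returns 2.0 for an ideal sensor, and `estimate_yaw_misalignment(2.0, 1.5, 0.5)` recovers 4.0.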

As shown in FIG. 3, to estimate and calibrate the yaw misalignment (γ) of the real relative wind direction (μ_r), in step S1, a series of really measured relative wind direction values, free wind speeds, output power values, and rotor rotation speeds during the normal operation of each wind turbine in a wind farm are received and saved as operation data. In step S2, using the operation data as the training data of the Kalman filter and recurrent neural network based flow deflection angle prediction model, the recurrent neural network based sequence flow deflection angle prediction model is created and trained, where current flow deflection angles calculated through the Kalman filter are used as a target feature, and previous flow deflection angles, free wind speeds, and rotor rotation speeds are used as input features. The weight of the recurrent neural network based sequence flow deflection angle prediction model is used as the pre-trained weight of the actor model of the actor-critic based flow deflection angle prediction deep reinforcement learning model. Therefore, the weight data of the trained recurrent neural network based sequence flow deflection angle prediction model is stored.

The yaw misalignment method through the Kalman filter and actor-critic based flow deflection angle prediction deep reinforcement learning, by itself, learns a yaw misalignment method to maximize power production in response to changes in the characteristics of factors that affect wind power during the operation of a wind turbine. It is an intelligent and automated method to calibrate the yaw misalignment. The system in which this method is implemented is a self-learning yaw misalignment control intelligent entity (100) equipped with intelligent software on a computer, as shown in [FIG. 8], and is installed to interlock with a control system in a wind turbine.

In step S3, after the self-learning yaw misalignment control intelligent entity (100) is installed in a wind turbine, the actor-critic flow deflection angle prediction deep reinforcement learning module (19) is initialized before the first wind turbine operation by loading the pre-trained recurrent neural network based sequence flow deflection angle prediction model weight data (20) as the actor weight data. Before a subsequent operation of the wind turbine, the module can be initialized, according to an operator's choice, either by loading the pre-trained recurrent neural network based sequence flow deflection angle prediction model weight data (20) or by loading actor-critic model weight data trained while operating a wind turbine as the actor weight data.

After the operation of a wind turbine starts in step S4, the self-learning yaw misalignment control intelligent entity (100) receives the really measured relative wind direction, free wind speed, output power, and rotor rotation speed as input signals from the wind vane, anemometer, output power sensor, and rotor rotation speed sensor (16). The Kalman filter module (17) calculates assembly angles through the Kalman filter from a series of really measured relative wind direction values and averages the really measured relative wind direction values, free wind speeds, rotor rotation speeds, and output power values over a certain period. The averaged values of the free wind speed, rotor rotation speed, and output power are sent to and stored in the experience replay buffer (18). The really measured relative wind direction average values and assembly angles are sent to the yaw misalignment calculation and calibration module (22). The actor-critic flow deflection angle prediction deep reinforcement learning module (19) uses a previous flow deflection angle, a current free wind speed, a current rotor rotation speed, a current flow deflection angle, and a next output power value obtained from the experience replay buffer (18) as a unit experience feature. A series of experience feature sequences is randomly sampled to generate training data, the actor and critic models are trained, and the trained weight data (21) of the actor and critic models is stored. After training, current flow deflection angles are predicted using the sequence data of previous flow deflection angles, current free wind speeds, and current rotor rotation speeds for a certain period. The current flow deflection angle values are sent to and stored in the experience replay buffer (18) and are also sent to the yaw misalignment calculation and calibration module (22).
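The role of the experience replay buffer (18) in step S4 can be illustrated with the sketch below. It is only one possible arrangement under stated assumptions: the class and field names are hypothetical, the buffer is a fixed-capacity queue that discards the oldest experience, and "randomly sampled sequences" is interpreted as contiguous windows with uniformly random start positions.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Optional


class Experience(NamedTuple):
    """One unit experience feature (field names are hypothetical)."""
    prev_delta: float   # previous flow deflection angle
    wind_speed: float   # current free wind speed
    rotor_speed: float  # current rotor rotation speed
    delta: float        # current flow deflection angle
    next_power: float   # next output power value (reward signal)


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest experience is dropped when full."""

    def __init__(self, capacity: int) -> None:
        self._items: Deque[Experience] = deque(maxlen=capacity)

    def add(self, exp: Experience) -> None:
        self._items.append(exp)

    def __len__(self) -> int:
        return len(self._items)

    def sample_sequences(self, batch_size: int, seq_len: int,
                         rng: Optional[random.Random] = None
                         ) -> List[List[Experience]]:
        """Return batch_size contiguous windows of seq_len experiences."""
        rng = rng or random.Random()
        items = list(self._items)
        if len(items) < seq_len:
            raise ValueError("not enough experiences for one sequence")
        starts = [rng.randrange(len(items) - seq_len + 1)
                  for _ in range(batch_size)]
        return [items[s:s + seq_len] for s in starts]
```

Sampling contiguous windows keeps the temporal order inside each sequence, which a recurrent actor or critic needs, while the random start positions decorrelate the sequences within a training batch.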

In step S5, a yaw misalignment calibration value is calculated using the really measured relative wind direction average values, assembly angles, and current flow deflection angles received by the yaw misalignment calculation and calibration module (22), and the yaw misalignment calibration information is used as an output signal to calibrate the yaw misalignment in real-time.

Next, the S1 and S2 steps to train the Kalman filter and recurrent neural network based flow deflection angle prediction model and the S3, S4, S5 steps to learn the yaw misalignment calibration model by itself according to the situation of a wind turbine in real-time through the self-learning yaw misalignment control intelligent entity (100) will be described in more detail.

In step S1, a series of measurement time values and average values of really measured relative wind direction values (μ_z^raw), free wind speeds (U_∞^raw), output power values (P^raw), and rotor rotation speeds (Ω^raw) for a certain period (T^raw) within a certain term is obtained and stored during the normal operation of each wind turbine of a wind farm. Hereafter, the average value and the variance value over T^raw are called a raw average value and a raw variance value. Here, T^raw is greater than 0, is an integer multiple of a unit time T^unit, and satisfies the following equation.

$T^{raw} = R^{raw} T^{unit}, R^{raw} \in N$

where N is the set of integers. In step S2, to create and train the recurrent neural network based sequence flow deflection angle prediction model, a series of raw average values (μ_z^raw) of really measured relative wind direction values is used to calculate really measured relative wind direction values (μ_z^1st) averaged over T^φ+δ, which are necessary to calculate the summed values of assembly angles and flow deflection angles, and to calculate really measured relative wind direction values (μ_z^2nd) averaged over T^φ, which are necessary to calculate assembly angles. Hereafter, the average value and the variance value over T^φ+δ are called the first average value and the first variance value. The average value and the variance value over T^φ are called the second average value and the second variance value. A series of flow deflection angles is calculated by applying Kalman filtering to a series of the first and second average values of really measured relative wind direction values. Here, T^φ+δ and T^φ are greater than 0, are integer multiples of T^raw, and satisfy the following equations.

$T^{φ + δ} = R^{φ + δ} T^{raw}, R^{φ + δ} \in N$

$T^{φ} = R^{φ} T^{raw}, R^{φ} \in N$

$T^{φ} / T^{φ + δ} = \frac{R^{φ} T^{raw}}{R^{φ + δ} T^{raw}} = \frac{R^{φ}}{R^{φ + δ}} = R \in N$

where R^φ+δ is an integer greater than 0 that gives T^φ+δ as a multiple of T^raw, R^φ is an integer greater than 0 that gives T^φ as a multiple of T^raw, and R is an integer greater than 0 that is the ratio of T^φ to T^φ+δ.
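The relation between the averaging periods can be illustrated with the sketch below, which groups raw average values into non-overlapping blocks and computes the average and variance of each block. The function names are hypothetical, and the use of the population variance and of non-overlapping blocks are assumptions of this sketch, since the description does not fix either choice.

```python
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


def period_ratio(r_first: int, r_second: int) -> int:
    """Return R = R^phi / R^(phi+delta), checking that it is a positive integer."""
    if r_first <= 0 or r_second <= 0:
        raise ValueError("multiples must be greater than 0")
    if r_second % r_first != 0:
        raise ValueError("T^phi must be an integer multiple of T^(phi+delta)")
    return r_second // r_first


def block_stats(raw: Sequence[float], ratio: int) -> List[Tuple[float, float]]:
    """Average and variance over consecutive blocks of `ratio` raw samples.

    A trailing partial block is ignored (an assumption of this sketch).
    """
    if ratio <= 0:
        raise ValueError("ratio must be greater than 0")
    stats = []
    for b in range(len(raw) // ratio):
        block = raw[b * ratio:(b + 1) * ratio]
        stats.append((fmean(block), pvariance(block)))
    return stats
```

Calling `block_stats` with `ratio` set to R^φ+δ gives the first average and variance values, and calling it with R^φ gives the second ones. For instance, `block_stats([1.0, 3.0, 5.0, 7.0], 2)` returns the pairs (2.0, 1.0) and (6.0, 1.0).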

As shown in FIG. 4, a series of observed relative wind direction values (μ_o) can be defined with the hidden Markov model. To calculate assembly angles and flow deflection angles approximately using the Kalman filter based on this hidden Markov model, the summed values of current assembly angles and current flow deflection angles are first calculated using the Kalman filter with the first average values of really measured relative wind direction values over T^φ+δ, over which the average and variance values of the current real relative wind direction defined in the hidden Markov model approximate 0 and a certain value, respectively, as in Equation (3). Current assembly angles are then calculated using the Kalman filter with the second average values of really measured relative wind direction values over T^φ, over which the average and variance values of the difference between the current real relative wind direction and the current flow deflection angle defined in the hidden Markov model approximate 0 and a certain value, respectively, as in Equation (4). Flow deflection angles are approximately calculated by subtracting the calculated assembly angles from the calculated summed values of assembly angles and flow deflection angles.

$μ_{r_k} \sim N(0, σ_{μ_{r_k}}^{2})$ (3)

$μ_{r_k} - δ_{k} \sim N(0, σ_{μ_{r_k} - δ_{k}}^{2})$ (4)

where μ_r_k is a current real relative wind direction, μ_r_k−δ_k is the difference value between the current real relative wind direction and the current flow deflection angle, and $σ_{μ_{r_k}}^{2}$ and $σ_{μ_{r_k} - δ_{k}}^{2}$ are the variance values of μ_r_k and μ_r_k−δ_k over T^φ+δ and T^φ, respectively.

Current observed relative wind direction values (μ_o_k) are defined as in Equation (5) on the premise that previous real relative wind direction values (μ_r_k−1) and the external effects of previous free wind speeds (U_∞_k−1), previous rotor rotation speeds (Ω_k−1), and the previous wake effect (W_k−1) satisfy the hidden Markov model.

μ_o_k=f_k^μ_r(μ_r_k−1,U_∞_k−1,Ω_k−1,W_k−1) (5)

Current assembly angles (φ_k) are defined as Equation (6) under the premise that the axis direction of a wind vane changes rapidly at random due to other causes or gradually changes (aging) over time, and current assembly angles satisfy the hidden Markov model.

φ_k=f_k^φ(φ_k−1)+φ_k^e+ε_φ_k (6)

where f_k^φ(φ_k−1) is a current internal transformation function for the assembly angle and is approximated as a function that changes monotonically over an interval of time, φ_k^e is an externally affected, current random variable of the assembly angle, and ε_φ_k is a random variable with an average value of 0 and a certain variance value as the noise of the current assembly angle.

Current flow deflection angles (δ_k) also have non-stationarity due to the wake effect of other wind turbines in a wind farm and the turbulence (9) due to other causes and are defined as in Equation (7) on the premise that current flow deflection angles satisfy the hidden Markov model.

δ_k=f_k^δ(δ_k−1)+βδ_k^e(U_∞_k,Ω_k,W_k)+ε_δ_k (7)

where f_k^δ(δ_k−1) is a current internal transformation function for the flow deflection angle, β is an externally affected coefficient of the flow deflection angle, δ_k^e(U_∞_k, Ω_k, W_k) is the externally affected, current flow deflection angle as a function of the current free wind speed (U_∞_k), current rotor rotation speed (Ω_k), and current wake effect (W_k), and ε_δ_k is a random variable with an average value of 0 and a certain variance value as the noise of the current flow deflection angle.

So, current distorted relative wind direction values (μ_m_k) before being measured by a sensor are defined as Equation (8), and relative wind directions after being measured by a sensor are defined as observed relative wind direction values, as in Equation (9).

μ_m_k=−φ_k−δ_k+μ_r_k (8)

μ_o_k=g(μ_m_k)+ε_g_k (9)

where ε_g_k is a random variable with an average value of 0 and a certain variance value as the noise generated from a current sensor measurement.

To estimate real relative wind direction values more accurately using the Kalman filter under the premise that observed relative wind direction values satisfy the hidden Markov model as above, the internal transformation function, the external factor function, and the internal noise of Equations (5), (6), and (7) and the sensor conversion function and sensor noise of Equation (9) must be defined. However, it is very difficult to define each of these internal conversion functions, external factor functions and internal noise, sensor conversion functions, and sensor noise.

However, each internal transformation function is approximated as a continuous function with a coefficient of 1, and the external factor functions that satisfy the transformations of Equations (5), (6), and (7) are approximated using a series of really measured relative wind direction values (μ_z) as follows, so that a Kalman filter model can be developed. A current external factor function (ξ_k) is the sum of a current external factor assembly angle function (φ_k^e) and a current external factor flow deflection angle function (δ_k^e), as in Equation (10). This current external factor function (ξ_k) is approximated as the product of the difference between consecutive near average values (μ_z_k), taken over T^near, of really measured relative wind direction values (μ_z) and an external factor coefficient (c_ξ). The coefficient is experimentally determined as an optimal value separately for the Kalman filter (K^φ+δ) that calculates the summed values of assembly angles and flow deflection angles and for the Kalman filter (K^φ) that calculates assembly angles, and is applied to the calculation of the external factor function.

ξ_k=φ_z_k^e+δ_z_k^e=c_ξ×(μ_z_k−μ_z_k−1) (10)

where T^near is greater than 0, is an integer multiple of T^φ+δ or T^φ, and is defined by the following equation.

$T^{near} = R^{near} T^{φ + δ}$ or $T^{near} = R^{near} T^{φ}, R^{near} \in N$
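The approximation of the external factor function in Equation (10) can be sketched as below. The class name `ExternalFactor` is hypothetical. The sketch assumes that the near average is a trailing mean over the last R^near first or second average values, that a partially filled window is averaged as-is during start-up, and that the external factor is 0 for the very first sample, none of which is specified in the description.

```python
from collections import deque
from statistics import fmean
from typing import Deque, Optional


class ExternalFactor:
    """xi_k = c_xi * (near_average_k - near_average_(k-1)), per Equation (10)."""

    def __init__(self, r_near: int, c_xi: float) -> None:
        if r_near <= 0:
            raise ValueError("r_near must be greater than 0")
        self._window: Deque[float] = deque(maxlen=r_near)
        self._c_xi = c_xi
        self._prev_near: Optional[float] = None

    def update(self, mu_z: float) -> float:
        """Add the newest average value mu_z and return the current xi_k."""
        self._window.append(mu_z)
        near = fmean(self._window)  # trailing near average
        if self._prev_near is None:
            xi = 0.0  # no previous near average yet (assumption)
        else:
            xi = self._c_xi * (near - self._prev_near)
        self._prev_near = near
        return xi
```

With `r_near=2` and `c_xi=0.5`, feeding the values 1.0, 3.0, and 5.0 produces near averages 1.0, 2.0, and 4.0, so `update` returns 0.0, 0.5, and 1.0 in turn.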

Since each internal noise is very small, they are approximated as values close to 0. A sensor transformation function is defined as a linear function with coefficients (α₁, α₂) as Equation (11) obtained as optimal values via Kalman filter tuning. Since the internal noise is close to 0, the average values of the near variance values of really measured relative wind direction values (μ_z) are used as the approximate values of the sensor noise.

h(μ_m_k)=α₁×μ_m_k+α₂ (11)

FIG. 5 is a flow chart for a specific method to calculate the summed values of assembly angles and flow deflection angles or assembly angles from a series of really measured relative wind direction values (μ_z) via the Kalman filter.

In step S11, a buffer for raw average values of really measured relative wind direction values is initialized. In step S12, current raw average values (μ_z_i^raw) of really measured relative wind direction values are acquired and saved to the buffer. In step S13, if the total integration time of the obtained raw average samples of really measured relative wind direction values reaches T^φ+δ or T^φ, step S14 is executed. Otherwise, step S12 is executed again. In step S14, the first or second average values of the buffered raw average values (μ_z_R_φ+δ_−t+1,R_φ+δ^raw or μ_z_R_φ_−t+1,R_φ^raw) over T^φ+δ or T^φ are calculated as current, really measured relative wind direction values (μ_z_k), and the first or second variance values are calculated. In step S15, current near average values are calculated and stored using a series of the first or second really measured relative wind direction average values (μ_z_R_near_−k+1,k). The current external factor function (ξ_k) is calculated from the difference value between the current near average value (μ_z_k) and the previous near average value (μ_z_k−1) of really measured relative wind direction values, as in Equation (10). In step S16, the first or second previous average values of distorted relative wind direction values and the current external factor function (ξ_k) are used to calculate the first or second average value of intermediate, distorted relative wind direction values as in Equation (12). The variance value of the sum of the internal noises of the assembly angle and the flow deflection angle is approximated as a value close to 0 to calculate the first or second variance value of intermediate, distorted relative wind direction values as in Equation (13).

μ_m_k|k−1=μ_m_k−1+ξ_k (12)

$p_{k|k-1} = p_{k-1} + q$ (13)

where μ_m_k−1 is the first or second average value of previous, distorted relative wind direction values, μ_m_k|k−1 is the first or second average value of intermediate, distorted relative wind direction values, p_k−1 is the first or second variance value of previous, distorted relative wind direction values, p_k|k−1 is the first or second variance value of intermediate, distorted relative wind direction values, and q is a variance value of the sum of internal noises of the assembly angle and the flow deflection angle. The value of q is obtained via Kalman filter tuning separately for the Kalman filter (K^φ+δ) that calculates the summed values of assembly angles and flow deflection angles and for the Kalman filter (K^φ) that calculates assembly angles. The initial average and variance values of the first or second distorted relative wind direction values use the first or second average and variance values of really measured relative wind direction values, respectively.

In step S17, the first or second average and variance values of current, observed relative wind direction values are calculated as Equations (14) and (15).

μ_o_k=h(μ_m_k|k−1)=α₁×μ_m_k|k−1+α₂ (14)

s_k=α₁²p_k|k−1+r_k (15)

r_k is a variance value of current sensor noise and is approximated by the first or second variance value of current, really measured relative wind direction values.

In step S18, a current Kalman gain is calculated as Equation (16) using Equations (13) and (15).

k_k=p_k|k−1α₁s_k⁻¹ (16)

In step S19, to calculate and store the first or second average value of current, distorted relative wind direction values as in Equation (17), the first or second average value (μ_m_k|k−1) of intermediate, distorted relative wind direction values, the first or second average value (μ_z_k) of current, really measured relative wind direction values, the current Kalman gain (k_k), and the first or second average value (μ_o_k) of current, observed relative wind direction values are used. To calculate and store the first or second variance value of current, distorted relative wind direction values as in Equation (18), the current Kalman gain (k_k) and the first or second variance value (p_k|k−1) of intermediate, distorted relative wind direction values are used.

μ_m_k=μ_m_k|k−1+k_k(μ_z_k−μ_o_k) (17)

p_k=(1−k_kα₁)p_k|k−1 (18)

In step S20, it is determined whether the Kalman filtering is to continue, and the filtering is continued or stopped accordingly.
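As an illustration only, one pass through steps S16 to S19 can be sketched in Python for a scalar state, following Equations (12) to (18) term by term. The names `ScalarKalmanState` and `kalman_step` and the default values of `alpha1` and `alpha2` are assumptions made for this sketch and do not appear in the description.

```python
from dataclasses import dataclass


@dataclass
class ScalarKalmanState:
    mean: float      # mu_m_k, average of distorted relative wind direction values
    variance: float  # p_k, variance of distorted relative wind direction values


def kalman_step(state, mu_z_k, r_k, xi_k, q, alpha1=1.0, alpha2=0.0):
    """One pass through steps S16 to S19 (alpha1, alpha2 defaults are assumed)."""
    # S16: prediction, Equations (12) and (13)
    mean_pred = state.mean + xi_k
    var_pred = state.variance + q
    # S17: observed average and variance, Equations (14) and (15)
    mu_o_k = alpha1 * mean_pred + alpha2
    s_k = alpha1 ** 2 * var_pred + r_k
    # S18: Kalman gain, Equation (16)
    k_k = var_pred * alpha1 / s_k
    # S19: correction, Equations (17) and (18)
    mean_new = mean_pred + k_k * (mu_z_k - mu_o_k)
    var_new = (1.0 - k_k * alpha1) * var_pred
    return ScalarKalmanState(mean_new, var_new), k_k


# Hypothetical numbers: equal prior and sensor variance give a gain of 0.5.
state = ScalarKalmanState(mean=2.0, variance=1.0)
new_state, gain = kalman_step(state, mu_z_k=3.0, r_k=1.0, xi_k=0.0, q=0.0)
```

With equal prior and sensor variances, the corrected average lands halfway between the predicted and measured values, which is the expected behavior of Equation (17).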

Thus, a series of the sum values of assembly angles and flow deflection angles is calculated by the Kalman filter in steps S11 to S20 using the first average value of really measured relative wind direction values over T^φ+δ, during which real relative wind directions are approximated as 0.

Also, using the second average value of really measured relative wind direction values over T^φ, during which all average values of real relative wind directions and flow deflection angles are approximated as 0, only assembly angles are calculated by the Kalman filter in steps S11 to S20.

The procedure to calculate flow deflection angles via the Kalman filter is summarized as follows. As shown in FIG. 6, in step S21, a series of raw average values of really measured relative wind direction values is obtained. In step S22, a series of the first average values of really measured relative wind direction values is obtained. In step S23, summed values of assembly angles and flow deflection angles are calculated through the Kalman filter. In step S24, a series of the second average values of really measured relative wind direction values is obtained. In step S25, only assembly angle values are calculated via the Kalman filter. Finally, in step S26, flow deflection angles are calculated by subtracting the assembly angles obtained in step S25 from the summed values of assembly angles and flow deflection angles obtained in step S23. In the case that raw average values (μ_z_t^raw) of current, really measured relative wind direction values are used as an input factor of K^φ, current assembly angles (φ_k) and flow deflection angles (δ_k) with the period of T^φ are calculated where raw average values of really measured relative wind direction values are used as an input factor of K^φ+δ at R-1 intervals.
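The subtraction in step S26 can be sketched as follows. The function name `flow_deflection_series` is hypothetical, and the sketch assumes that the two Kalman filter outputs have already been aligned to the same time steps, which the description handles through the T^φ period.

```python
def flow_deflection_series(summed, assembly):
    """Step S26: delta_k = (phi + delta)_k - phi_k, element by element.

    Assumes both series are already aligned to the same time steps.
    """
    if len(summed) != len(assembly):
        raise ValueError("series must be aligned to the same time steps")
    return [s - a for s, a in zip(summed, assembly)]


# Hypothetical values in degrees.
deltas = flow_deflection_series([5.0, 6.0], [2.0, 2.5])
```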

As shown in FIG. 7, the recurrent neural network based sequence flow deflection angle prediction model is created using current flow deflection angles (δ_k) as a target feature and using previous flow deflection angles (δ_k−1) and the second average values of current free wind speeds (U_∞_k^2nd) and current rotor rotation speeds (Ω_k^2nd) as input features. A series of flow deflection angle estimates obtained in step S26 and a series of the second average values of free wind speeds and rotor rotation speeds obtained from wind turbine operation data are used as training data to train the recurrent neural network based sequence flow deflection angle prediction model.

Flow deflection angles are first estimated using the Kalman filter, which is a rule-based method. The series of flow deflection angles estimated in this way is then used as the target and input features of the recurrent neural network based sequence flow deflection angle prediction model, which is a deep learning based method, so that a nonlinear flow deflection angle estimation model more accurate than the rule-based method is obtained.

As shown in [FIG. 7], to create the recurrent neural network based sequence flow deflection prediction model, the LSTM (Long Short-Term Memory) or an LSTM variant is used as the recurrent unit layer of the recurrent neural network, and the layer length is chosen so as to optimize the prediction model. Equation (19) is the recurrent neural network based sequence flow deflection angle prediction model's function using the LSTM or LSTM variant as the unit layer of the recurrent neural network.

$\begin{matrix} {\hat{δ}}_{i, k - N_{seq} + 1 : k} = F (x_{i, k - N_{seq} + 1 : k}, h_{i, 0}, c_{i, 0}; W^{rnn}) & (19) \end{matrix}$

where $\hat{δ}_{i, k - N_{seq} + 1 : k}$ is a sequence of target feature vectors composed of flow deflection angles from k−N_seq+1 to k for the ith batch, N_seq is the sequence length of the recurrent neural network based flow deflection prediction model, x_i,k−N_seq_+1:k is a sequence of input feature vectors composed of flow deflection angles from k−N_seq to k−1 and the second free wind speed and rotor rotation speed average values from k−N_seq+1 to k for the ith batch, h_i,0 is the 0th hidden internal state of the sequence for the ith batch, c_i,0 is the 0th cell internal state of the sequence for the ith batch, and W^rnn is the set of weight parameters for each function.

To train the parameters of each function of Equation (19) using training data, a loss function is defined as Equation (20).

$\begin{matrix} L_{RNN} = \frac{1}{N_{batch}^{RNN} \times N_{seq}} \sum_{i = 1}^{N_{batch}^{RNN}} \sum_{j = k - N_{seq} + 1}^{k} {({\hat{δ}}_{i, j} - δ_{i, j})}^{2} & (20) \end{matrix}$

where N_batch^RNN is the batch size, and δ_i,j is the jth ground truth of the sequence of the target feature.

By optimizing the loss function of Equation (20) by gradient descent, optimized weight data of the recurrent neural network based sequence flow deflection angle prediction model is obtained.
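For illustration, the loss of Equation (20) is a mean squared error taken over both the batch and the sequence dimensions. The following Python sketch computes it on plain nested lists; the function name `rnn_sequence_loss` is hypothetical, and a practical implementation would evaluate the same quantity inside a deep learning framework so that gradients are available.

```python
def rnn_sequence_loss(predicted, target):
    """Mean squared error over N_batch sequences of length N_seq, Equation (20)."""
    n_batch = len(predicted)
    n_seq = len(predicted[0])
    total = 0.0
    for pred_seq, true_seq in zip(predicted, target):
        for d_hat, d in zip(pred_seq, true_seq):
            total += (d_hat - d) ** 2
    return total / (n_batch * n_seq)


# Hypothetical batch of two sequences of length two.
loss = rnn_sequence_loss([[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0], [3.0, 2.0]])
```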

The optimized weight data is stored to be used as the actor weight data of the actor-critic based flow deflection angle prediction deep reinforcement learning model.

In step S3, after the self-learning yaw misalignment control intelligent entity (100) is installed in a wind turbine, the actor-critic based flow deflection angle prediction reinforcement learning module (19) is initialized before the first wind turbine operation by loading the pre-trained weight (20, W^rnn) of the recurrent neural network based sequence flow deflection prediction model as the actor weight data (W^actor). Before each subsequent operation of the wind turbine, the module is initialized, according to the operators' choice, either by loading the pre-trained W^rnn into the actor or by loading the actor weight data (W^actor) and critic weight data (W^critic) trained during operation.

In step S4, after the wind turbine starts operating, the self-learning yaw misalignment control intelligent entity receives, from the wind vane, anemometer, output power sensor, and rotor rotation speed sensor (16), the raw average values of current, really measured relative wind direction values (μ_z_t^raw), current free wind speeds (U_∞_t^raw), current output power values (P_t^raw), and current rotor rotation speeds (Ω_t^raw), each averaged over T^raw. Then, following the flowchart of FIG. 5, the Kalman filter module (17) calculates current assembly angles (φ_k) with the T^φ period by using the raw average values (μ_z_t^raw) of current, really measured relative wind direction values as the input of K^φ, which is the Kalman filter for calculating assembly angles. With the same T^φ period, it also calculates the second average values of current, really measured relative wind direction values (μ_z_k^2nd), current free wind speeds (U_∞_k^2nd), current rotor rotation speeds (Ω_k^2nd), and current output power values (P_k^2nd).

The second average values of current free wind speeds (U_∞_k^2nd), current rotor rotation speeds (Ω_k^2nd), and current output power values (P_k^2nd) calculated in the Kalman filter module (17) are sent to and stored in the experience replay buffer (18), and current assembly angles (φ_k) and the second average values of current, really measured relative wind direction values are sent to the yaw misalignment calculation and calibration module (22).

The experience replay buffer (18) is a circular buffer that saves N_exp unit experience features, where a unit experience feature is composed of a free wind speed (U_∞), a rotor rotation speed (Ω), a flow deflection angle (δ), and an output power value (P).
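A circular buffer of this kind can be sketched in Python with `collections.deque`, whose `maxlen` argument discards the oldest entry once the buffer is full. The class and field names below are hypothetical and serve only to illustrate the data layout of a unit experience feature.

```python
from collections import deque
from typing import NamedTuple


class Experience(NamedTuple):
    wind_speed: float       # U_inf, second average value
    rotor_speed: float      # Omega, second average value
    flow_deflection: float  # delta
    power: float            # P, second average value


class ExperienceReplayBuffer:
    def __init__(self, n_exp):
        # A deque with maxlen drops the oldest item when a new one arrives.
        self._items = deque(maxlen=n_exp)

    def append(self, experience):
        self._items.append(experience)

    def latest_sequence(self, n_seq):
        """Return the most recent n_seq experiences, oldest first."""
        if len(self._items) < n_seq:
            raise ValueError("not enough experiences stored yet")
        return list(self._items)[-n_seq:]

    def __len__(self):
        return len(self._items)


# Hypothetical usage: capacity 3, five writes, so the two oldest are dropped.
buffer = ExperienceReplayBuffer(n_exp=3)
for i in range(5):
    buffer.append(Experience(float(i), 0.0, 0.0, 0.0))
```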

To predict current flow deflection angles (δ_k), the initialized actor-critic based flow deflection angle prediction reinforcement learning module (19) samples from the experience replay buffer (18) a sequence of N_seq input unit features, from previous input unit features to the current input unit feature, as in Equation (22). The current input unit feature is defined, as in Equation (21), by the previous flow deflection angle and the second average values of the current free wind speed and rotor rotation speed. Then, as in Equation (23), current flow deflection angles (δ_k) are predicted via the actor (π), stored in the experience replay buffer (18), and sent to the yaw misalignment calculation and calibration module (22).

$\begin{matrix} x_{k} = 〈 δ_{k - 1}, U_{\infty_{k}}^{2 nd}, Ω_{k}^{2 nd} 〉 & (21) \end{matrix}$

$\begin{matrix} x_{k - N_{seq} + 1 : k} = 〈 δ_{k - N_{seq} : k - 1}, U_{\infty_{k - N_{seq} + 1 : k}}^{2 nd}, Ω_{k - N_{seq} + 1 : k}^{2 nd} 〉 & (22) \end{matrix}$

$\begin{matrix} δ_{k} = π (x_{k - N_{seq} + 1 : k}, h_{0}, c_{0}; W^{actor}) & (23) \end{matrix}$
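The index offset in Equations (21) and (22), where each input unit feature pairs the previous flow deflection angle with the current wind speed and rotor speed, can be illustrated with a short Python sketch. The function name `build_actor_input` is hypothetical; the sketch assumes `deltas` holds values up to index k−1 while `wind_speeds` and `rotor_speeds` hold values up to index k.

```python
def build_actor_input(deltas, wind_speeds, rotor_speeds, n_seq):
    """Assemble x_{k-N_seq+1:k} of Equation (22) as a list of tuples.

    deltas ends at delta_{k-1}; wind_speeds and rotor_speeds end at index k,
    so taking the last n_seq items of each pairs delta_{j-1} with U_j, Omega_j.
    """
    if min(len(deltas), len(wind_speeds), len(rotor_speeds)) < n_seq:
        raise ValueError("not enough history for the requested sequence length")
    return list(zip(deltas[-n_seq:], wind_speeds[-n_seq:], rotor_speeds[-n_seq:]))


# Hypothetical history: delta_0..delta_2, U_0..U_3, Omega_0..Omega_3 (k = 3).
x_seq = build_actor_input([0.1, 0.2, 0.3], [5.0, 6.0, 7.0, 8.0],
                          [10.0, 11.0, 12.0, 13.0], n_seq=2)
```

In this example the last tuple is (δ_2, U_3, Ω_3), which matches x_k of Equation (21) for k = 3.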

The actor-critic based flow deflection angle prediction reinforcement learning module (19) obtains batch-sequence (N_batch^RL×N_seq) training data from the experience replay buffer (18) with the period of T^learn by sampling N_batch^RL current reinforcement learning unit feature sequences as in Equation (25), based on the reinforcement learning unit feature sequence of Equation (24).

$\begin{matrix} τ_{k - N_{seq} + 1 : k} = 〈 U_{\infty_{k - N_{seq} + 1 : k}}^{2 nd}, Ω_{k - N_{seq} + 1 : k}^{2 nd}, δ_{k - N_{seq} : k - 1}, P_{k + 1}^{2 nd}, U_{\infty_{k - N_{seq} + 2 : k + 1}}^{2 nd}, Ω_{k - N_{seq} + 2 : k + 1}^{2 nd}, δ_{k - N_{seq} + 1 : k} 〉 & (24) \end{matrix}$

$\begin{matrix} τ_{1 : N_{batch}^{RL}, k - N_{seq} + 1 : k} = 〈 U_{\infty_{1 : N_{batch}^{RL}, k - N_{seq} + 1 : k}}^{2 nd}, Ω_{1 : N_{batch}^{RL}, k - N_{seq} + 1 : k}^{2 nd}, δ_{1 : N_{batch}^{RL}, k - N_{seq} : k - 1}, P_{1 : N_{batch}^{RL}, k + 1}^{2 nd}, U_{\infty_{1 : N_{batch}^{RL}, k - N_{seq} + 2 : k + 1}}^{2 nd}, Ω_{1 : N_{batch}^{RL}, k - N_{seq} + 2 : k + 1}^{2 nd}, δ_{1 : N_{batch}^{RL}, k - N_{seq} + 1 : k} 〉 & (25) \end{matrix}$

As shown in FIG. 9, the critic model is a recurrent neural network model with the recurrent neural network unit layer (15) of the LSTM or LSTM variant. It is defined as Equation (26), an action-value function of the reinforcement learning model that takes as input the state sequence $s_{k - N_{seq} + 1 : k}$ from previous states to the current state, which is the input sequence of the recurrent neural network model, and the current action (a_k), which is a factor affecting the output feature of the recurrent neural network. The current state (s_k=<U_∞_k, Ω_k>) of the reinforcement learning model is defined with the current free wind speed (U_∞_k) and the current rotor rotation speed (Ω_k), and the current action (a_k=δ_k) of the reinforcement learning model is defined as the current flow deflection angle (δ_k).

Q(a_k,s_k−N_seq_+1:k,h₀,c₀;W^critic) (26)

where h₀ is the 0th hidden internal feature vector of the sequence, and c₀ is the 0th cell internal feature vector of the sequence. When training the actor and critic models of the actor-critic based flow deflection angle prediction reinforcement learning module (19) using batch-sequence training data, the batch-sequence training data is first randomly shuffled to obtain independence between the sequence data in the batch, which yields the shuffled batch-sequence training data.

When training, to increase sample efficiency, the mini-batch number (N_mini-batch^RL) is used as the batch unit in the shuffled batch-sequence training data, and step-based iterative training is carried out with the integer step number (R_step^RL=N_batch^RL/N_mini-batch^RL, R_step^RL∈N) greater than 0. This step-based iterative training is nested within epoch-based iterative training carried out for the epoch number (N_epoch^RL). In the epoch-based iterative training, differently shuffled batch-sequence training data is obtained for each epoch.

When the step-based iterative training of the actor-critic model within the epoch-based iterative training is carried out using shuffled batch-sequence training data, shuffled mini-batch-sequences are sampled to train the critic and actor models iteratively.
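The nesting of step-based training inside epoch-based training can be sketched as a Python generator. The name `iterate_minibatches` and the `seed` parameter are assumptions for this sketch; each element of `batch` stands for one whole sequence, so shuffling reorders sequences without disturbing the order inside a sequence.

```python
import random


def iterate_minibatches(batch, n_mini_batch, n_epoch, seed=0):
    """Yield (epoch, step, mini_batch) with a fresh shuffle for each epoch."""
    if len(batch) % n_mini_batch != 0:
        raise ValueError("N_batch must be an integer multiple of N_mini-batch")
    n_step = len(batch) // n_mini_batch  # R_step^RL
    rng = random.Random(seed)            # seeded only to make the sketch repeatable
    for epoch in range(n_epoch):
        shuffled = list(batch)
        rng.shuffle(shuffled)            # sequences are shuffled as whole units
        for step in range(n_step):
            start = step * n_mini_batch
            yield epoch, step, shuffled[start:start + n_mini_batch]


# Hypothetical batch of six sequence identifiers, mini-batches of two, two epochs.
for epoch, step, mini_batch in iterate_minibatches(list(range(6)), 2, 2):
    pass  # the critic and actor updates would be performed here
```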

The actor-critic based flow deflection angle prediction deep reinforcement learning model is a model to predict flow deflection angles for optimized yaw control adapting to a wind turbine's environment in real-time, receiving free wind speed and rotor rotation speed states relevant to a wind turbine's environment, and output power as a reward value, in real-time. The current return of this model is the differential return defined as Equation (27).

$\begin{matrix} G_{k} = R_{k + 1} - \overline{R} + R_{k + 2} - \overline{R} + R_{k + 3} - \overline{R} + \ldots & (27) \end{matrix}$

$\overline{R}$ is the average reward value, and the current reward value (R_k) is the second average value (P_k^2nd) of current output power values. The expectation value of the differential return of Equation (27) for the current state sequence $s_{k - N_{seq} + 1 : k}$ and the current action (a_k) is the action-value function (Q) of Equation (26).
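As a small worked illustration, the differential return of Equation (27) sums the deviations of future rewards from the average reward. The infinite sum is truncated to the rewards supplied, which is an assumption of this sketch, and the function name `differential_return` is hypothetical.

```python
def differential_return(future_rewards, average_reward):
    """Truncated form of Equation (27): sum of (R_{k+j} - R_bar)."""
    return sum(r - average_reward for r in future_rewards)


# Hypothetical output power rewards around an average of 4.0.
g_k = differential_return([3.0, 5.0, 4.0], average_reward=4.0)
```

Rewards that merely match the average contribute nothing, so a positive return indicates power production above the running average.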

Here, the action-value function (Q) of the critic model is trained with a temporal difference training method by gradient descent. The critic model's loss function is defined as the expected value of the squared advantage as in Equation (31), where the advantage is defined as in Equation (30) using the shuffled mini-batch training data for critic model training of Equation (29).

$\begin{matrix} τ_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} + 1 : k}^{shuffled ‐ critic} = 〈 U_{\infty_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} + 1 : k}}^{2 nd}, Ω_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} + 1 : k}^{2 nd}, P_{1 : N_{mini ‐ batch}^{RL}, k + 1}^{2 nd}, U_{\infty_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} + 2 : k + 1}}^{2 nd}, Ω_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} + 2 : k + 1}^{2 nd} 〉 & (29) \end{matrix}$

$\begin{matrix} A_{i} = R_{i, k} - \overline{R} + Q (s_{i, k - N_{seq} + 2 : k + 1}, a_{i, k + 1}, h_{0}, c_{0}; W^{critic}) - Q (s_{i, k - N_{seq} + 1 : k}, a_{i, k}, h_{0}, c_{0}; W^{critic}) & (30) \end{matrix}$

$\begin{matrix} L_{critic} = \frac{1}{N_{mini ‐ batch}^{RL}} \sum_{i = 1}^{N_{mini ‐ batch}^{RL}} {(A_{i})}^{2} & (31) \end{matrix}$

i is the index of the shuffled mini-batch, and the advantage of Equation (30) is the estimated error of the reward value. The average reward value ($\overline{R}$) is approximated by updating it iteratively with the temporal difference method as in Equation (32).

$\begin{matrix} \bar{R} \leftarrow \bar{R} + η \frac{1}{N_{mini ‐ batch}^{RL}} \sum_{i = 1}^{N_{mini ‐ batch}^{RL}} (A_{i}) & (32) \end{matrix}$

η is an average reward value update coefficient, which is a real number greater than 0.
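Equations (30) to (32) can be illustrated numerically with the Python sketch below. It operates on already evaluated critic outputs rather than on a neural network, and the function names are hypothetical; in practice the loss would be computed inside a deep learning framework so that its gradient with respect to W^critic is available.

```python
def advantages(rewards, q_next, q_current, average_reward):
    """A_i = R_i - R_bar + Q(next state, next action) - Q(state, action), Eq. (30)."""
    return [r - average_reward + qn - qc
            for r, qn, qc in zip(rewards, q_next, q_current)]


def critic_loss(adv):
    """Mean of squared advantages over the mini-batch, Equation (31)."""
    return sum(a * a for a in adv) / len(adv)


def update_average_reward(average_reward, adv, eta):
    """R_bar <- R_bar + eta * mean(A_i), Equation (32)."""
    return average_reward + eta * sum(adv) / len(adv)


# Hypothetical mini-batch of two samples.
adv = advantages(rewards=[10.0, 12.0], q_next=[1.0, 2.0],
                 q_current=[0.5, 3.0], average_reward=10.0)
loss = critic_loss(adv)
new_average = update_average_reward(10.0, adv, eta=0.1)
```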

Also, using the shuffled mini-batch-sequence training data for actor model training as in Equation (33), the actor model is trained with the proximal policy optimization method by gradient ascent. The probability ratio (r_i(W^actor)) of the current actor model (π) over the previous actor model (π_old) is defined as in Equation (34), and the actor model's loss function is defined as in Equation (35) from the product (r_i(W^actor) A_i) of the probability ratio and the advantage (A_i). The loss term is clipped to a constant value when the advantage (A_i) is greater than 0 and the probability ratio (r_i(W^actor)) is equal to or greater than 1+ε, or when the advantage (A_i) is less than 0 and the probability ratio (r_i(W^actor)) is equal to or less than 1−ε. ε is a value greater than 0 and less than 1.

$\begin{matrix} τ_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} + 1 : k}^{shuffled ‐ actor} = 〈 {U_{\infty}}_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} + 1 : k}, Ω_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} + 1 : k}, δ_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} : k - 1}, P_{1 : N_{mini ‐ batch}^{RL}, k + 1}, U_{\infty_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} + 2 : k + 1}}, Ω_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} + 2 : k + 1}, δ_{1 : N_{mini ‐ batch}^{RL}, k - N_{seq} + 1 : k} 〉 & (33) \end{matrix}$

$\begin{matrix} r_{i} (W^{actor}) = \frac{π (a_{i, k} ❘ s_{i, k - N_{seq} + 1 : k}, a_{i, k - N_{seq} : k - 1}; W^{actor})}{π_{old} (a_{i, k} ❘ s_{i, k - N_{seq} + 1 : k}, a_{i, k - N_{seq} : k - 1}; W_{old}^{actor})} & (34) \end{matrix}$

$\begin{matrix} L_{actor} = \frac{1}{N_{mini ‐ batch}^{RL}} \sum_{i = 1}^{N_{mini ‐ batch}^{RL}} \min (r_{i} (W^{actor}) A_{i}, clip (r_{i} (W^{actor}), 1 - ε, 1 + ε) A_{i}) & (35) \end{matrix}$
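The clipping behavior of Equation (35) can be checked with a short Python sketch that works on precomputed probability ratios and advantages. The function name `ppo_clipped_objective` is hypothetical, and the numbers in the usage lines are illustrative only.

```python
def ppo_clipped_objective(ratios, adv, epsilon):
    """Mean of min(r_i * A_i, clip(r_i, 1 - eps, 1 + eps) * A_i), Equation (35)."""
    total = 0.0
    for r, a in zip(ratios, adv):
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


# Sample 1: A > 0 and r >= 1 + eps, so the term is held at (1 + eps) * A.
# Sample 2: A < 0 and r <= 1 - eps, so the term is held at (1 - eps) * A.
# Sample 3: r inside the clipping range, so the term is r * A unchanged.
objective = ppo_clipped_objective([1.5, 0.5, 1.0], [2.0, -1.0, 1.0], epsilon=0.2)
```

In the first two samples the term no longer depends on the probability ratio, so those samples contribute no gradient to the actor weights, which limits how far one update can move the actor from the previous one.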

The actor and critic models' weight data of the actor-critic flow deflection angle prediction deep reinforcement learning model are stored as actor-critic model weight data (21).

Finally, the yaw misalignment calculation and calibration module (22) calculates current yaw misalignment values (γ_k) as in Equation (36) using current assembly angles (φ_k) and the second average values (μ_z_k^2nd) of current, really measured relative wind direction values obtained from the Kalman filter module (17), and current flow deflection angles (δ_k) obtained from the actor-critic based flow deflection angle prediction reinforcement learning module (19). The estimated yaw misalignment values are sent to the yaw controller (23) to perform real-time yaw misalignment control.

γ_k=μ_z_k^2nd+φ_k+δ_k (36)

REFERENCE SIGNS LIST

- 1: Blade
- 2: Rotor
- 3: Nacelle
- 4: Yawing system
- 5: Yaw controller
- 6: Meteorological mast
- 7: Wind direction
- 8: Nacelle direction
- 9: Turbulence
- 10: North
- 11: Real wind direction
- 12: Nacelle direction
- 13: Yawing system axis
- 14, 15: Recurrent neural network unit layer of the LSTM or LSTM variant
- 16: Wind vane, anemometer, output power sensor, and rotor rotation speed sensor
- 17: Kalman filter module
- 18: Experience replay buffer
- 19: Actor-critic based flow deflection angle prediction reinforcement learning module
- 20: Pre-trained weight of the recurrent neural network based sequence flow deflection prediction model
- 21: Trained actor-critic model weight data
- 22: Yaw misalignment calculation and calibration module
- 23: Yaw controller

PATENT LITERATURE

[1] Korea registered patent 10-1800217, CORRECTION METHOD FOR YAW ALIGNMENT ERROR OF WIND TURBINE

NON PATENT LITERATURE

[2] Determination of optimal wind turbine alignment into the wind and detection of alignment changes with SCADA data, 2018, by Niko Mittelmeier and Martin Kühn
