The present application claims priority from Chinese Application Number 202111134821.0, filed Sep. 27, 2021, the disclosure of which is hereby incorporated by reference herein in its entirety.
The present disclosure belongs to the technical field of electric power systems, and in particular relates to a multi-task-based method for predicting massive-user loads based on a multi-channel convolutional neural network.
With the development of electric power systems, load prediction has long played an important role in maintaining the balance between supply and demand and in ensuring the safe, stable, and economical operation of the electric power grid. With the continuous advancement of the reform of the electricity sales side, a diversified pattern of electricity sales entities will eventually be formed, and electricity sales enterprises will have to provide personalized value-added services to improve the competitiveness of their own electricity sales services. The traditional prediction technology for system-level loads can no longer satisfy the technical requirements of electric power enterprises in the future, and the prediction technology for user-level loads will become a precondition and foundation for electricity sales enterprises to provide personalized energy services for target customers. Accurate user load prediction can improve the marketing levels of electric power enterprises and avoid assessment deviations. As an important part of constructing the smart electric power grid, the advanced measurement system, on the one hand, provides a massive data basis for analyzing users' electricity consumption behavior characteristics, and on the other hand, brings challenges for the efficient processing and effective application of the massive data. At present, user load prediction based on massive data is one of the research hotspots in the field of electric power system prediction.
So far, scholars in China and abroad have produced rich results in both theoretical research and practical application. Most load prediction technologies focus on system-level or region-level loads. Compared with aggregated loads such as system-level or region-level loads, user-level loads are mostly affected by users' own electricity consumption behaviors and exhibit stronger uncertainty and individual randomness, which makes them less predictable and increases the difficulty of prediction. It is difficult for traditional prediction methods designed for aggregated loads to ensure their applicability and prediction accuracy in user-level load prediction, so the prediction methods need to be changed accordingly. At present, in the research on methods for predicting user-level loads, single-task-based load prediction methods are the most common. However, with the gradual improvement of advanced measurement systems, user-level load data will further present the characteristics of large quantity and various types in the future. Apparently, in massive-user scenarios, if a single-task-based prediction method is adopted to model each user one by one, excessive computing and time resources will be consumed. When the number of users is large, operation efficiency becomes an important criterion that cannot be ignored, in addition to accuracy, when evaluating the performance of prediction methods. In addition, single-task-based prediction methods ignore the correlations among different users, and the correlations among massive data have not been fully mined and deeply studied in user-level load prediction. Therefore, it is urgent to provide a load prediction method suitable for massive users, which can learn the correlations among user loads and take into account both prediction accuracy and operation efficiency.
The objectives of the present disclosure are as follows. In view of the deficiencies of the current methods for predicting massive-user loads, including a low operation efficiency, a low prediction accuracy, and a failure to consider the load correlations among different users, the present disclosure provides a multi-task-based method for predicting massive-user loads based on a multi-channel convolutional neural network, which learns the load correlations among residential users with similar electricity consumption modes based on a clustering technology and a multi-task-based learning strategy, so as to improve both the average prediction accuracy and the overall operation efficiency.
The technical solutions are as follows. The present disclosure provides a multi-task-based method for predicting massive-user loads based on a multi-channel convolutional neural network. The method includes the following steps: in Step (1), all residential users are clustered into a plurality of clusters with different daily average electricity consumption modes; in Step (2), corresponding input data sets are constructed for the various clusters; and in Step (3), a multi-task-based load prediction model based on a convolutional neural network is established for each of the clusters, and the load prediction values for different users in the corresponding cluster are output in parallel by each model.
Furthermore, in Step (1), all residential users are clustered into a plurality of clusters with different daily average electricity consumption modes by adopting an agglomerative hierarchical clustering method. The agglomerative hierarchical clustering method is as follows.
2.1 A matrix F including clustering features of N samples is constructed,
F = [f_1, f_2, . . . , f_N]^T,
where f_j is the clustering feature of the j-th sample, j represents the serial number of a sample, T represents transposition, and j = 1, 2, . . . , N.
2.2 Each sample is taken as one cluster, and the proximity between every two clusters is calculated to obtain an initial proximity matrix P, wherein the calculation formula of the element p_{k,g} in the k-th row and the g-th column is:
P = {p_{k,g}}, k = 1, . . . , N, g = 1, . . . , N, and
p_{k,g} = dis(f_k, f_g), k ≠ g,
where dis(⋅) represents the calculation rule for the proximity of two clusters; both k and g represent serial numbers of the clusters; and f_k and f_g are the clustering features of the k-th and g-th clusters, respectively.
2.3 Two clusters with the highest proximity are merged as a new cluster and the proximity matrix P is updated.
2.4 Step 2.3 is repeated until the total number of the clusters is 1 or a stopping condition is reached.
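For illustration only, the following Python sketch implements the generic agglomerative procedure of steps 2.1 to 2.4; the function name, the representation of a merged cluster by the mean of its members' features, and the convention that a smaller distance means a higher proximity are assumptions for the sketch rather than part of the disclosure.

```python
import numpy as np

def agglomerative_clustering(F, dis, n_clusters_stop=1):
    """Generic agglomerative hierarchical clustering over a feature matrix F (N x m).

    F               -- clustering features, one row per sample (step 2.1)
    dis(f_a, f_b)   -- proximity rule between two cluster features (step 2.2);
                       a smaller value is treated as a higher proximity
    n_clusters_stop -- stopping condition on the number of clusters (step 2.4)
    """
    # Each sample starts as its own cluster; a cluster is a list of sample indices.
    clusters = [[j] for j in range(F.shape[0])]

    def feature(cluster):
        # Represent a merged cluster by the mean of its members' features
        # (an illustrative choice; the Ward criterion used later differs).
        return F[cluster].mean(axis=0)

    while len(clusters) > n_clusters_stop:
        # Steps 2.2 and 2.3: find the pair of clusters with the highest
        # proximity (smallest dis value) and merge them into a new cluster.
        best = None
        for k in range(len(clusters)):
            for g in range(k + 1, len(clusters)):
                d = dis(feature(clusters[k]), feature(clusters[g]))
                if best is None or d < best[0]:
                    best = (d, k, g)
        _, k, g = best
        clusters[k] = clusters[k] + clusters[g]  # merge the two clusters
        del clusters[g]                          # update the set of clusters
    return clusters

# Example usage with the Euclidean distance as the proximity rule:
# clusters = agglomerative_clustering(F, dis=lambda a, b: np.linalg.norm(a - b),
#                                     n_clusters_stop=22)
```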
Furthermore, in Step (2), corresponding input data sets are constructed for the various clusters by adopting a multi-channel-based multi-source input fusion method. The multi-channel-based multi-source input fusion method is as follows.
3.1 A single-user time sequence input is reconstructed. The historical load sequence of a residential user over the week from 8 days before the time to be predicted to the day before the time to be predicted is reconstructed into a two-dimensional feature map with 7 rows and 24 columns, wherein each row corresponds to the daily loads of a different date and each column corresponds to the loads at a specific hour on different dates.
3.2 Two-dimensional feature maps of different users in the same cluster are fused by utilizing a channel dimension. The two-dimensional feature maps corresponding to the different users in the same cluster are transmitted to different channels in inputs of the convolutional neural network, wherein data on a single channel is a two-dimensional feature map of one user. The feature maps of the different users in the same cluster are fused by utilizing the channel dimension, and the fused feature map is taken as an input of a feature sharing layer in the convolutional neural network.
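As a minimal sketch of steps 3.1 and 3.2 (the array layout and the exact seven-day window indexing are assumptions for illustration, not fixed by the text above), the reconstruction and channel-wise fusion could be written as:

```python
import numpy as np

def build_cluster_input(daily_loads, day):
    """Build one fused multi-channel input sample for a cluster.

    daily_loads -- array of shape (n_users, n_days, 24): hourly loads of the
                   users in one cluster, one row of 24 values per day
    day         -- index of the day to be predicted
    Returns an array of shape (7, 24, n_users).
    """
    # Step 3.1: take the 7 historical days preceding the prediction day
    # (here days day-7 .. day-1; the exact window boundaries are an assumption)
    # so that each user's history forms a 7 x 24 two-dimensional feature map.
    maps = daily_loads[:, day - 7:day, :]      # shape (n_users, 7, 24)

    # Step 3.2: fuse the per-user feature maps along the channel dimension,
    # one channel per user, as the input of the feature sharing layer.
    return np.transpose(maps, (1, 2, 0))       # shape (7, 24, n_users)
```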
In Step (3), the multi-task-based load prediction model based on the convolutional neural network is established for each of the clusters. The load prediction values for different users in the corresponding cluster are output in parallel by each model to eventually obtain the load prediction results of all of the residential users, and the multi-task-based load prediction model based on the convolutional neural network is as follows.
4.1 Load predictions for the different residential users in the same cluster are taken as different tasks, and a multi-task-based learning strategy is implemented for each cluster to assist in learning the correlations and differences among the loads of the different residential users. The calculation process of the loss function in the multi-task-based learning strategy is specifically as follows.
It is assumed that the multi-task-based learning includes V tasks in total, the input and output data set corresponding to each task is {x_v, y_v}, v = 1, 2, . . . , V, and then all of the input data sets are:
X = {x_1, . . . , x_v, . . . , x_V}.
An output of a prediction model corresponding to the v-th task is defined as:
y_v = u_v(X; θ_sha, θ_v),
where u_v represents the mapping function of the prediction model corresponding to the v-th task, θ_sha is the parameter of the feature sharing layer, and θ_v is the parameter of the v-th specific task layer, v = 1, 2, . . . , V.
Joint learning is conducted on the plurality of related tasks under a hard sharing mechanism, and the network parameters are trained by minimizing an overall loss function, wherein the overall optimization loss function is calculated as follows:
where loss(⋅) represents the loss function for the tasks; and α_v is the weight coefficient corresponding to each of the tasks.
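An illustrative reconstruction of this overall loss, consistent with the symbols defined above but not necessarily the exact expression of the original filing, is the weighted sum of the per-task losses:

```latex
\min_{\theta_{sha},\,\theta_{1},\ldots,\theta_{V}}\;
L_{total}=\sum_{v=1}^{V}\alpha_{v}\,
\mathrm{loss}\!\left(u_{v}\!\left(X;\theta_{sha},\theta_{v}\right),\,y_{v}\right),
```

where y_v here denotes the target output of the v-th task.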
4.2 The convolutional neural network is taken as a feature sharing layer for multi-task-based learning to extract correlations among different tasks. A calculation process of the convolutional neural network is specifically as follows.
4.2.1 Calculations in the convolutional layers are conducted. It is assumed that the number of convolution kernels in the a-th convolutional layer is C^a, and then the set MAP^a of output feature maps in that layer is:
where map_e^a represents the output feature map corresponding to the e-th convolution kernel in the a-th convolutional layer, map_r^{a−1} represents the r-th output feature map in the (a−1)-th layer, C^{a−1} is the number of output feature maps in the (a−1)-th layer, that is, the number of channels included in the input data of the a-th convolutional layer, w_{r,e}^a is the kernel parameter of the e-th convolution kernel in the a-th convolutional layer corresponding to the r-th output feature map in the previous layer, b_e^a is the bias in the a-th convolutional layer corresponding to the e-th output feature map, and f_con(⋅) represents the activation function in the convolutional neural network.
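An illustrative element-wise reconstruction of the set MAP^a, consistent with these definitions but not necessarily the exact expression of the original filing, is:

```latex
\mathrm{MAP}^{a}=\left\{\mathrm{map}_{1}^{a},\ldots,\mathrm{map}_{C^{a}}^{a}\right\},\qquad
\mathrm{map}_{e}^{a}=f_{con}\!\left(\sum_{r=1}^{C^{a-1}}\mathrm{map}_{r}^{a-1}*w_{r,e}^{a}+b_{e}^{a}\right),
\quad e=1,2,\ldots,C^{a},
```

where * denotes the two-dimensional convolution operation.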
4.2.2 A calculation in a maximum pooling layer is conducted, which is specifically as follows:
F_down(map_e^a) = max{pix_{e,1}^a, pix_{e,2}^a, . . . , pix_{e,n_e^{a+1}}^a}, e = 1, 2, . . . , C^a,
where F_down represents the downsampling function in the maximum pooling layer, C^a represents the number of channels, map_e^a represents the output feature map in the a-th convolutional layer corresponding to the e-th convolution kernel, that is, the feature map in the input of the (a+1)-th pooling layer corresponding to the e-th channel, pix_{e,z}^a is the z-th pixel in the feature map, z = 1, 2, . . . , n_e^{a+1}, n_e^{a+1} is the total number of pixels corresponding to the feature map, pool_e^{a+1} represents the output feature map in the (a+1)-th pooling layer corresponding to the e-th channel, β_e^{a+1} and b_e^{a+1} are the multiplicative bias and the additive bias in the output feature map, and f_con(⋅) is the activation function in the pooling layer.
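An illustrative reconstruction of the pooling-layer output pool_e^{a+1}, consistent with these definitions but not necessarily the exact expression of the original filing, is:

```latex
\mathrm{pool}_{e}^{a+1}=f_{con}\!\left(\beta_{e}^{a+1}\,F_{down}\!\left(\mathrm{map}_{e}^{a}\right)+b_{e}^{a+1}\right),
\quad e=1,2,\ldots,C^{a}.
```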
4.3 The convolutional neural network is taken as the feature sharing layer to learn shared information representations among the different users. The multi-task-based model for predicting the massive-user loads based on the multi-channel convolutional neural network mainly includes the feature sharing layer and the specific task layers. The bottom portion of the feature sharing layer is formed by alternately connecting two convolutional layers with two pooling layers; the flattened result is then input into a fully connected layer in the top portion to extract shared features, which are transmitted to each of the specific task layers. The specific task layers are configured to extract the unique features of each user, and each specific task layer is formed by a feature extraction enhancement channel, a Concatenate layer, and a fully connected layer. The feature extraction enhancement channel is formed by a single fully connected layer, which is configured to extract features from the historical load time sequence of each user; the extracted features and the shared features are input into the Concatenate layer for fusion. The load prediction values of all of the users in the same cluster are output in parallel after processing by the fully connected layer.
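By way of a hedged illustration of this architecture (the numbers of filters, kernel sizes, activation functions, layer widths, the 168-point history length, and the 24-point output horizon are assumptions not fixed by the description above), a Keras-style sketch could be:

```python
from tensorflow.keras import layers, Model

def build_multitask_model(n_users, history_len=7 * 24, n_out=24):
    """Multi-task CNN: a shared multi-channel trunk plus one task branch per user.

    n_users     -- number of users (tasks) in the cluster, also the number of
                   channels of the fused input feature map
    history_len -- length of each user's own load history fed to the feature
                   extraction enhancement channel (assumed to be 7*24 here)
    n_out       -- number of load values output per user (assumed to be 24 here)
    """
    # Fused multi-channel input: one 7 x 24 feature map per user on the channels.
    fused_in = layers.Input(shape=(7, 24, n_users), name="fused_feature_maps")

    # Feature sharing layer: two convolutional layers alternated with two
    # pooling layers, then a fully connected layer extracting shared features.
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(fused_in)
    x = layers.MaxPooling2D(pool_size=(1, 2))(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(pool_size=(1, 2))(x)
    shared = layers.Dense(64, activation="relu")(layers.Flatten()(x))

    inputs, outputs = [fused_in], []
    for u in range(n_users):
        # Specific task layer for user u: the feature extraction enhancement
        # channel is a single fully connected layer on the user's own history.
        own_in = layers.Input(shape=(history_len,), name=f"user_{u}_history")
        own = layers.Dense(32, activation="relu")(own_in)
        # Concatenate the user-specific and shared features, then predict.
        merged = layers.Concatenate()([shared, own])
        outputs.append(layers.Dense(n_out, name=f"user_{u}_load")(merged))
        inputs.append(own_in)

    model = Model(inputs=inputs, outputs=outputs)
    # Hard sharing: one loss term per task; equal task weights assumed here.
    model.compile(optimizer="adam", loss="mse",
                  loss_weights=[1.0 / n_users] * n_users)
    return model
```

Training one such model per cluster, on the fused feature maps and per-user histories of that cluster, yields the parallel load outputs for all users in the cluster.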
In the present disclosure, firstly, all residential users are clustered into a plurality of clusters with different daily average electricity consumption modes by adopting an agglomerative hierarchical clustering method. Secondly, corresponding input data sets are constructed for various clusters by adopting a multi-channel-based multi-source input fusion method. Then, a multi-task-based load prediction model based on a convolutional neural network is established for each of the clusters. Load prediction values for different users in a corresponding cluster are output in parallel by each model to eventually obtain load prediction results of all of the residential users.
The beneficial effects are as follows. Compared with the prior art, the present disclosure realizes a load prediction technology that is suitable for scenarios of predicting massive-user loads and takes into account both the average prediction accuracy and the overall operation efficiency. Based on the agglomerative hierarchical clustering method, massive users are clustered into a plurality of clusters with different daily average electricity consumption modes, which significantly reduces the total number of modeling operations. Moreover, a multi-task-based prediction model based on a multi-channel convolutional neural network is established for each cluster, which extracts the shared features and the distinct features among different users in the same cluster and thereby assists in better learning individual users, improving the average prediction accuracy. In addition, the load prediction values for a plurality of users can be output in parallel by one single model, which gives a wider output adaptation range, stronger model generalization ability, and a shorter cumulative time to complete the load prediction tasks of all users, thereby further improving the overall operation efficiency and achieving stronger engineering application value and potential. The present disclosure can provide guidance for electric power enterprises to carry out personalized value-added services, which helps to improve their marketing levels, and can provide a reference for formulating demand response strategies, thereby further ensuring the economical operation of electric power systems.
The present disclosure will be further clarified below in combination with specific embodiments, and it should be understood that these embodiments are only used to illustrate the present disclosure and not to limit the scope of the present disclosure. After reading the present disclosure, modifications of various equivalent forms in the present disclosure by those skilled in the art all fall within the scope defined by the appended claims of the present disclosure.
The present disclosure provides a multi-task-based method for predicting massive-user loads based on a multi-channel convolutional neural network. As illustrated in
The specific implementation process of predicting the loads of massive residential users by using the method of the present disclosure will be described in detail below with reference to specific embodiments. The residential user data of a total of 805 residential users, obtained from the electric power smart metering user behavior test initiated by the Irish Energy Code Commission, are taken as an example, in which each user has historical load data sampled every half an hour from Jul. 14, 2009 to Dec. 31, 2010; the load values at the on-the-hour time points are taken to form hourly load data. Residential loads are predicted 24 hours in advance. The test sets include the data of the last week of each month, and the rest of the data are taken as the training sets.
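As a loosely hedged sketch of this data preparation (the file name, column names, and long-format layout are illustrative assumptions, not taken from the actual data set), the hourly resampling and the train/test split might look like:

```python
import pandas as pd

# Hypothetical long-format file: one row per (user_id, timestamp, load) record,
# with half-hourly timestamps from 2009-07-14 to 2010-12-31.
records = pd.read_csv("smart_meter_loads.csv", parse_dates=["timestamp"])

# Keep only the on-the-hour readings so that each user has hourly load data.
hourly = records[records["timestamp"].dt.minute == 0]
hourly = hourly.pivot(index="timestamp", columns="user_id", values="load")

# Test set: the last week of each month (approximated here as the last 7
# calendar days); training set: all remaining data.
is_test = hourly.index.day > hourly.index.days_in_month - 7
test_df, train_df = hourly[is_test], hourly[~is_test]
```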
In Step (1), all residential users are clustered into a plurality of clusters with different daily average electricity consumption modes by adopting an agglomerative hierarchical clustering method. The agglomerative hierarchical clustering method is as follows.
2.1 The m-dimensional daily average load vector f_j is taken as the clustering feature of each residential user, and the matrix F including the clustering features of N samples is constructed as follows:
where N is the total number of residential users, that is, the number of samples; f_j is the daily average load vector of the j-th user, that is, the clustering feature of each sample; D is the total number of days of the load data; m is the number of dimensions of the daily average load vector, which is determined by the resolution of the load data and is taken as 24 in the present disclosure; o_j^h represents the value of the h-th dimension of the daily average load vector corresponding to the j-th user; l_j^{d,h} represents the historical load value of the j-th user at the h-th hour on the d-th day; and T represents transposition.
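An illustrative reconstruction of the constructed matrix, consistent with these definitions but not necessarily the exact expression of the original filing, is:

```latex
F=\left[f_{1},f_{2},\ldots,f_{N}\right]^{T},\qquad
f_{j}=\left[o_{j}^{1},o_{j}^{2},\ldots,o_{j}^{m}\right],\qquad
o_{j}^{h}=\frac{1}{D}\sum_{d=1}^{D}l_{j}^{d,h},\quad h=1,2,\ldots,m.
```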
2.2 Each sample (that is, a user) is taken as one cluster, and the proximity between every two clusters is calculated to obtain an initial proximity matrix P. The Euclidean distance is taken as the proximity calculation rule in the present disclosure, wherein the calculation formula of the element p_{k,g} in the k-th row and the g-th column is:
where dis(⋅) represents the calculation rule for the proximity of two clusters; both k and g represent serial numbers of the clusters; and f_k and f_g are the clustering features of the k-th and g-th clusters, respectively.
2.3 The two clusters with the highest proximity, that is, in the present disclosure, the two clusters with the closest distance between them, are merged into a new cluster, and the proximity matrix P is updated. In the present disclosure, the sum of squared deviations (Ward) method is adopted to calculate the proximities among clusters. The increment ΔESS of the sum of squared deviations that would be caused by merging every two clusters C_i and C_j is calculated, and only the two clusters corresponding to the smallest increment of the sum of squared deviations are merged into a new cluster.
Taking the sum of squared deviations of the cluster C_i as an example, its calculation formula is as follows:
where μ_i represents the center of the cluster C_i; and Q_i represents the number of users included in the cluster C_i.
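An illustrative reconstruction of the sum of squared deviations, consistent with these definitions but not necessarily the exact expression of the original filing, is the standard Ward criterion:

```latex
\mathrm{ESS}\!\left(C_{i},\mu_{i}\right)=\sum_{f_{q}\in C_{i}}\left\lVert f_{q}-\mu_{i}\right\rVert_{2}^{2},\qquad
\mu_{i}=\frac{1}{Q_{i}}\sum_{f_{q}\in C_{i}}f_{q}.
```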
The formula for calculating the increment of the sum of squared deviations caused by merging clusters C_i and C_j is as follows:
ΔESS = ESS(C_i ∪ C_j, μ_{i∪j}) − ESS(C_i, μ_i) − ESS(C_j, μ_j),
where μ_i, μ_j and μ_{i∪j} represent the centers of cluster C_i, cluster C_j, and the new cluster C_i ∪ C_j, respectively.
2.4 Step 2.3 is repeated until the total number of clusters is 1 or a stopping condition is reached. In the present disclosure, the 805 residential users are clustered into 22 clusters.
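Assuming a library-based implementation is acceptable, the Ward-linkage clustering of steps 2.2 to 2.4 can be reproduced with SciPy calls such as the following sketch (the placeholder feature matrix is illustrative only):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Placeholder for the real (805, 24) matrix of daily average load vectors
# built in step 2.1; one row per residential user.
F = np.random.rand(805, 24)

# Ward linkage merges, at each step, the pair of clusters whose mergence causes
# the smallest increase in the sum of squared deviations (Euclidean geometry).
Z = linkage(F, method="ward", metric="euclidean")

# Cut the dendrogram into a fixed number of clusters (22 in this embodiment).
labels = fcluster(Z, t=22, criterion="maxclust")   # labels[j] is in 1..22
```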
In Step (2), corresponding input data sets are constructed for various clusters by adopting a multi-channel-based multi-source input fusion method, as illustrated in
3.1 A single-user time sequence input is reconstructed. The historical load sequence of a residential user over the week from 8 days before the time to be predicted to the day before the time to be predicted is reconstructed into a two-dimensional feature map with 7 rows and 24 columns, wherein each row corresponds to the daily loads of a different date and each column corresponds to the loads at a specific hour on different dates.
3.2 Two-dimensional feature maps of different users in the same cluster are fused by utilizing a channel dimension. The two-dimensional feature maps corresponding to the different users in the same cluster are transmitted to different channels in inputs of the convolutional neural network, wherein data on a single channel is a two-dimensional feature map of one user. The feature maps of the different users in the same cluster are fused by utilizing the channel dimension, and the fused feature map is taken as an input of a feature sharing layer in the convolutional neural network.
In Step (3), a multi-task-based load prediction model based on the convolutional neural network is established for each of the clusters, as illustrated in
4.1 Load predictions for the different residential users in the same cluster are taken as different tasks, and a multi-task-based learning strategy is implemented for each cluster to assist in learning the correlations and differences among the loads of the different residential users. The calculation process of the loss function in the multi-task-based learning strategy is specifically as follows.
It is assumed that the multi-task-based learning includes V tasks in total, the input and output data set corresponding to each task is {x_v, y_v}, v = 1, 2, . . . , V, and then all of the input data sets are:
X = {x_1, . . . , x_v, . . . , x_V}.
An output of the prediction model corresponding to the v-th task is defined as:
y_v = u_v(X; θ_sha, θ_v),
where u_v represents the mapping function of the prediction model corresponding to the v-th task, θ_sha is the parameter of the feature sharing layer, and θ_v is the parameter of the v-th specific task layer, v = 1, 2, . . . , V.
Joint learning is conducted on the plurality of related tasks under a hard sharing mechanism, and the network parameters are trained by minimizing an overall loss function, wherein the overall optimization loss function is calculated as follows:
where loss(⋅) represents the loss function for the tasks; and α_v is the weight coefficient corresponding to each of the tasks.
4.2 The convolutional neural network is taken as a feature sharing layer for multi-task-based learning to extract correlations among different tasks. A calculation process of the convolutional neural network is specifically as follows.
4.2.1 Calculations in the convolutional layers are conducted. It is assumed that the number of convolution kernels in the a-th convolutional layer is C^a, and then the set MAP^a of output feature maps in that layer is:
where map_e^a represents the output feature map corresponding to the e-th convolution kernel in the a-th convolutional layer, map_r^{a−1} represents the r-th output feature map in the (a−1)-th layer, C^{a−1} is the number of output feature maps in the (a−1)-th layer, that is, the number of channels included in the input data of the a-th convolutional layer, w_{r,e}^a is the kernel parameter of the e-th convolution kernel in the a-th convolutional layer corresponding to the r-th output feature map in the previous layer, b_e^a is the bias in the a-th convolutional layer corresponding to the e-th output feature map, and f_con(⋅) represents the activation function in the convolutional neural network.
4.2.2 A calculation in a maximum pooling layer is conducted, which is specifically as follows:
F_down(map_e^a) = max{pix_{e,1}^a, pix_{e,2}^a, . . . , pix_{e,n_e^{a+1}}^a}, e = 1, 2, . . . , C^a,
where F_down represents the downsampling function in the maximum pooling layer, C^a represents the number of channels, map_e^a represents the output feature map in the a-th convolutional layer corresponding to the e-th convolution kernel, that is, the feature map in the input of the (a+1)-th pooling layer corresponding to the e-th channel, pix_{e,z}^a is the z-th pixel in the feature map, z = 1, 2, . . . , n_e^{a+1}, n_e^{a+1} is the total number of pixels corresponding to the feature map, pool_e^{a+1} represents the output feature map in the (a+1)-th pooling layer corresponding to the e-th channel, β_e^{a+1} and b_e^{a+1} are the multiplicative bias and the additive bias in the output feature map, and f_con(⋅) is the activation function in the pooling layer.
4.3 The convolutional neural network is taken as the feature sharing layer to learn shared information representations among the different users. The multi-task-based model for predicting the massive-user loads based on the multi-channel convolutional neural network mainly includes the feature sharing layer and the specific task layers. The bottom portion of the feature sharing layer is formed by alternately connecting two convolutional layers with two pooling layers; the flattened result is then input into a fully connected layer in the top portion to extract shared features, which are transmitted to each of the specific task layers. The specific task layers are configured to extract the unique features of each user, and each specific task layer is formed by a feature extraction enhancement channel, a Concatenate layer, and a fully connected layer. The feature extraction enhancement channel is formed by a single fully connected layer, which is configured to extract features from the historical load time sequence of each user; the extracted features and the shared features are input into the Concatenate layer for fusion. The load prediction values of all of the users in the same cluster are output in parallel after processing by the fully connected layer. The prediction results of the method provided in the present disclosure are as shown in Table 1.
Since true load values of 0 exist in the data set of the residential users, the two error indicators RMSE and MAE are selected to measure the average prediction accuracy of this method for the massive users. The values of the above two error indicators are averaged over all residential users in the present disclosure, and the calculation formulas are as follows:
where U is the total number of the residential users; N_i is the number of samples of the i-th user; ŷ_j^i is the load prediction value for the j-th sample of the i-th user; and y_j^i is the true load value for the j-th sample of the i-th user.
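Illustrative user-averaged forms of the two indicators, consistent with these definitions but not necessarily the exact expressions of the original filing, are:

```latex
\mathrm{RMSE}=\frac{1}{U}\sum_{i=1}^{U}\sqrt{\frac{1}{N_{i}}\sum_{j=1}^{N_{i}}\left(\hat{y}_{j}^{\,i}-y_{j}^{\,i}\right)^{2}},\qquad
\mathrm{MAE}=\frac{1}{U}\sum_{i=1}^{U}\frac{1}{N_{i}}\sum_{j=1}^{N_{i}}\left|\hat{y}_{j}^{\,i}-y_{j}^{\,i}\right|.
```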
In addition, three other multi-task-based prediction methods are selected as benchmark methods in the present disclosure. In Method One, the multi-channel multi-source input fusion method of the provided method is replaced with the traditional multi-task-based learning input construction method. In Method Two, the feature extraction enhancement channels at the output terminals of the provided method are removed. In Method Three, the multi-source input fusion method of the provided method is replaced with the traditional multi-task-based learning input construction method, and the feature extraction enhancement channels at the output terminals are removed. In this way, the effectiveness of the multi-channel-based multi-source input fusion method and of the feature extraction enhancement channel in improving the average prediction accuracy is verified separately. In addition, three single-task prediction methods based on DNN, CNN, and LSTM, namely Method Four, Method Five, and Method Six, are selected as benchmark methods to highlight the advantages of multi-task-based learning in terms of average prediction accuracy and overall operation efficiency. The load prediction results of the six benchmark prediction methods are as shown in Table 1.
It can be seen from Table 1 that the cumulative time spent by the four multi-task-based load prediction methods to complete the load prediction tasks of all users is significantly less than that of the three single-task-based benchmark prediction methods, which reflects the significant advantage of the multi-task-based learning strategy in improving the overall operation efficiency. The average values of the error indicators of Benchmark Method Three are larger than those of the single-task prediction method based on DNN, which indicates that a multi-task-based learning benchmark method using the traditional multi-task-based learning input construction method and output structure cannot improve the average prediction accuracy in massive-user scenarios. If the input is improved by the multi-channel-based multi-source input fusion method (as in Method Two) and the output is improved by adding the feature extraction enhancement channel (as in Method One), the average prediction accuracy can be improved. If the provided solution is adopted, that is, the input and output of traditional multi-task-based learning are improved at the same time, the prediction accuracy reaches the highest level, the cumulative operation time is close to that of Method Three, and the total duration is at an appropriate level among the six benchmark prediction methods. The comparison chart of the prediction curves of the solution provided in the present disclosure and the single-task prediction method based on DNN is as illustrated in
To sum up, the solutions provided in the present disclosure can be applied to scenarios of predicting massive-user loads to handle large-scale user-level load prediction tasks. Compared with single-task-based load prediction methods, the solutions provided in the present disclosure significantly reduce the required time resources. Compared with load prediction methods based on the traditional multi-task-based learning structure, the average prediction accuracy is improved, thereby realizing a balance between prediction accuracy and operation efficiency and obtaining stronger engineering application value and potential. The solutions can provide electric power enterprises with a reference basis for providing personalized value-added services for electricity sales, play an important guiding role in improving the marketing levels of electric power enterprises and avoiding assessment deviations, and provide an effective reference for the formulation of demand response plans, thereby facilitating the economical operation of the electric power grid.