This application relates to systems and methods for water distribution network leakage detection and/or localization.
Fast detection and localization of underground water pipe leakage is an important yet challenging issue in water distribution system management. Due to the deterioration of underground water pipes, a large amount of water is lost every year, mostly unnoticed. According to Sadeghioon, about 3,281 megaliters (10^6 liters) of water was wasted in the UK during 2009-2011, and about 15% of supplied water was wasted annually in the US. In historical water districts, such as Cleveland, OH, or Boston, MA, the percentage of water lost is significantly higher. Moreover, unnoticed water leakage can lead to serious social impacts due to traffic delay, water contamination, and water scarcity. Therefore, a system that provides real-time water pipe monitoring and enables fast leakage response is critical for agencies to institute preventative strategies with significant socio-economic benefits.
A significant number of studies have been conducted on water pipe leak detection. The strategies are broadly classified into 5 categories, i.e., visual observation-based, sensor/instrumentation-based, transient response-based, hydraulic model-based, and data-driven strategies. However, these strategies have encountered different limitations. For instance, the conventional sensor/instrumentation-based technologies require well-trained inspectors to conduct the inspection along the pipes with the help of different types of detection equipment, including those based on optical, acoustic, or electromagnetic sensing principles. This method can be labor-intensive, time-consuming, and cost-prohibitive. Moreover, the reliability of detection is influenced by various factors, including the type of leakage, the size of the leakage, pipe materials, environmental conditions, and the skill level of the inspector. The transient-based technology uses transient pressure or acoustic signals associated with burst events. Such transient signals travel along the pipe at the speed of sound starting from the burst location. However, the transient responses decay with distance and diminish over a short time, and therefore require sensors with high spatial and temporal resolutions, which makes this approach unsuitable for continuous monitoring in all environments. The hydraulic model-based approach requires a hydraulic model simulation of the water distribution network (WDN), but information such as customer water usage, pipe deterioration conditions, and pipe physical information is often difficult to collect or is not available. Data-driven leak detection, which is based on learning from historical data with statistical or pattern recognition algorithms, is emerging. Such technologies mainly depend on the available historical dataset, without the requirement of collecting the comprehensive set of information needed by the hydraulic model.
Empowered with the Internet of Things (IoT) and artificial intelligence (AI), data-driven technologies have proven capable in knowledge discovery, image processing, event forecasting, etc. The development of supervisory control and data acquisition (SCADA) systems also promotes progress in using data-driven methods for leakage detection, since real-time monitoring data of water pressure and/or flow rate are available via the SCADA system.
A few data-driven methods have been developed to detect leakage in WDNs. The previous studies typically formulate the leakage detection problem as either a supervised machine learning (ML) problem or an unsupervised ML problem. For instance, a supervised ML method has been developed by using a fully connected DenseNet for leakage detection. The sensors were first assumed to be placed at different junctions determined by an optimization process. The simulated water pressure data obtained by these sensors was used to train the developed ML model, which achieved promising results. In another example, data was collected by a piezoelectric accelerometer under non-leaking and leaking conditions. The labelled data was used to train a Convolutional Neural Network (CNN) and a Support Vector Machine (SVM) to classify leaking versus non-leaking conditions. Although a supervised learning approach can achieve high leakage localization accuracy, it requires a balanced dataset, which means it needs a sufficient amount of WDN operational data under both leaking and non-leaking conditions. However, datasets under leaking conditions are very scarce. Consequently, unsupervised ML models are more feasible in practice. For example, an Artificial Neural Network can be used to predict the water flow and water pressure one day ahead. A leakage warning is triggered if the difference between the actual data and the predicted data exceeds a threshold. However, the detection accuracy depends upon a stable water pressure pattern in the WDN, which can be significantly affected by water usage behaviors. The developed methods have only been used for leakage detection and have not been attempted for leakage localization.
This description introduces the development of a novel machine learning (ML) model to detect the occurrence of underground leakage and localize where leaks occur. This new framework, named clustering-then-localization semi-supervised learning (CtL-SSL), uses the topological relationship of the WDN and its leakage characteristics for WDN partition and sensor placement, and subsequently utilizes the monitoring data for leakage detection and leakage localization.
This method deals with the unique feature of leak detection, where in-service WDNs are short of labeled data under leaking conditions, which makes it infeasible to use common ML models. The developed CtL-SSL framework advances the leak detection strategy by alleviating the data requirements, guiding optimal sensor placement, and locating leakage via WDN leakage zone partition. It features excellent scalability, extensibility, and upgradeability for applications to various types of WDNs. It will provide a valuable tool for sustainable management of WDNs.
This description also describes the development of novel machine learning (ML) models to support the best decisions for achieving the most resilient system recovery. The model integrates a Graph Convolutional Neural Network and Deep Reinforcement Learning (GCN-DRL) to support optimal repair decisions after the water supply system is subjected to natural hazards. Such decisions will improve WDN resilience after natural hazards such as earthquakes.
The framework includes a method for evaluating the resilience of a WDN, which can be used together with different definitions of WDN performance. The resilience indicator integrates the dynamic evolution of WDN performance indicators during the post-hazard (earthquake) recovery process. The decision goal is set as the WDN performance indicator that considers the relative importance of the service nodes and the extent to which post-hazard water needs are satisfied.
For example, the GCN of the GCN-DRL model framework is configured to encode the information of the WDN. The topology and performance of service nodes (i.e., the degree of water needs satisfaction) can be considered as inputs to the GCN; the outputs of the GCN are the reward values (Q-values) corresponding to each repair action, which are fed into the DRL process. Also, or as an alternative, the DRL process of the GCN-DRL model framework is configured to select the optimal repair sequence from a large action space.
The decision support model aims to achieve the highest system resilience by ensuring the fastest system recovery. This ensures the best decisions and resource allocations. The framework can also be used for other types of infrastructure networks.
In the novel GCN-DRL model framework, the GCN encodes the information of the WDN. The topology and performance of service nodes (i.e., the degree of water needs satisfaction) can be considered as inputs to the GCN; the outputs of the GCN are the reward values (Q-values) corresponding to each repair action, which are fed into the DRL process to select the optimal repair sequence from a large action space.
To further advance the data-driven approach for leakage detection and localization in water distribution networks, this description explores a new ML framework that combines the advantages of both supervised and unsupervised ML approaches. This new framework, named clustering-then-localization semi-supervised learning (CtL-SSL), uses the topological relationship of the WDN and its leakage characteristics for WDN partition and sensor placement, and subsequently utilizes the monitoring data for leakage detection and leakage localization. Compared with previous studies, this framework 1) considers the spatial relationship of the sensors in WDN partitioning and sensor placement, which is the cornerstone for later leakage detection and localization; 2) does not require any historical leakage data for leakage detection; and 3) can be used when only limited historical leakage data is available. More specifically, the leakage detection uses an unsupervised learning algorithm to compress and decompress the normal water pressure data. This process performs poorly when the input is leakage data, which is what reveals a leak. The leakage localization uses a supervised learning algorithm to extract the spatial relationship from the available leakage data. With the help of the proposed WDN partition process, only limited leakage data is required for each leakage zone.
In previous studies, the leakage characteristic vector is simply determined by using the difference of pressure at monitored junctions before and after leakage at a given junction. Hence, the length of the vector equals the number of sensors m. A novel leakage characteristic matrix is proposed in this study that uses the PCA and AE algorithms to find the spatial relationship among the sensors and extract the first k principal components, with k equal to m/2. The conventional leakage matrix is then projected onto the k principal components. The resultant leakage characteristics vector achieves dimension reduction from m to k (i.e., by half, since k is set to m/2). The leakage characteristic matrix is subsequently used for clustering of the WDN. Details about the conventional leakage characteristic matrix and the proposed leakage characteristic matrix are given below.
Zhang and Chen defined the leakage characteristics matrix using the change of monitored water pressure due to a given leakage occurring at each junction compared with non-leaking conditions. Table 1 illustrates the calculation of the leakage characteristics matrix, where each row is the leakage characteristics vector corresponding to the junction.
where $p_i^{\text{nonleak}}$ is the water pressure measured by sensor $i$ under non-leaking conditions, and $p_i^{\text{jun } j}$ is the water pressure at sensor $i$ when leakage occurs at junction $j$. $i$ is the index for sensors, which ranges from 1 to $m$, and $j$ is the index for junctions, which ranges from 1 to $n$.
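As a minimal sketch of how such a matrix could be assembled, the following assumes that each entry is the plain pressure difference between the non-leaking reading and the reading under a leak at junction j. The exact entry definition from Table 1 is not reproduced here, so that form, the function name, and the sample pressures are illustrative assumptions.

```python
def leakage_characteristics_matrix(p_nonleak, p_leak):
    """Return an n x m matrix whose row j is the characteristics vector of junction j.

    p_nonleak: list of m sensor pressures under non-leaking conditions.
    p_leak:    n rows; p_leak[j][i] is the pressure at sensor i when junction j leaks.
    Assumption: each entry is the plain difference p_nonleak[i] - p_leak[j][i].
    """
    m = len(p_nonleak)
    matrix = []
    for row in p_leak:
        if len(row) != m:
            raise ValueError("every leak scenario needs one reading per sensor")
        matrix.append([p_nonleak[i] - row[i] for i in range(m)])
    return matrix


# Hypothetical pressures (m of head) for m = 3 sensors and n = 2 junctions.
baseline = [50.0, 48.0, 45.0]
leak_scenarios = [
    [49.0, 47.5, 45.0],  # leak at junction 1
    [50.0, 47.0, 43.0],  # leak at junction 2
]
matrix = leakage_characteristics_matrix(baseline, leak_scenarios)
```

Each row has length m, which is the vector that the PCA- and AE-based definitions later reduce to k elements.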
The leakage characteristics matrix defined in Table 1 does not consider the internal relationships among the monitoring sensors. Previous studies have proven that such internal relationships are sensitive to leakage occurrence and therefore can be used for leakage detection and localization. Hereby, to further extract the underlying relationships between the junctions, this study proposes two new leakage characteristics extracted by unsupervised learning algorithms, i.e., Principal Component Analysis (PCA) and the Autoencoder neural network (AE). Details of the PCA-based and AE-based leakage characteristics matrices are described in the following.
PCA is an unsupervised ML model that is often used for the dimensional reduction of data samples. An example process to calculate PCA-based leakage characteristics is illustrated in
Due to dimension reduction by the PCA, for each of the n junctions, its leakage characteristics are represented by a projected vector with k elements. The dimension of principal components k is set to be around m/2, as this study found this well captures the relationship among the monitored junctions.
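The projection described above can be written compactly as follows. This is a sketch in standard PCA notation: the symbols $X$, $\bar{x}$, $\Sigma$, $W_k$, $\Delta P$, and $C^{\mathrm{PCA}}$ are assumed here for illustration and are not taken from the original figures.

```latex
% X: N x m matrix of non-leaking pressure samples (N samples, m sensors)
% \bar{x}: vector of column means of X
% \Delta P: n x m conventional leakage characteristics matrix
\begin{aligned}
\Sigma &= \frac{1}{N-1}\,\bigl(X - \mathbf{1}\bar{x}^{\top}\bigr)^{\top}\bigl(X - \mathbf{1}\bar{x}^{\top}\bigr),
\qquad \Sigma\, w_l = \lambda_l\, w_l, \quad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_m \\
W_k &= [\, w_1, \dots, w_k \,] \in \mathbb{R}^{m \times k}, \qquad k \approx m/2 \\
C^{\mathrm{PCA}} &= \Delta P \, W_k \in \mathbb{R}^{n \times k}
\end{aligned}
```

Under these assumptions, row $j$ of $C^{\mathrm{PCA}}$ is the $k$-element projected vector for junction $j$.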
AE neural network is an unsupervised learning algorithm based on deep neural network. Unlike the PCA method, which is an orthogonal linear transformation, AE neural network can extract non-linear relationships among data samples. An example architecture of AE network that can be used to implement the systems and method described herein is shown in
AE-based leakage characteristics matrix is defined as illustrated in
Both the PCA- and AE-based leakage characteristics matrices are derived from the leaking data matrix projected by PCA or AE models that are pre-trained with the non-leaking dataset only. This process effectively utilizes these ML models for feature extraction. Meanwhile, as a byproduct of the feature extraction process, the resulting characteristics matrix is only half the size of the monitored data and of the conventional leakage characteristics matrix. Such data size reduction could potentially increase computing and data storage efficiency as more data are collected by the sensors.
The k-means clustering algorithm clusters data based on their Euclidean distance. The standard k-means algorithm has been used in previous studies for WDN zone partition to reduce the degrees of freedom in leakage detection, based on the conventional leakage characteristics matrix. The partition aims to improve leakage detection and localization accuracy.
This existing WDN partition procedure has a few limitations. Firstly, the standard k-means algorithm requires the number of sensors and their placements to be predefined. That is, the monitoring data collection schema is set in advance, and the standard algorithm cannot consider the influence of choosing different sensor placements during the clustering process. For example, Zhang used Zheng's algorithm to optimize sensor placements before initializing the WDN partitioning via the k-means clustering process. Secondly, the previous WDN partitioning only considered the leakage characteristics and did not consider the physical distance among the monitoring junctions. As a consequence, junctions clustered into the same WDN zone can be geographically scattered over the WDN.
To overcome these limitations, a modified k-means clustering algorithm is developed in this study for WDN partition. Compared to the standard k-means WDN clustering which only considers the leakage characteristics of junctions, the new algorithm also considers the shortest physical path distance between the junctions over the WDN. The pseudocode of the proposed k-means algorithm is shown in the following Table 2.
It is noted that in Step 3.1, the leakage characteristics matrix can be obtained by using different definitions, i.e., the conventional leakage characteristic matrix based on pressure change or feature extraction with ML algorithms. Although PCA and AE models are used in this study, other ML models can also be integrated into this framework, such as the Mahalanobis classification system (MCS). In Step 3.2, the physical distance between pairs of junctions is obtained by using Dijkstra's shortest path algorithm. Other shortest path algorithms, such as the Floyd-Warshall algorithm, could also be considered when dealing with different types of graphs. This step guarantees that the clustered junctions are concentrated based on their network path distance. Both pair distance matrices are normalized by dividing by their largest value, so these distances range from 0 to 1. In Step 5, the representative distance between junctions is defined as the unweighted average of the physical distance and the leakage characteristics distance. Different ratios between the weight of the leakage characteristics distance and that of the physical distance are discussed below. In Step 6, redistributing the centroid of each cluster requires re-acquiring the leakage characteristics matrix with the new set of centroids. Also in Step 6, unlike the standard k-means algorithm, which uses the mean value of each cluster as its centroid, the optimal junction (the one that minimizes the distance within the cluster) is set as the new centroid, so that centroids remain on junctions in the WDN. The sensors are recommended to be placed at the centroids to maximize the value of data acquisition. Therefore, the influence of sensor placement is also considered during the WDN partition process.
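The clustering loop described above can be sketched roughly as follows, assuming a connected, undirected pipe graph with positive pipe lengths. The function names and the toy six-junction network are hypothetical, and, as a simplification, the leakage characteristics are held fixed instead of being re-acquired after each centroid update.

```python
import heapq
import math


def dijkstra(graph, source):
    """Shortest path length from source to every reachable junction."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, length in graph[u].items():
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def normalized(pair_dist):
    """Divide every pair distance by the largest one so values fall in [0, 1]."""
    largest = max(max(row.values()) for row in pair_dist.values())
    return {u: {v: d / largest for v, d in row.items()} for u, row in pair_dist.items()}


def partition(graph, features, centroids, max_iter=50):
    """Cluster junctions around centroid junctions using the combined distance."""
    nodes = list(graph)
    physical = normalized({u: dijkstra(graph, u) for u in nodes})
    leakage = normalized(
        {u: {v: math.dist(features[u], features[v]) for v in nodes} for u in nodes}
    )
    # Unweighted average of the two normalized distances.
    combined = {
        u: {v: 0.5 * (leakage[u][v] + physical[u][v]) for v in nodes} for u in nodes
    }

    def assign(centers):
        clusters = {c: [] for c in centers}
        for v in nodes:
            clusters[min(centers, key=lambda c: combined[v][c])].append(v)
        return clusters

    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = assign(centroids)
        # New centroid: the member junction with the smallest total distance
        # to the other members, so centroids always stay on real junctions.
        updated = [
            min(members, key=lambda c: sum(combined[v][c] for v in members))
            for members in clusters.values()
        ]
        if updated == centroids:
            break
        centroids = updated
    return assign(centroids)


# Toy network: six junctions in a line, 1 unit of pipe between neighbours.
names = ["A", "B", "C", "D", "E", "F"]
graph = {n: {} for n in names}
for a, b in zip(names, names[1:]):
    graph[a][b] = graph[b][a] = 1.0
# Hypothetical 2-element leakage characteristics vectors per junction.
features = {"A": (1.0, 0.0), "B": (0.9, 0.1), "C": (0.8, 0.2),
            "D": (0.2, 0.8), "E": (0.1, 0.9), "F": (0.0, 1.0)}
zones = partition(graph, features, centroids=["A", "F"])
```

In this toy case the centroids move from the end junctions to the middle junction of each zone, which is where a sensor would be recommended.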
The WDN partition stage clusters the WDN into partition zones based on leakage characteristics and physical connectivity. The monitoring sensors are recommended to be installed at the centroids of the partition zones to achieve the best value of monitoring data. With the partition zones and sensor data, Stage 2 implements algorithms for leakage detection and localization using the sensor monitoring data.
Two unsupervised ML models, PCA and AE models, are used for leakage detection. Both PCA and AE models are capable of extracting the most important features from the training dataset. Testing data are projected or decomposed into the dimension-reduced feature space; and from the projected components, the original data can be reconstructed with small errors. However, abnormal data that carries unknown features will lead to large reconstruction errors from the pre-trained ML models. This allows abnormal events such as leakages to be detected. The advantage of the proposed models is that they can be trained with unbalanced data (normal non-leaking data in the case of WDN) to detect abnormal conditions.
The ML model training process for leakage detection is illustrated in part of
The leakage localization is defined as a classification problem, i.e., the leakage conditions are classified into different WDN partition zones. There are various types of ML models for classification problems, such as the Artificial Neural Network, Support Vector Machine, Decision Tree, Random Forest (RF), etc. In this study, RF is used because 1) RF is an efficient classification algorithm, and 2) it needs only a few hyperparameters to be tuned. These properties help with efficiency and consistency during the evaluation process. It is noted that other types of ML-based classifiers can also be used for leakage localization. The RF is trained with leaking samples with leakage zone labels (step 7).
With the trained models for leakage detection (PCA and AE) and model for leakage localization (RF), for each operational dataset, the leak is detected based on reconstruction error larger than the threshold θ. If a leak is detected, the data will be fed into the RF classifier for leakage zone localization. The detection and localization process for real-time monitored data is illustrated in
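A minimal sketch of this detect-then-localize flow is shown below. The reconstruction and zone classification steps are passed in as plain callables standing in for the trained PCA/AE and RF models; the Euclidean error measure, the function names, and the toy stand-ins are assumptions made for illustration.

```python
import math


def reconstruction_error(sample, reconstruct):
    """Euclidean distance between a sample and its reconstruction (assumed metric)."""
    return math.dist(sample, reconstruct(sample))


def detect_and_localize(sample, reconstruct, classify_zone, theta):
    """Return None when no leak is detected, otherwise the predicted leakage zone."""
    if reconstruction_error(sample, reconstruct) <= theta:
        return None
    return classify_zone(sample)


# Toy stand-ins for the trained models (hypothetical, for illustration only).
NORMAL_PROFILE = [50.0, 48.0, 45.0]


def toy_reconstruct(sample):
    # A detector trained only on normal data can only reproduce the normal pattern.
    return list(NORMAL_PROFILE)


def toy_classifier(sample):
    # Picks the zone (1-based) whose sensor shows the largest pressure drop.
    drops = [n - s for n, s in zip(NORMAL_PROFILE, sample)]
    return 1 + drops.index(max(drops))


normal_result = detect_and_localize([50.1, 47.9, 45.0], toy_reconstruct, toy_classifier, theta=0.5)
leak_result = detect_and_localize([50.0, 46.0, 44.5], toy_reconstruct, toy_classifier, theta=0.5)
```

Keeping the two stages as separate callables mirrors the text: the localization model is consulted only for samples that the detector has already flagged.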
The developed method for WDN leakage detection and localization can be readily applied to operational WDNs. However, monitoring data from in-service WDNs is scarce. A hydraulic simulator of the WDN is therefore utilized to generate a dataset to evaluate the developed framework. Simulation-based data generation is commonly used to develop ML models to overcome the limitations of physical data. For example, Tao evaluated an artificial immune network with a dataset generated by the water pipeline hydraulic simulation code EPANET. Similar work has been done in many studies. In this study, the Python package WNTR is utilized to build the hydraulic model for the WDN. The package implements the hydraulic model and solver of EPANET 2.2, which is an industry standard for hydraulic modeling. It is also capable of performing Monte Carlo simulations of WDN operations under different scenarios.
By default, the hydraulic simulator assumes that a user node can always receive its designed water demand (d), even when the water pressure at that node is 0. However, due to leakages, the supplied water demand (d*) could be less than the designed water demand (d) when the water pressure is low. Herein, a pressure-dependent water model is used to consider the influence of water pressure on the water supply at each junction, which is assumed to follow Wagner's formula as shown in Eq. (1):
where p is the water pressure at the junction, d is the designed water demand, and d* is the supplied water at a given water pressure. P0 is the minimum water pressure, and Pf is the required water pressure to meet the designed water demand. η is the pressure exponent, which is set as 2 in this study. The values of P0 and Pf are set as 2 m and 30 m, respectively, based on the recommendation by [42].
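For illustration, a commonly cited piecewise form of Wagner's relation can be sketched as below: no supply at or below P0, full supply at or above Pf, and a power-law transition in between. The way the exponent enters (as 1/η, so that η = 2 gives a square-root law) is an assumption here, since Eq. (1) itself is not reproduced in this text; the function name is hypothetical.

```python
def supplied_demand(p, d, p0=2.0, pf=30.0, eta=2.0):
    """Pressure-dependent supplied demand d* for design demand d at pressure p (m)."""
    if p <= p0:
        return 0.0  # below the minimum pressure no water is delivered
    if p >= pf:
        return d  # enough pressure to meet the full design demand
    # Assumption: the exponent enters as 1/eta, so eta = 2 gives a square-root law.
    return d * ((p - p0) / (pf - p0)) ** (1.0 / eta)


# Design demand of 10 units evaluated at three pressures (m).
low = supplied_demand(1.0, 10.0)
partial = supplied_demand(9.0, 10.0)
full = supplied_demand(35.0, 10.0)
```

With the defaults above, a pressure of 9 m lies a quarter of the way between P0 and Pf, so half of the design demand is supplied.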
The equation by Crowl and Louvar is used as the leaking model. The model assumes there is a turbulent flow of water as leak occurs. The mass flow rate of the leakage is expressed by Eq. (2):
$$d_{leak} = C_d A \sqrt{2gh} \tag{2}$$
where $d_{leak}$ is the leaking demand, which depends on the water pressure. $C_d$ is the discharge coefficient, which is set as 0.75 in this study. $A$ is the leaking area in units of m^2, $h$ is the water head in units of m, and $g$ is the gravitational acceleration (m/s^2). To emulate the uncertainty of leakage size, a randomly generated value of the leaking area $A$ is used in simulating different leaking scenarios.
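Eq. (2) can be evaluated directly, as in the short sketch below. The value g = 9.81 m/s^2, the zero-flow guard for non-positive head, and the function name are assumptions added for illustration.

```python
import math


def leak_demand(area, head, cd=0.75, g=9.81):
    """Leak flow C_d * A * sqrt(2 g h); returns zero when the head is not positive."""
    if head <= 0.0:
        return 0.0
    return cd * area * math.sqrt(2.0 * g * head)


# Hypothetical leak: 0.002 m^2 opening under 30 m of water head.
q = leak_demand(area=0.002, head=30.0)
```

Because the flow scales with the square root of the head, a leak at a low-pressure junction discharges noticeably less than the same opening at a high-pressure junction.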
The following procedures are used to generate dataset under normal (non-leaking) conditions and leaking conditions:
$$Demand_i = D^{base}_i + N(0, \sigma_i^2) \tag{3}$$
where $D^{base}_i$ is the baseline design water demand at junction $i$, which is defined in the original pipe network. A Gaussian term is added to consider the water usage fluctuations.
Steps 2 to 3 are repeated to generate data under different water demand conditions.
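The demand perturbation of Eq. (3) can be sketched as follows. The relative standard deviation of 10% follows the value used later for the case study; the function name, the junction labels, and the choice not to clip negative draws are assumptions for illustration.

```python
import random


def sample_demands(base_demands, rel_std=0.10, rng=None):
    """Draw one demand scenario: base demand plus zero-mean Gaussian noise.

    rel_std is the standard deviation expressed as a fraction of each
    junction's base demand.
    """
    rng = rng or random.Random()
    return {
        junction: base + rng.gauss(0.0, rel_std * base)
        for junction, base in base_demands.items()
    }


# Hypothetical base demands for three junctions; the seed only makes reruns repeatable.
base = {"J1": 2.0, "J2": 0.5, "J3": 1.2}
scenario = sample_demands(base, rng=random.Random(42))
```

Calling the function once per simulation run yields a new demand scenario for the hydraulic solver each time.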
Similar procedures (Steps 1-3) are used to generate a dataset for the WDN under leaking scenarios, except that the effects of leakages are considered in Step 3 before solving the hydraulic model for the WDN. Leaking is assumed to occur at each junction, which is convenient for clustering purposes. The leakage size is assumed to be randomly set between 0.05 m and 0.1 m in this study, which leads to a leaking area of 0.0012 m^2 to 0.0078 m^2. These settings, however, can be easily changed to more complex leaking conditions.
C-Town water distribution network is a WDN that was used for calibration competition in Battle of the Water Calibration Networks (BWCN). The topology of this WDN is shown in
Although the parameters of the C-Town WDN provided by the original paper are deterministic values, the uncertainties of the WDN are considered in this study by adding randomness to the parameters. For example, a Gaussian distributed random value (Eq. 3) was added to the water demand of each junction to represent the uncertainties of water demand, with a standard deviation of 10% of the junction's designed water demand. The leakage size of each leakage scenario was chosen from a uniform distribution between 0.05 m and 0.1 m. In addition, to consider sensor noise, a Gaussian random error with a mean of 0 and a standard deviation of 0.1 m is added to the water pressure data. Using the EPANET model for the C-Town WDN with proper hydraulic boundary conditions, simulations are conducted on the WDN under different operational conditions (i.e., the operational rules of the pumps and valves, water demand, leakage occurrence, etc.). From these, the hydraulic data (i.e., water head and flow rate) at any location in the WDN can be obtained.
The C-town WDN is partitioned following the procedures described in the section entitled WDN partition stage: modified k-means clustering algorithm. Datasets of C-town WDN were generated using the python package WNTR for both non-leaking conditions and leaking conditions via the procedures described in the section entitled WDN Operation Data Generation.
A leakage matrix is essential to obtain the leakage characteristics matrix for WDN partitioning; it captures the influence of leakage at different junctions on the monitoring locations. To calculate it, without loss of generality, a fixed leakage size of 0.05 m was assumed in the simulation for simplicity. This leakage size is selected based on the lower bound of the leakage size range. The effects of the selected leakage size are minor, since the leakage matrix is normalized to determine the leakage characteristics matrix.
To evaluate the relative performance of the proposed partitioning method, the testbed C-town WDN was partitioned into different numbers of leakage zones using 5 different partitioning methods, which differ in the data features and the distance measures they use. These comparative approaches are described as follows.
As shown in
The leakage detection is demonstrated on the clustering result when the C-town WDN is partitioned into 10 clusters by PCA-based partition. The monitoring sensors are assumed to be installed at the centroid of each cluster and the corresponding data are used (shown in
With the data generation procedures outlined in the section entitled WDN Operation Data Generation, the dataset with 1000 non-leaking samples under different operation conditions of the WDN is generated. The non-leaking data are randomly split into a subset of 700 and 300 samples. Then, 300 leaking samples were generated by setting a random leakage size at a randomly picked junction. The subset of 700 non-leaking samples is used as the training dataset. The subset of 300 non-leaking samples together with the 300 leaking samples is used as the testing dataset.
The ML-based leakage detectors (AE model or PCA model) are firstly individually trained with the training dataset (700 non-leaking samples). With the trained ML models, the testing datasets are fed as inputs. The reconstruction errors of the input data by the AE detector and PCA detector are shown in
If the reconstruction error of a non-leaking sample is smaller than the threshold, or that of a leaking sample is larger than the threshold, the sample is recognized as correctly classified. Misclassifications can still happen with the set threshold, i.e., leaking samples are classified as non-leaking, or non-leaking samples are classified as leaking. The leakage detection accuracy is assessed by the number of correctly classified cases over the total number of testing samples.
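The accuracy measure described above amounts to a simple count, sketched below with hypothetical reconstruction errors and labels; the function name is illustrative.

```python
def detection_accuracy(errors, is_leak, theta):
    """Fraction of samples whose thresholded reconstruction error matches the label."""
    if len(errors) != len(is_leak):
        raise ValueError("one label is needed per reconstruction error")
    # A sample is flagged as leaking when its error exceeds the threshold.
    correct = sum((e > theta) == leak for e, leak in zip(errors, is_leak))
    return correct / len(errors)


# Hypothetical errors: two non-leaking samples followed by two leaking samples.
errors = [0.1, 0.6, 0.9, 0.4]
labels = [False, False, True, True]
accuracy = detection_accuracy(errors, labels, theta=0.5)
```

Here the second sample is a false alarm and the fourth is a missed leak, so half of the samples are classified correctly.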
Besides detecting leakage, localizing the leakage is also important for retrofit actions on the WDN. A conventional supervised ML classifier requires that the training dataset include data of leakage occurring at each WDN junction. In practice, however, leakage only appears at limited locations, which makes it infeasible to train a supervised ML model well. With the partition of the WDN, the leakage localization problem is defined as a semi-supervised classification problem. The Random Forest (RF) model is chosen for leakage zone localization following the procedures outlined in
The leakage localization performance of using different partition methods is firstly evaluated in WDN by partitioning the WDN into 6 leakage zones. A small portion of total junctions (assumed as 10% in this study which can be changed to other assumptions without loss of generality) are assumed to have experienced leakage. The leaking junctions are assumed to be evenly distributed among the 6 partition zones. Based on this assumption, leaking data samples are generated by assuming a leakage of random size occurring at one of these selected junctions. For each of the 6 partition zones, 400 leaking data samples are generated. Therefore, the total training dataset includes 2400 data samples, with their respective labels of partition zones they belong to.
For the testing data, 200 leaking data samples are randomly generated for each partition zone, assuming leakage of random size occurring at randomly picked junctions among the remaining 90% of junctions. Altogether, the testing dataset includes 1200 leaking samples.
A confusion matrix is often used to evaluate classification performance, and it is also used to evaluate leakage localization performance in this study. A typical confusion matrix contains four prediction terminologies: True Positive, True Negative, False Positive, and False Negative. For a multi-classification confusion matrix with a structure like
where M is the confusion matrix, i is the ith class, k is the total number of classes. Mij indicates the ith row and jth column.
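One common metric that is consistent with these symbols is the overall accuracy, i.e., the sum of the diagonal entries M_ii divided by the sum of all entries M_ij. The sketch below assumes that metric and that rows hold the true zones and columns the predicted zones; the original equation is not reproduced in this text, so both choices are assumptions.

```python
def overall_accuracy(M):
    """Sum of the diagonal of a k x k confusion matrix divided by the sum of all entries."""
    k = len(M)
    if any(len(row) != k for row in M):
        raise ValueError("the confusion matrix must be square")
    total = sum(sum(row) for row in M)
    return sum(M[i][i] for i in range(k)) / total


# Hypothetical 2-zone confusion matrix (rows: true zone, columns: predicted zone).
M = [
    [8, 2],
    [1, 9],
]
acc = overall_accuracy(M)
```

The same function applies unchanged to any number of partition zones, since it only relies on the matrix being square.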
The RF leakage zone detection is implemented on WDN using different partition methods, i.e., AE-based partition, PCA-based partition, Modified Partition, and Graph distance-based partition. The confusion matrices of the final leakage localization results are shown in
The analyses so far indicate that the AE model achieved higher accuracy than the PCA model for leak detection, possibly due to its ability to extract non-linear relationships among the input features. Meanwhile, the RF-based leakage localization achieved the highest accuracy when using the PCA-based WDN partition, possibly because the information extracted by PCA is easier for the RF to learn. PCA also features higher computing efficiency. Therefore, a hybrid framework is proposed that combines the use of PCA-based partition, AE-based leakage detection, and RF-based leakage localization.
In practice, resource constraints might prevent sensors from being installed at the optimal junctions. To consider this issue, analyses are conducted under the scenario where sensors are assumed to be ‘randomly placed’ in the WDN and the leakage zones are clustered using these sensors as the centroids. The accuracy of leakage detection and localization under ‘optimal sensor placement’ and ‘random sensor placement’ is determined using the C-Town WDN testbed. The final results are summarized in
Initially, the proposed framework is illustrated by defining distance Lv,c
$$L_{v,c} = w_1 \cdot L^{\text{leakage}}_{(v,c)} + w_2 \cdot L^{\text{physical}}_{(v,c)} \tag{4}$$
$$\text{s.t. } w_1 + w_2 = 1$$
where w1 is the weight assigned to the leakage characteristics distance and w2 is the weight assigned to the physical distance.
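The weighted distance can be sketched in a few lines of Python. This is a minimal illustration that assumes the two weights sum to one and that both distances have already been scaled to comparable ranges; the function names and the example distances are hypothetical.

```python
def combined_distance(d_leakage, d_physical, w1=0.5):
    """Weighted distance between a junction v and a centroid c.

    w2 is derived as 1 - w1, i.e. the weights are assumed to sum to 1.
    """
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1]")
    w2 = 1.0 - w1
    return w1 * d_leakage + w2 * d_physical


def nearest_centroid(dist_to_centroids):
    """Index of the centroid with the smallest combined distance."""
    return min(range(len(dist_to_centroids)), key=dist_to_centroids.__getitem__)


# Hypothetical junction: close to centroid 0 in leakage behavior,
# but physically closer to centroid 1.
leak = [0.2, 0.9]
phys = [0.8, 0.1]
zone_leak_heavy = nearest_centroid([combined_distance(a, b, 0.9) for a, b in zip(leak, phys)])
zone_phys_heavy = nearest_centroid([combined_distance(a, b, 0.1) for a, b in zip(leak, phys)])
```

In this example the junction is assigned to zone 0 when the leakage characteristics dominate (w1 = 0.9) and to zone 1 when the physical distance dominates (w1 = 0.1), which is the trade-off examined by the sensitivity analysis on w1.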
Sensitivity analyses are conducted on the effects of weights assigned to the leakage characteristic distance. The final leakage localization accuracy when using different values of w1 is shown in
To further illustrate the proposed leakage partition, detection, and localization framework, another WDN, the Rancho Solano Zone III WDN, is used as a second independent testbed. This testbed is located in Fairfield, California. The information about this WDN is published by the ASCE task committee on a research database for water distribution systems and is available for download from the database of the University of Kentucky. The graph of this water supply network is shown in
The same uncertainties of water demand and sensor noise that are considered in Case study I are used again in this testbed. The leakage uncertainty range is set as U(0.01, 0.05), since an overly large leakage could drain all the water in this WDN. A 40-time-step water pressure record of Junction ‘F010’ is shown in
The proposed hybrid framework, as described in the section entitled Hybrid approach for leakage detection and leakage zone localization, is applied to the Rancho Solano Zone III WDN to illustrate the effect of the WDN partition results. The numbers of leakage zones considered in this study are 2, 4, and 6. When equal weights are assigned to the leakage characteristics distance and the physical distance, the final partition results for different numbers (k) of partition zones are shown in
As can be seen from the results, the partitioned results are reasonably balanced and concentrated. Hence the maintenance team can easily narrow down the inspection area after a leakage is detected.
Similar to Case study I, only non-leaking monitored water pressure data are used for leakage detection, and only 10% of the junctions in each leakage zone are assumed to have leakage experience. Hence, the water pressure recorded under the available leakage experiences can be used to train the leakage localization model. The training and testing datasets are prepared in the same way as in Case study I.
A novel CtL-SSL framework is developed for WDN leakage management in this study. The framework includes WDN leakage zone partition, leakage detection, and leakage zone localization. The WDN partition is based on the leaking behaviors of the WDN junctions. New leakage characteristics are defined based on features extracted from non-leaking data with unsupervised ML models such as PCA or AE. An improved k-means method is proposed for WDN partition, which considers both the graph distance between junctions and the leakage characteristics. Sensors are recommended to be installed at the centroid junction of each partition to acquire monitoring data. With the monitoring data, unsupervised ML models are developed for leakage detection based on threshold criteria applied to reconstruction errors. This allows leakages to be detected with an unbalanced dataset that contains non-leaking samples only. With the leakage zone partition of the WDN, the leakage zone localization is defined as an ML-based classification problem in which the partition zone numbers serve as class labels, and it is achieved with a small percentage of leaking data.
The results indicate that the new partition algorithm (stage 1) achieves less intermingling of junctions from different partitions than the conventional partition method. The leakage detection and localization stage (stage 2) also achieved promising performance even with leakage data from only a small portion of junctions. The proposed framework achieved around 95% accuracy in leakage detection and 83% accuracy in leakage localization in both case studies with leakage data from less than 10% of junctions.
The proposed CtL-SSL framework can be readily applied to different WDNs and updated with more powerful models in the future, which increases its extensibility and upgradeability. The final performance may vary with the number of leakage zones and the scale of the WDN. Determining the optimal number of leakage zones for different types of WDN remains a problem for future investigation. In practice, the choice of the number of leakage zones should consider not only the final detection and localization accuracy but also factors such as budget limitations, expected leakage zone resolution, and socio-economic impact. Moreover, the systems and methods described herein can be developed and validated using data generated by a hydraulic model of the WDN.
Methods to quantify the WDN resilience have been an active area of research in recent decades. These methods can be divided into two major categories, i.e., surrogation-based quantification and recovery-based quantification. The surrogation-based resilience quantification treats the WDNs as static systems (typically before disruption) without considering their time-dependent performance after the disruption. Energy-based and graph-based analyses of WDN are two of the most commonly used surrogation-based methods for resilience quantification. For example, Todini proposed a WDN resilience quantification method based on energy dissipation during the water distribution process. Prasad et al. further enhanced Todini's method by considering the redundancy of the network. Jayaram et al. proposed another surrogation index for WDN resilience quantification by considering multiple water supply sources in one network. Zarghami et al. quantified the water distribution network resilience based on the betweenness centrality and information entropy theory.
Given the important role of recovery decisions in WDN resilience, different models have been proposed in previous research to improve decisions on the system restoration sequence. However, due to the potentially large number of failures and the complex hydraulic relationships in a WDN, determining an optimal restoration sequence to maximize system resilience remains a challenging problem. Current methods for optimal WDN system restoration decisions can be divided into two major categories, i.e., recovery methods based on the static importance of WDN components and resilience-oriented recovery methods.
The first group of methods ranks the WDN components based on a static measurement of their importance in the WDN. The common importance indicators are based on graph theory, such as the betweenness centrality of the nodes, or on operational energy, such as the distance to the source or pipe entropy values. Overall, this group of methods does not require evaluating the dynamic recovery process. On the other hand, the resilience-oriented methods typically involve a hydraulic model of the WDN and require defining an optimization problem to assess the efficiency of the recovery process, which is solved with dynamic optimization algorithms. Based on the computing algorithms, the dynamic optimization algorithms are further categorized into local optimization methods and global optimization methods. Local optimization methods are based on exhaustive search and achieve only a locally optimal solution. For example, Liu et al. compared a static importance-based pipe recovery method with a dynamic importance-based pipe recovery method. The global optimization algorithms aim to achieve a globally optimal solution. The widely used conventional methods include the Genetic Algorithm (GA), fuzzy logic models, and Bayesian networks. These algorithms, however, are computationally intensive and can lead to unstable solutions. For example, Liu et al. noted that the GA method underperformed the local optimization methods when using a random initial population.
This description further provides a hybrid machine learning (ML) model programmed to determine the optimal post-earthquake WDN restoration sequence. The ML model, referred to herein as the GCN-DRL model, integrates Deep Reinforcement Learning (DRL) and a Graph Convolutional Neural network (GCN). The GCN-DRL model uses the GCN to encode the topological information of the WDN and uses the DRL framework to identify the restoration sequence that maximizes the system resilience index (SRI) during the recovery process. The proposed method belongs to the resilience-oriented global optimization methods. Low computing efficiency is a general barrier of global optimization methods. The proposed method overcomes this limitation with a transfer learning strategy, in which a pre-trained GCN-DRL model is reused for new disasters to reduce the training required. High computing efficiency for long-term recovery management is therefore achieved through accumulated experience.
The following section of this description is organized into parts that 1) describe the dynamic demand-based seismic resilience evaluation framework, which consists of a model for pipe failure prediction, a model for WDN performance measurement, and a model for WDN resilience quantification; 2) provide background on Deep Reinforcement Learning (DRL) and the Graph Convolutional Neural network (GCN), followed by the detailed architecture of the developed GCN-DRL hybrid ML model; and 3) describe the application and performance of the proposed ML model on a widely used WDN testbed.
Additionally, the concept of time-dependent system performance degree (PDW(t)) post hazards is utilized to measure the system resilience by the system resilience index (SRI) (see, e.g.,
The system recovery stage includes the effects of repairing damaged pipes on the WDN performance. The leakages are removed for the selected pipe and the repairing time is recorded based on the number of leakages. Since a dynamically changing water demand is considered, the nodes' water demand is updated before conducting the hydraulic simulation (section entitled Quantify the WDN System Performance Degree Considering User Demand-Change and Node Importance during Post-Earthquake Recovery Process). The WDN performance (PDW) at the next time step is then determined as described in the section entitled WDN system resilience measurement based on WDN recovery process. The repairing process is repeated until all the failed pipes are repaired, after which the final system resilience index (SRI) is determined (see Eq. 10). A novel GCN-DRL ML model is evaluated for recovery decisions and compared with four conventional methods used for repair decisions (pipe repair sequence).
Various components of a WDN, including pipes, tanks, pumps, and water treatment facilities, could all be subjected to different extents of damage by earthquakes. To simplify the analyses without loss of generality, this paper focuses on the repair sequence of distributed components, i.e., pipelines. The localized facilities (i.e., tanks, pumps, and water treatment facilities) are not considered in the analyses. The accepted relationship between peak ground velocity (PGV) and pipe repair rate is used to describe the pipe fragility curve. The model was originally proposed in the ALA (2001) guideline and further improved by Mazumder et al. with the consideration of pipe deterioration conditions. For the PGV estimation of an earthquake event, an empirical equation, Eq. (1), proposed by Yu and Jin is adopted in this paper, since it was developed with a dataset collected at a location similar to that of the testbed WDN of this study.
$$PGV = 10^{-0.848 + 0.775M - 1.83\log_{10}(R + 17)} \tag{1}$$
where R is the distance from the epicenter (km) and M is the magnitude of the earthquake.
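For illustration, the attenuation relation can be evaluated with a short Python function. This is a sketch that assumes the relation has the form log10(PGV) = −0.848 + 0.775M − 1.83 log10(R + 17), i.e., PGV grows with magnitude and decays with epicentral distance; the function name is hypothetical, and the units of the result follow those of the original relation, which are not stated here.

```python
import math


def peak_ground_velocity(magnitude, distance_km):
    """PGV from the assumed form log10(PGV) = -0.848 + 0.775*M - 1.83*log10(R + 17)."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    exponent = -0.848 + 0.775 * magnitude - 1.83 * math.log10(distance_km + 17.0)
    return 10.0 ** exponent
```

Evaluating the function at the midpoint of every pipe gives the PGV field that drives the pipe failure probability in the next step.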
With the information of PGV, the pipe failure probability with the consideration of deterioration is written as (Mazumder et al.):
Pf=1−e−k
where Pf is the pipe failure probability every 1,000 feet (304.8 m). k1 and kc are the correction factors that consider the effects of pipe material, size, soil type, and age. Examples of recommended values of k1, kc for different pipes are provided in Mazumder, R. K., et al., Framework for seismic damage and renewal cost analysis of buried water pipelines. Journal of Pipeline Systems Engineering and Practice, 2020. 11(4): p. 04020038.
Multiple damages along a single pipe are considered in this study. The number of damages on a pipe is assumed to be Poisson distributed, which is mathematically described as follows.
where P(m) is the probability of m damages occurring in the pipe; λ is the parameter of the Poisson distribution; L is the total length of the pipe; and L0 is a reference length of 1000 ft (304.8 m) (therefore m defines the damage number per 1000 ft (304.8 m)).
The parameter λ of the Poisson distribution in Eq. (3) can be estimated from the probability that no failure occurs on the pipe by using Eqs. 4 and 5.
For each seismic hazard consequence simulation, the number of failures along each pipe is randomly sampled from the corresponding Poisson distribution (Eq. (5)). The damages are assumed to occur at random locations along a pipe. The effects of seismic damages on the operation of the WDN are simulated by assuming that the damages cause leakages in the pipelines. In principle, the leak sizes vary with the extent of damage to the pipes. For simplicity, the damages are simulated as leaks with a size of 25% of the pipe cross-section area in this study. This, however, can be easily extended when more accurate information is available for a specific WDN. The seismic failure assessment of the water pipe network is implemented in Python scripts. The WDN under normal operation and post-earthquake failure conditions is simulated with the hydraulic simulation solver WNTR. WNTR is an open-source Python package for hydraulic simulations of water pipe systems, which solves the same sets of equations as the widely used EPANET 2.2.
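The damage sampling step can be sketched as follows. This minimal example assumes that λ is obtained from the no-failure probability as exp(−λ) = 1 − Pf per reference length and that the Poisson mean scales with L/L0; both are readings of the description rather than a reproduction of the original equations. The function names are hypothetical, and the sampler is Knuth's multiplication method, which is adequate for the small means expected per pipe.

```python
import math
import random

REFERENCE_LENGTH_FT = 1000.0  # L0 in the text


def poisson_rate(p_fail):
    """Rate per reference length from P(m = 0) = exp(-lambda) = 1 - Pf (assumed)."""
    if not 0.0 <= p_fail < 1.0:
        raise ValueError("p_fail must lie in [0, 1)")
    return -math.log(1.0 - p_fail)


def sample_damage_count(p_fail, pipe_length_ft, rng):
    """Draw the number of damages on one pipe with Knuth's Poisson sampler."""
    mean = poisson_rate(p_fail) * pipe_length_ft / REFERENCE_LENGTH_FT
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_damage_positions(count, pipe_length_ft, rng):
    """Damage locations drawn uniformly along the pipe, sorted from the start node."""
    return sorted(rng.uniform(0.0, pipe_length_ft) for _ in range(count))
```

Passing a seeded `random.Random` instance as `rng` makes a damage scenario reproducible, so that different repair strategies can be compared on the same set of failures.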
The performance of the WDN is measured by its capability to meet the customers' water use demands. Given the essential role of clean water supply to public life, it should also be one of the most important criteria for post-hazard restoration decisions. The water user node satisfaction degree (NSD) is used to quantify the performance of the system in this study. The NSD is defined as the ratio of the actual water supplied to a node to the expected water use at that node. The water demand of each node has been found to change dynamically post-earthquake. Didier et al. studied post-earthquake water demand behaviors after the 2015 Gorkha Earthquake. The results showed that the expected water demand decreased significantly due to damages to buildings and equipment when subjected to a high level of damage. For simplicity, a quadratic model is used to describe the time evolution of water demand post-earthquake, i.e., a disruption followed by a recovery process.
where Di0 is the expected water demand before the earthquake; Nrepair is the number of repaired pipes at time t; Ntotal is the total number of pipe failures due to the earthquake; and t is the recovery time, with t=0 indicating the beginning of recovery. The initial post-earthquake water demand at each node is assumed to be the expected water demand (Di0) multiplied by a small value of 0.0001. A small number rather than 0 is used to avoid division by 0 when determining the NSD (Eq. 8). After that, the expected water demand of each node is assumed to increase gradually with the WDN recovery process until full water demand is restored.
The real-time water supply to each node on the WDN is determined by expected water demand and the actual water pressure. The relationship between real-time water supply (Di) and the expected water demand (Di0) is shown in Eq. (7).
where pi is the actual water pressure at the node; p0 is the predefined lower bound of water pressure (below which no water is supplied); and pu is the upper bound of water pressure (the minimum pressure that ensures the water supply meets the design water demand).
According to Eq. (7), when the water pressure at a node is lower than p0, no water is supplied to the node. If the water pressure at a node is higher than pu, the design water demand of this node will be met. When the water pressure is between p0 and pu, the real-time water supply depends on the water pressure. p0 and pu are set as 0 and 30 meters, respectively, as recommended by Zhou et al.
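The three pressure regimes can be expressed as a small piecewise function. The sketch below assumes a square-root (Wagner-type) interpolation between p0 and pu, as commonly used in pressure-dependent demand simulation; the text only fixes the two limiting cases, so the intermediate form and the function name are assumptions.

```python
import math


def supplied_demand(pressure, expected_demand, p0=0.0, pu=30.0):
    """Water delivered to a node given its pressure head in meters.

    The square-root interpolation between p0 and pu is an assumption;
    only the limits (no supply below p0, full supply above pu) are given.
    """
    if pu <= p0:
        raise ValueError("pu must exceed p0")
    if pressure <= p0:
        return 0.0
    if pressure >= pu:
        return expected_demand
    return expected_demand * math.sqrt((pressure - p0) / (pu - p0))
```

With the default bounds of 0 m and 30 m, a node at 7.5 m of pressure head would receive half of its expected demand under this assumed form.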
The Node Satisfaction Degree (NSD) in this study is defined as the ratio between the real-time water supply and the post-earthquake expected water demand at a given time during the restoration process. An NSD value larger than 1 is set to 1 (i.e., the water demand at the node is fully met). The NSD is defined as follows:
where Di(t) is the actual water supply to the node at time t and Di0(t) is the expected post-earthquake water demand at time t. Both variables are flow rates (m3/s).
Based on the NSD defined for each node, the overall performance of a complete WDN is defined as the performance degree of the water network (PDW), which is calculated as the weighted sum of the NSD at each node in the WDN (Eq. (9)). The weight factor reflects the relative importance of the node. Using the NSD to measure the overall WDN performance allows the importance of critical water supply nodes to be considered by assigning appropriate weights to the nodes. For example, restoring water supply to critical facilities such as hospitals, firefighting stations, and schools is more urgent than restoring supply to less safety-critical facilities. The important nodes can be prioritized in the restoration plan by assigning proper weights to their NSD, which can be considered in the seismic consequence analysis.
where wi is the weight factor that reflects the relative importance of node i, and NSDi(t) is the node satisfaction degree at time t, which belongs to (0, 1]. The node weights wi are subject to $\sum_{i=1}^{n} w_i = 1$. Therefore, the PDW at any time t falls within the range [0, 1].
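The two definitions combine into a short calculation. The following sketch computes the NSD as the supply-to-demand ratio capped at 1 and the PDW as the weighted sum over nodes; the function names and the three-node example are hypothetical.

```python
def node_satisfaction_degree(supplied, expected):
    """NSD = supplied / expected demand, capped at 1."""
    if expected <= 0:
        raise ValueError("expected demand must be positive")
    return min(supplied / expected, 1.0)


def network_performance(supplied, expected, weights):
    """PDW = sum_i w_i * NSD_i, with node weights that sum to 1."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(
        w * node_satisfaction_degree(s, e)
        for w, s, e in zip(weights, supplied, expected)
    )


# Hypothetical three-node network; node 0 (e.g. a hospital) carries the largest weight.
pdw_example = network_performance(
    supplied=[0.5, 2.0, 0.0],
    expected=[1.0, 1.0, 1.0],
    weights=[0.5, 0.3, 0.2],
)
```

Here the example evaluates to 0.5 × 0.5 + 0.3 × 1 + 0.2 × 0 = 0.55. The small positive floor on the expected demand described earlier is what keeps the division well defined immediately after the earthquake.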
As some prior studies indicated, the weight or importance of a water supply node may also change during the restoration process. A detailed method to quantify the importance of different nodes is out of the scope of this study. A pre-defined fixed weight for each water supply node is used in this paper. Dynamic changing of node importance can be considered by using the proposed framework, which is similar to the consideration of the dynamic changing of user's expected water demands.
Based on the definition of the time-dependent system performance degree, PDW(t) (Eq. (9)), the system resilience index (SRI) during the recovery process is defined using the area under the curve of PDW(t) (
where tend is the time at which recovery ends and t0 is the time at which recovery begins. The integral is normalized by (tend−t0) to account for the effects of recovery time.
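Numerically, the SRI can be approximated from sampled values of the performance curve. The sketch below uses the trapezoidal rule, which is an assumption about how the curve is interpolated between samples; the function name and sample values are hypothetical.

```python
def system_resilience_index(times, pdw_values):
    """Normalized area under PDW(t) between times[0] and times[-1] (trapezoidal rule)."""
    if len(times) != len(pdw_values) or len(times) < 2:
        raise ValueError("need at least two matching samples")
    area = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        area += 0.5 * (pdw_values[k] + pdw_values[k - 1]) * dt
    return area / (times[-1] - times[0])


# Two hypothetical recoveries with the same start and end performance.
sri_fast = system_resilience_index([0, 1, 2], [0.2, 0.9, 1.0])
sri_slow = system_resilience_index([0, 1, 2], [0.2, 0.3, 1.0])
```

The recovery that restores most of the performance early yields the larger index (0.75 versus 0.45 in this example), which is why the repair order matters even when the total repair time is the same.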
Based on the definition of the System Resilience Index (SRI), the larger the SRI, the faster the WDN recovers, or the more resilient the WDN system is. Therefore, the objective of the optimal WDN recovery problem can be defined as finding the repair sequence, consisting of individual repair actions, that achieves the highest SRI over the recovery process. Mathematically, the problem of optimal repair sequence can be defined by Eqs. 11 to 13. Eq. 11 defines the objective function for the optimization, which aims to maximize the SRI. The SRI is the resilience evaluation of the WDN recovery process, which is affected by the sequence of repair actions ai. Eq. 12 and Eq. 13 define the constraints on the repair actions. In this study, it is assumed that a) pipes are repaired one at a time, and each pipe is repaired only once, and b) full repair of the WDN refers to the condition that the set of pipes repaired equals the set of pipes failed. The optimization problem aims to determine an optimal repair sequence that achieves the maximum SRI.
$$\max \; SRI(a_1, a_2, \ldots, a_n) \tag{11}$$
s.t.
$$a_1 \neq a_2 \neq a_3 \neq \cdots \neq a_n \tag{12}$$
$$a_1 \cup a_2 \cup a_3 \cup \cdots \cup a_n = K \tag{13}$$
where SRI(⋅) is the resilience of the WDN under the given repair sequence; ai is the repair action at step i; and K is the set of pipes that failed due to the hazard.
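The two constraints state that a feasible repair sequence is a permutation of the failed pipes. For a handful of pipes, the problem can therefore be solved by enumerating all permutations, as in the sketch below. The performance model here is a deliberately simple, hypothetical stand-in in which each repaired pipe adds a fixed gain to the PDW; in the described framework each evaluation would instead require a hydraulic simulation.

```python
from itertools import permutations


def toy_sri(sequence, gain, repair_time, pdw0=0.0):
    """SRI of a repair order under a toy additive model (hypothetical).

    PDW is held constant while a pipe is being repaired and jumps by
    gain[pipe] when that repair finishes.
    """
    pdw, area, elapsed = pdw0, 0.0, 0.0
    for pipe in sequence:
        area += pdw * repair_time[pipe]
        elapsed += repair_time[pipe]
        pdw += gain[pipe]
    return area / elapsed if elapsed else pdw0


def best_sequence(failed_pipes, gain, repair_time):
    """Exhaustive search over all n! orders; feasible only for very small n."""
    return max(permutations(failed_pipes), key=lambda seq: toy_sri(seq, gain, repair_time))


# Hypothetical gains and repair times for three failed pipes.
gain = {"P1": 0.1, "P2": 0.5, "P3": 0.4}
repair_time = {"P1": 1, "P2": 2, "P3": 1}
optimal = best_sequence(["P1", "P2", "P3"], gain, repair_time)
```

Because the number of permutations grows factorially (44 failed pipes give 44! candidate orders), exhaustive enumeration is not viable at realistic scale, which motivates the learning-based search described next.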
To focus on the key problem without loss of generality, the following assumptions are made in developing a decision support model for the optimal repair sequence to restore the WDN service.
Repair time for pipe damages: Different pipes may experience different repairing time. For example, the Federal Emergency Management Agency provided the estimated repairing time of different components. To simplify the analyses in this paper, it is assumed an equal amount of time is needed to fix any damages in a pipe. This means the repairing time for a single pipe is only determined by the total number of damages along this pipe.
Binary working status of damaged pipes: A pipe is typically repaired with its end connections closed. A damaged pipe is reopened only when all repairs on this pipe are finished. A damaged pipe is therefore assumed to be either closed or open depending on the status of the repair. This simplifies the hydraulic model of the WDN.
Resource for repair: A single repair team is assumed, i.e., the WDN is repaired by one repair team with no resource limits. This is also a common assumption in prior research on determining the optimal recovery sequence of a WDN (e.g., [7, 28]). This assumption ensures that the failed pipes are recovered one by one.
Non-preemptive recovery: It is assumed that the repair team has to finish the repair work on the current pipe before moving on to the next pipe. This assumption is often used in analyzing repair processes for infrastructure such as roads, bridges, and power grids.
Extent of damages: Pipes experience variable extents of damage during an earthquake. To simplify the analyses, this study assumes that the leak size is 25% of the pipe cross-section area, which corresponds to a large average extent of damage. A similar assumption is adopted by Shi et al.
Dynamic changing post-earthquake water demands: The water demand at each node of WDN is assumed to restore to pre-hazard condition as the restoration of WDN continues. A single post-hazard dynamic water demand process is used here. It is noted that different nodes could experience different dynamic water demand restoration process in the service WDN depending upon the function and location of the nodes. However, multiple dynamic water demand patterns can be easily added to the model when such data is available.
Deep reinforcement learning (DRL): DRL is an impactful development in machine learning (ML). It provides a powerful approach to solving optimization problems that involve a series of actions. DRL has achieved promising results in identifying the optimal action sequence from a massive action space based on the corresponding system states and interactions with the environment. Andriotis et al. provided a detailed introduction to successful DRL applications in management systems. DRL has also been successfully applied in areas such as communications, robotics, and biology, which demonstrates its ability to solve global optimization problems with high efficiency. For WDN restoration, the problem of the optimal repair sequence is a global optimization problem. The action space is the set of water pipes damaged during the earthquake. The system state is represented by the node satisfaction degree (NSD) and the WDN structure during the restoration process.
The RL model considers not only the instant reward of each action but also its potential influence in the future. Therefore, rather than choosing the action with the highest instant reward, a Q value is assigned to each action to determine which decision should be made. The Q value is defined by the Bellman equation, as shown in Eq. 14. The Q value combines the action's instant reward and the maximum Q value of the next state reached after taking this action. As demonstrated by Mnih et al., by iteratively sampling all the actions under all the states, the RL model compiles the Q values of each action under each state into a Q table. The RL model can then determine the optimal action simply by choosing the action with the highest Q value.
where E denotes the expectation, r is the immediate reward after taking action a, and γ is the discount factor applied to future rewards obtained by following the optimal policy from the next state si+1.
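A minimal tabular sketch of this update is given below, assuming the standard Bellman target r + γ max Q(s', a'). The learning rate `alpha`, the default values, and the state and action labels are hypothetical and serve only to show how the Q table is filled and then queried.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, next_actions,
             gamma=0.9, alpha=0.5):
    """One temporal-difference step toward r + gamma * max_a' Q(s', a')."""
    best_next = max((q_table[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
    return q_table[(state, action)]


def greedy_action(q_table, state, actions):
    """Pick the action with the highest Q value in the given state."""
    return max(actions, key=lambda a: q_table[(state, a)])


# Hypothetical two-step example: repairing pipe "b" from state "s1" is known to be valuable.
q = defaultdict(float)
q[("s1", "b")] = 2.0
updated = q_update(q, "s0", "a", reward=1.0, next_state="s1",
                   next_actions=["b", "c"], gamma=0.5, alpha=1.0)
```

With `alpha=1.0` the entry Q(s0, a) is set directly to the target 1.0 + 0.5 × 2.0 = 2.0. An empty `next_actions` list marks a terminal state (all pipes repaired), for which the future term is zero.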
Graph convolutional neural network (GCN): The graph convolutional neural network (GCN) was first proposed by LeCun et al., inspired by the convolutional neural network (CNN). A graph neural network is a special neural network that can operate directly on graph-structured data. The GCN uses the key ideas of a CNN, such as local connections, shared weights, and the use of multiple layers. It, however, convolves the features of neighboring nodes into a latent space, which overcomes the limitation of the CNN that it can only operate on regular Euclidean data such as images (2D) and text (1D). GCN has been successfully applied in domains such as physics, chemistry, biology, and knowledge graphs. Zhou et al. provided a detailed review of the GCN and its various applications. In civil engineering, GCN has been applied to traffic flow prediction by treating the traffic network as a special type of graph. The successful applications of GCN in different domains have demonstrated its potential for capturing the complex relationships in a graph structure.
The characteristics of graphs make them well suited to describing the complex internal relationships among nodes in civil infrastructure networks. For WDNs, the topological structure of the water pipe network can be described as a graph, with nodes represented as vertices and pipes represented as edges. The vertices and edges contain information on WDN service conditions such as the water pressure, flow rate, node connection, pipe length, and so on. Another advantage of using a graph data structure for a WDN is that the procedures for representing the WDN structure and data can be applied universally to all types of WDN.
In this paper, the graph convolutional neural network (GCN) implemented by Kipf et al. is used for WDN analysis. A GCN layer performs a convolution on a graph-structured dataset. Unlike the traditional 2-dimensional convolution of a CNN, which extracts features with a selected convolution filter, the GCN layer extracts the features of each vertex together with those of its neighbors. Therefore, the structure of the graph is taken into account. Mathematically, a graph convolutional layer in the GCN projects the nodes of the WDN into a latent space by using Eq. 16.
where
The input to the GCN is the feature matrix of the graph, X, whose dimension is N×D, where N is the number of nodes and D is the number of features of each node. In this study, one feature is used as the node attribute, namely the node satisfaction degree (NSD) defined in Eq. 8. The output of each GCN layer is N×P, where P is the predefined dimension of the latent space.
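The shape bookkeeping of a graph convolution can be illustrated without any deep learning library. The sketch below assumes the widely used Kipf and Welling propagation rule, in which self-loops are added to the adjacency matrix and the result is symmetrically normalized by node degree before multiplying the N×D features by a D×P weight matrix; the helper names, the ReLU activation, and the three-node example are assumptions for illustration.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, features, weight):
    """One graph convolution with ReLU: maps N x D features to N x P."""
    propagated = matmul(normalized_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weight)]


# Hypothetical 3-junction line network 0-1-2 with NSD as the single node feature (D = 1).
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
nsd = [[1.0], [0.5], [0.0]]
weight = [[1.0, -1.0]]  # D x P with P = 2; trained in practice, fixed here
embedding = gcn_layer(adjacency, nsd, weight)
```

Each row of `embedding` mixes a junction's own NSD with that of its direct neighbors, so stacking several layers lets information from more distant parts of the network reach each node.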
A hybrid ML model is proposed in this study to optimize WDN recovery by combining the GCN and DRL. This study is the first attempt to utilize Graph Convolutional Network (GCN) to extract information from the graph structure and analyze the water distribution network recovery.
An architecture of the proposed GCN-DRL Hybrid ML model (GCN-DRL for short), which can be used by the systems and methods described herein, is shown in
where PDWi(t) is the degree of performance of the WDN after taking repair action i; PDW(t−1) is the degree of performance of the WDN after the previous repair action; and Ti is the duration of repairing pipe i. This study assumes that the repairing time is determined by the number of damages on the repaired pipe.
The deep Q function which integrates the deep GCN is shown on the right side of
However, for most real-world problems, the aforementioned Q table is extremely hard to obtain due to the effectively infinite number of combinations of s and a. To overcome this challenge, DRL is proposed. DRL leverages advances in deep learning to solve traditional RL problems that involve a large state space and action space. Rather than using a Q table, a deep Q function is used to estimate the Q value of each action under different states, as shown in
The proposed GCN-DRL Hybrid ML model is used to determine the pipe repair sequence. To achieve a smooth and stable training result, the ‘experience replay’ technique described by Mnih et al. is also used in this study. The graph neural network and reinforcement learning used in this study are implemented with the Python Deep Graph Library (DGL) and the PyTorch library.
The GCN-DRL based repair decision-making model, built on the recovery-based WDN seismic resilience evaluation framework, is applied to analyze the seismic recovery of a testbed WDN located in Fairfield, California. The complete dataset about this WDN is publicly available from the database maintained by the University of Kentucky. The original water demand and water supply conditions are used in the study. The influence of pipe ages, materials, customer importance, and soil types is also considered in this case. The detailed information about the testbed is summarized in Table 3.
The proposed GCN-DRL model is trained to repair the damaged pipes in the WDN. Table 2 shows the key parameters used in training the GCN-DRL model. The number of training episodes is set to 500. Since 44 pipes are damaged, the deep Q function is trained 22,000 times. The parameter epsilon, which determines whether a repair action is chosen at random or by the learned policy, starts at 1 and decreases toward a small value as repair steps accumulate during training. The Edecay is set as 5000 so that epsilon is nearly 0 at the end of training (0.0122).
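The reported end-of-training value is consistent with an exponential decay of epsilon in the number of updates. The following sketch assumes the common schedule eps_end + (eps_start − eps_end)·exp(−step/Edecay) with a floor of zero; the functional form and the floor are assumptions, while the start value, the decay constant, and the step count come from the text.

```python
import math

EPS_START = 1.0   # initial exploration probability, per the text
EPS_END = 0.0     # assumed floor; the text only says "a small value"
E_DECAY = 5000    # decay constant, per the text


def epsilon(step):
    """Exploration probability after `step` deep Q-function updates."""
    return EPS_END + (EPS_START - EPS_END) * math.exp(-step / E_DECAY)


total_steps = 500 * 44  # episodes x damaged pipes = 22,000 updates
final_epsilon = epsilon(total_steps)
```

Under this assumption, `final_epsilon` equals exp(−22000/5000) = exp(−4.4) ≈ 0.0123, matching the stated value of 0.0122 to within rounding. At each step a random pipe is selected with probability epsilon, and otherwise the pipe with the highest predicted Q value is selected.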
The performance of the repair sequence produced by the developed GCN-DRL ML model is compared with conventional decision-making methods. Four conventional decision-making methods for the WDN repair sequence, i.e., the static importance-based method (S2), the dynamic importance-based method (S3), the genetic algorithm-based method (S4), and the random repairing method (S5), are chosen as the basis for comparison under the same damage scenario (
S2: static importance-based method. This method prioritizes pipe repair by ranking the improvement in the WDN performance degree (PDW) obtained by repairing each pipe relative to the initial damaged status. The larger the ranking factor, the higher the repair priority of the pipe. The ranking factor of pipe i is defined as:
where PDWi is the performance degree of WDN after repairing pipe i; PDW0 is the performance degree of WDN before any recovery; Ti is the repairing time for pipe i, which equals the number of damages on the pipe.
S3: dynamic importance-based method. This method determines the pipe repair priority by the dynamic importance during the recovery of the WDN. Unlike S2 which only compares the performance improvement with the initial damage status, S3 compares the performance between the pipe recovery and current WDN status by the following equation. The importance of pipe i is ranked based on Id,i(t),
where PDWi(t) is the performance degree of the WDN at time t after repairing pipe i; PDW(t−1) is the performance degree of the WDN at the previous time step; and Ti is the repairing time for pipe i, which equals its number of damages.
S4: genetic algorithm-based method. The Genetic Algorithm (GA) is a widely used global optimization algorithm. Because the repair sequence is a combinatorial optimization problem, special crossover and mutation methods are adopted in this study. The application of GA in this paper is summarized as follows.
S5: random repairing method. This method chooses a random repairing sequence.
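The difference between the static (S2) and dynamic (S3) rankings can be sketched as follows. The example assumes that both ranking factors take the form of a PDW improvement divided by the repair time, as suggested by the variable definitions; `evaluate_pdw` is a hypothetical callable standing in for a hydraulic simulation, and `toy_pdw` is an invented performance model in which pipes A and B only deliver their full benefit once both are repaired.

```python
def static_importance_sequence(damaged, repair_time, evaluate_pdw):
    """S2-style order: rank every pipe once against the initial damaged state."""
    base = evaluate_pdw(frozenset())
    score = {p: (evaluate_pdw(frozenset({p})) - base) / repair_time[p] for p in damaged}
    return sorted(damaged, key=score.get, reverse=True)


def dynamic_importance_sequence(damaged, repair_time, evaluate_pdw):
    """S3-style order: re-rank the remaining pipes against the current state each step."""
    repaired, order = frozenset(), []
    remaining = list(damaged)
    while remaining:
        current = evaluate_pdw(repaired)
        best = max(
            remaining,
            key=lambda p: (evaluate_pdw(repaired | {p}) - current) / repair_time[p],
        )
        order.append(best)
        remaining.remove(best)
        repaired = repaired | {best}
    return order


def toy_pdw(repaired):
    """Hypothetical performance model with an interaction between pipes A and B."""
    pdw = 0.2
    if "C" in repaired:
        pdw += 0.2
    if "A" in repaired:
        pdw += 0.25
    if {"A", "B"} <= repaired:
        pdw += 0.5
    return pdw


times = {"A": 1, "B": 1, "C": 1}
s2_order = static_importance_sequence(["A", "B", "C"], times, toy_pdw)
s3_order = dynamic_importance_sequence(["A", "B", "C"], times, toy_pdw)
```

In this toy case the static ranking yields A, C, B because pipe B brings no benefit on its own, whereas the dynamic ranking yields A, B, C because B becomes the most valuable repair once A is back in service. Both remain greedy, one-step-ahead rules, in contrast to the sequence-level objective targeted by the GA and GCN-DRL methods.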
The computing performance of each method is evaluated by the final SRI value of the recovery trajectory, the recovery time to achieve satisfactory level of system performance, and the computational time.
Examples of recovery trajectories by using methods from S1 to S5 are shown in
The recovery time to achieve a minimal satisfactory level of system performance is critical for infrastructure restoration.
The computational times needed to determine the repair sequence by methods S1 to S5 are compared in Table 3. The GCN-DRL ML model takes more computational time than S2, S3, and S5 because a large number of training iterations are involved. For example, in this case, the GCN-DRL model is trained for 500 episodes, and each episode contains 44 repair steps (one for each of the 44 damaged pipes). Therefore, 22,000 hydraulic simulations were conducted to capture the WDN performance. The deep Q function was also trained 22,000 times. However, it is noted that this computational time is for a model that is trained from scratch. The trained GCN-DRL model can be reused through transfer learning, which can significantly reduce the computational time needed to generate decisions when new disasters occur.
Traditional resilience-oriented methods have to start from scratch for each new failure situation, which compromises their computational efficiency. A quick response to the damage caused by new disasters is desirable to decrease the economic loss and benefit the global restoration plan. While the GCN-DRL model achieved a decision sequence that ensures fast system recovery, training the model from scratch requires a relatively long computational time. To reduce the computational time, a novel transfer learning strategy is explored for applying the GCN-DRL model to new disaster scenarios. That is, when training the GCN-DRL model, the parameters of the deep Q function, such as the neuron weights and biases, can be saved as the ‘training experience’. Hence, unlike conventional decision algorithms that need to start from scratch for each damage scenario, the GCN-DRL model can reuse the ‘training experience’ from a previous training result, as long as the newly damaged pipes were considered in that training. Consequently, a high computational efficiency is achieved, which is especially advantageous for a long-term management strategy.
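The idea of saving and reusing the ‘training experience’ can be sketched as follows. This is an illustration under stated assumptions, not the described implementation: a single linear layer stands in for the deep Q function, JSON is an arbitrary storage choice, and all function names are hypothetical.

```python
import json
import os
import random
import tempfile


def init_q_params(n_inputs, n_actions, seed=0):
    """Randomly initialize one linear layer standing in for the deep Q function."""
    rng = random.Random(seed)
    return {
        "weights": [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                    for _ in range(n_actions)],
        "biases": [0.0] * n_actions,
    }


def save_experience(params, path):
    """Store the weights and biases as the 'training experience'."""
    with open(path, "w") as fh:
        json.dump(params, fh)


def load_experience(path, n_inputs, n_actions):
    """Return (params, warm_start). Fall back to a fresh model if nothing fits."""
    if os.path.exists(path):
        with open(path) as fh:
            params = json.load(fh)
        weights = params["weights"]
        if len(weights) == n_actions and all(len(r) == n_inputs for r in weights):
            return params, True
    return init_q_params(n_inputs, n_actions), False


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "q_params.json")
    _, warm = load_experience(path, n_inputs=4, n_actions=3)  # nothing saved yet
    trained = init_q_params(4, 3, seed=42)  # pretend these came out of training
    save_experience(trained, path)
    restored, warm_after = load_experience(path, n_inputs=4, n_actions=3)

print(warm, warm_after)
```

The dimension check is a simple stand-in for the condition that the newly damaged pipes were covered by the earlier training; training for the new scenario then continues from the restored parameters instead of from a random initialization.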
To demonstrate the benefits of transfer learning, the performance and computational time of the GCN-DRL model with transfer learning on new damage scenarios are compared with those of the conventional methods. The new damages are randomly chosen from a subset of the predicted pipe damages.
Table 5 summarizes the performance as well as the corresponding computational time to determine the repair sequence by different methods on the new damage scenarios.
In terms of performance in ensuring WDN system resilience, the GCN-DRL model (S1) with transfer learning achieved the highest SRI value among all the methods. In the scenario with 36 damaged pipes, the SRI value of the repair sequence by S1 is larger than those of the other four repair methods by 1.16, 0.252, 1.706, and 8.911, respectively. In the scenario with 24 damaged pipes, the corresponding improvements by S1 are only 0.034, 0.003, 1.386, and 9.939. The comparison indicates that the larger the number of damaged pipes, the greater the advantage of the GCN-DRL model over conventional methods in achieving an optimal decision sequence. This is reasonable since the larger the number of damaged pipes, the more difficult it is to identify the global optimum with conventional methods. This indicates the capability of the GCN-DRL model to make globally optimal decisions in a large decision space.
In terms of the computational time for decisions, the use of transfer learning significantly reduced the time needed for the GCN-DRL model to determine the optimal repair sequence. The computational time is comparable to that needed by conventional methods. It is noted that the GCN-DRL model significantly outperformed the GA method, a global optimization method, in terms of both performance and computational efficiency.
Optimal repair decisions play an important role in improving WDN resilience by accelerating post-disaster recovery. To improve system resilience by optimizing the repairing decisions, this study proposed a WDN seismic resilience evaluation framework and a novel decision-making model for a resilience-oriented restoration plan. The resilience evaluation framework consists of a model for pipe failure prediction, a model for WDN performance measurement, and a model for WDN resilience quantification. The system resilience index (SRI) is proposed for the system resilience quantification and is defined on the time evolution of the WDN system performance degree (PDW) during the recovery process. The PDW considers the node satisfaction degrees (NSDs), which measure the extent to which the dynamic post-hazard water demands at the nodes of the WDN are met, weighted by the relative importance of the nodes. With the system resilience indicator SRI, a novel Graph Convolutional Network (GCN) and Deep Reinforcement Learning (DRL) hybrid machine learning model is developed to determine the optimal repairing decision. The GCN-DRL model combines the advantages of DRL and GCN. The GCN is used to embed the WDN, including the topological connections and the information of each node. The DRL framework is used to train the GCN to determine the optimal repair actions under any given damage situation.
The GCN-DRL model is demonstrated by determining the optimal repair sequence of a testbed WDN subjected to earthquake damage. The damage scenarios are determined with consideration of the magnitude of the earthquake, the distance to the epicenter, the soil type, pipe deterioration, etc. The performance of the pipe repair sequence by the GCN-DRL model is compared with the results of four traditional decision-making methods. The results show that the GCN-DRL model consistently achieves repairing sequences that lead to the highest system resilience index (SRI) under different damage scenarios. In addition, a transfer learning strategy can be used to train the GCN-DRL model for new damage scenarios by taking advantage of the previous training experience, which significantly improved the computational efficiency. The transfer learning strategy was demonstrated on three new damage situations of the WDN. The results show that the GCN-DRL decision-making model with transfer learning achieved the most resilient WDN recovery with significantly reduced computational time. Therefore, the new GCN-DRL model is a promising high-performance, robust decision-support tool for post-hazard repairing decisions to ensure resilient WDN recovery.
The following portion of this description aims to fully leverage the potential of real-time data acquisition systems combined with advanced machine learning (ML) to achieve high sensitivity, accuracy, speed, and reliability in detecting the leaks in the water systems. A user-friendly water system monitoring application seamlessly integrates hydraulic simulation, pressure monitoring, and AI-empowered leak detection.
The application is designed with generality and extensibility in mind, which allows it to incorporate further developments in leak detection technology and to be easily adapted to different water systems.
As an example, the lack of real-world sensing data puts a constraint on the AI model training. However, the systems and methods described herein, which incorporate both simulated data and real-time monitoring data, are prepared for the inclusion of more labeled leak data in the future. The AI application is ready to incorporate more real-world data collected through sensor deployment, which will further enhance the accuracy and efficacy of this AI-based leak detection methodology.
The strong generality of the systems and methods described herein allows them to be deployed to various water utilities. It is hoped that the systems and methods described herein can equip water utilities with a state-of-the-art tool that empowers them to proactively detect and promptly address leak-related asset management issues. By integrating the potential of real-time data and cutting-edge AI, the systems and methods described herein aim to enhance the resilience and efficiency of water supply networks, ensuring a sustainable and reliable water distribution system for communities.
The welcome page also includes three important functions that cater to the needs of water system management, including hydraulic simulation, pressure monitoring, and leak detection, making it a comprehensive solution for hydraulic system analysis.
The remainder of this description is organized according to the major functions of the application, which include the hydraulic simulation function for the WSN, the work that has been done on real-time water pressure monitoring, the details of the AI-empowered water leak detection and localization, and the techno-economic analyses.
A hydraulic model is commonly used to compute hydraulic parameters such as the water pressure or water head and the flow rate for the design of a water distribution network. The governing hydraulic equations describe the conservation of mass and the conservation of energy considering the topological characteristics of a water pipe network. The hydraulic model makes it possible to account for the effects of water usage behaviors (described as water demand fluctuations at the service nodes) and of events such as leakages on the network performance. While a hydraulic model is regarded as sufficiently accurate for water network planning purposes, there are uncertainties in the model prediction results due to fluctuating water demands, deteriorating pipe conditions, etc. A calibrated hydraulic model serves as the basis for model-based leak detection. Provided that it is sufficiently reliable, a hydraulic model can be utilized to generate holistic artificial datasets for the development and validation of ML-based leak detection algorithms. As a general note, using holistic artificially generated data is a common strategy in the development of ML technologies when data is not available due to practical constraints. The key equations used for the hydraulic computations are introduced in the following.
Equation (1) of the hydraulic model describes the conservation of mass at a pipe node, which prescribes that, under the no-leak condition, the inflow of water to a pipe node equals the outflow of water. The outflow of water includes the demand or use of water at that node as well as the water flowing from this node to other nodes.
$$\sum_{p \in P_n} q_{p,n} - D_n^{act} = 0 \quad \forall n \in N \qquad (1)$$
where $P_n$ is the set of pipes connected to node $n$, $q_{p,n}$ is the flow rate of water into node $n$ from pipe $p$ (m³/s), $D_n^{act}$ is the actual water demand at node $n$ (m³/s), and $N$ is the set of all nodes in the pipe network. $q_{p,n}$ is positive when water is flowing into node $n$ from pipe $p$; otherwise, it is negative.
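The mass balance at a single node can be checked numerically, as in the minimal sketch below. It assumes the balance is written as the sum of signed pipe flows into the node minus the actual demand; the function name and the flow values are hypothetical.

```python
def node_mass_residual(pipe_flows_into_node, actual_demand):
    """Residual of the nodal mass balance: sum of q_{p,n} minus D_n^act.

    pipe_flows_into_node: signed flow rates (m3/s), positive into the node
    actual_demand:        actual water demand at the node (m3/s)
    """
    return sum(pipe_flows_into_node) - actual_demand


# Hypothetical node: two pipes deliver water and one pipe carries water away.
residual = node_mass_residual([0.030, 0.015, -0.025], actual_demand=0.020)
print(abs(residual) < 1e-12)
```

Under the no-leak condition the residual vanishes up to numerical round-off.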
Equation (2) of the hydraulic model describes the conservation of energy. For a water pipe network, the total energy is typically referred to as the total water head, which includes components describing the kinetic energy (kinetic water head), hydraulic potential energy (pressure head), and gravitational potential energy (elevation head), i.e.,
where h is the total water head, u is the water velocity at each node, and z is the altitude of each node. HL is the energy loss between node A and node B. There are two major mechanisms of energy loss in a pipe flow, i.e., the distributed energy loss and the localized energy loss. The distributed energy loss along the pipe due to hydraulic resistance is mainly determined by the velocity of the flow V, the internal diameter of the pipe d, the length of the pipe L, and the roughness of the pipe wall, and is described by the Hazen-Williams formula [33], i.e., Equation (3).
where C is the roughness coefficient of pipe wall.
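For illustration, the distributed head loss can be evaluated with the SI form of the Hazen-Williams formula that is commonly written in terms of flow rate, h_L = 10.667 C^(-1.852) d^(-4.871) L q^(1.852). Whether Equation (3) uses exactly this form is an assumption; the velocity V is converted to a flow rate through the pipe cross-section, and the sample pipe is hypothetical.

```python
import math


def hazen_williams_head_loss(velocity, diameter, length, roughness_c):
    """Distributed head loss (m) along a pipe, SI units.

    velocity:    mean flow velocity V (m/s)
    diameter:    internal pipe diameter d (m)
    length:      pipe length L (m)
    roughness_c: Hazen-Williams roughness coefficient C (unitless)
    """
    flow = velocity * math.pi * diameter ** 2 / 4.0  # q = V * A (m3/s)
    return (10.667 * roughness_c ** -1.852 * diameter ** -4.871
            * length * flow ** 1.852)


# Hypothetical 100 m pipe, 0.3 m diameter, 1 m/s: newer (C=130) versus aged (C=90).
loss_new = hazen_williams_head_loss(1.0, 0.3, 100.0, 130.0)
loss_aged = hazen_williams_head_loss(1.0, 0.3, 100.0, 90.0)
print(loss_new, loss_aged)
```

A lower roughness coefficient, as found in deteriorated pipes, gives a larger head loss for the same flow.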
The localized energy loss is due to turbulence associated with change of flow conditions (such as flow speed, direction, or flow area etc.), which is determined by the topology of water distribution network connections.
An important phenomenon in a water supply network is the water usage or demand. Two major types of models are generally used for water demand at pipe nodes, i.e., demand-driven model and pressure-driven model. A comparison of both models is described in [34]. A pressure-driven water demand model is used in this study to consider the effects of losing pressure due to change of water demand or leaks.
where D is the demand at a particular node, Df is the desired demand (m³/s), p is the water pressure, Pf is the pressure above which the desired demand Df should be met, and P0 is the water pressure below which no water will be supplied at the node. Leakage is modeled as a special type of water demand in this study. The demand due to a leak is related to the size of the leak and is described in Equation (5) [35].
where dleak is the equivalent water demand due to the leak (m³/s), Cd is the discharge coefficient, with a default value of 0.75, A is the area of the leak, p is the internal water pressure, α is the discharge exponent, which is 0.5 for steel pipe, and ρ is the water density.
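The two demand models can be sketched as follows. The forms used are assumptions based on the variable definitions above: a square-root transition between P0 and Pf for the pressure-driven demand, and dleak = Cd·A·p^α·sqrt(2/ρ) for the leak, with pressure in Pa and a water density of 1000 kg/m³. Function and parameter names are hypothetical.

```python
import math


def pressure_driven_demand(p, desired, p_min, p_full):
    """Delivered demand D (m3/s) at pressure p, assumed piecewise form.

    desired: desired demand Df; p_min: P0; p_full: Pf
    """
    if p <= p_min:
        return 0.0                      # no water supplied below P0
    if p >= p_full:
        return desired                  # full demand met above Pf
    return desired * math.sqrt((p - p_min) / (p_full - p_min))


def leak_demand(area, pressure_pa, cd=0.75, alpha=0.5, rho=1000.0):
    """Equivalent leak demand (m3/s); area in m2, pressure in Pa (assumed units)."""
    return cd * area * pressure_pa ** alpha * math.sqrt(2.0 / rho)


# Hypothetical values: a 1 cm2 opening at 200 kPa internal pressure.
d_leak = leak_demand(area=1e-4, pressure_pa=200_000.0)
d_partial = pressure_driven_demand(p=10.0, desired=0.02, p_min=0.0, p_full=20.0)
print(d_leak, d_partial)
```

Treating the leak as an additional pressure-dependent demand lets the same hydraulic solver handle normal consumption and leakage.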
A modified k-means clustering algorithm has been configured for WDN partition. Compared with standard k-means clustering of a WDN, which only considers the leakage characteristics of junctions, the new algorithm also considers the topology of the network (via the shortest physical path distance between junctions over the WDN). The pseudocode of the new k-means clustering algorithm is shown in the following Table 7.
It is noted that in Step 3.1, the leakage characteristics matrix can be obtained by using different definitions, i.e., the conventional leakage characteristics matrix based on pressure change, or feature extraction with ML algorithms. Although PCA and AE models are used in the clustering algorithm, other ML models, such as the Mahalanobis classification system (MCS), can also be integrated into this framework. In Step 3.2, the physical distance between pairs of junctions is obtained by using Dijkstra's shortest-path algorithm. Other shortest-path algorithms, such as the Floyd-Warshall algorithm, could also be considered when dealing with different types of graphs. This step guarantees that the clustered junctions are concentrated based on their network path distance. Both pairwise distance matrices are normalized by dividing by their largest value, so that the distances range from 0 to 1. In Step 5, the representative distance between junctions is defined as the unweighted average of the physical distance and the leakage characteristics distance. The algorithm also allows different weights to be assigned to the leakage characteristics distance and the physical distance. In Step 6, the process of centroid redistribution of each cluster requires re-acquiring the leakage characteristics matrix with the new set of centroids. Also, in Step 6, unlike the standard k-means, which uses the mean value of each cluster as its centroid, the optimal junction (the one that minimizes the distance within the cluster) is set as the new centroid so that the centroids remain on the junctions of the WDN.
With clusters of nodes determined, the recommended sensor placements are at the centroid of each cluster to maximize the value of sensor data acquisition. The clustering of nodes based on their hydraulic similarities and network distance also facilitates leak localization.
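The clustering steps described above can be sketched as follows. This is a simplified illustration: the leakage characteristics of each junction are given as fixed hypothetical feature vectors (they are not re-acquired when centroids move), the network is assumed to be connected, and all names and values are hypothetical.

```python
import heapq
import math


def shortest_paths(graph, source):
    """Dijkstra's algorithm: path length from source to every junction."""
    dist = {node: math.inf for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in graph[u].items():
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def normalized(matrix):
    """Divide every pairwise distance by the largest one (range 0 to 1)."""
    peak = max(max(row.values()) for row in matrix.values())
    return {a: {b: v / peak for b, v in row.items()} for a, row in matrix.items()}


def assign(nodes, centroids, dist):
    clusters = {c: [] for c in centroids}
    for n in nodes:
        clusters[min(centroids, key=lambda c: dist[n][c])].append(n)
    return clusters


def cluster_junctions(graph, features, centroids, w_leak=0.5, max_iter=20):
    nodes = sorted(graph)
    path = normalized({a: shortest_paths(graph, a) for a in nodes})
    leak = normalized({a: {b: math.dist(features[a], features[b]) for b in nodes}
                       for a in nodes})
    # Representative distance: weighted average (unweighted when w_leak = 0.5).
    dist = {a: {b: w_leak * leak[a][b] + (1 - w_leak) * path[a][b] for b in nodes}
            for a in nodes}
    centroids = list(centroids)
    for _ in range(max_iter):
        clusters = assign(nodes, centroids, dist)
        # New centroid: the member junction minimizing within-cluster distance.
        updated = [min(m, key=lambda j: sum(dist[j][x] for x in m))
                   for m in clusters.values()]
        if updated == centroids:
            break
        centroids = updated
    return assign(nodes, centroids, dist)


# Hypothetical six-junction line network with one long pipe between J3 and J4.
edges = [("J1", "J2", 100.0), ("J2", "J3", 100.0), ("J3", "J4", 500.0),
         ("J4", "J5", 100.0), ("J5", "J6", 100.0)]
graph = {}
for a, b, length in edges:
    graph.setdefault(a, {})[b] = length
    graph.setdefault(b, {})[a] = length
features = {"J1": (1.0, 0.1), "J2": (0.9, 0.2), "J3": (0.8, 0.2),
            "J4": (0.2, 0.8), "J5": (0.1, 0.9), "J6": (0.1, 1.0)}
clusters = cluster_junctions(graph, features, centroids=["J1", "J6"])
print(clusters)
```

The keys of the returned dictionary are the centroid junctions, which are the candidate sensor locations in this sketch.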
AI applications require a sufficient amount of labeled data. This presents a major barrier for leak detection due to the relatively small amount of labeled data corresponding to leak conditions. To overcome this technical barrier, the systems and methods are configured to implement an unsupervised ML model with an Autoencoder (AE) neural network. The AE is a special type of neural network that is trained to reconstruct its input, so that the output (y1, y2, y3, . . . , yn) contains the same information as its input (x1, x2, x3, . . . , xn). To reduce the reconstruction error, the network is required to learn the hidden patterns among the input data.
An innovative strategy is proposed in this study to detect leaking situations with the autoencoder neural network based on its reconstruction error. The reconstruction error is characterized by the mean square error:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - y_i\right)^2$$
where MSE is the mean square error (i.e., the reconstruction error), n is the dimension of the input vector x, xi is the i-th component of the input sample, and yi is the corresponding reconstructed (predicted) value.
For a model trained on a dataset collected under normal (non-leak) conditions, a large reconstruction error occurs when it is fed with data under a leaking condition, because the relationship described by the trained AE neural network is not valid under such a condition. By setting a threshold on the reconstruction error, the AE model can classify whether a set of data corresponds to a leaking situation or a non-leaking situation.
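The thresholding logic can be sketched without a neural network library. In the sketch below, `reconstruct` is a hypothetical stand-in for a trained AE with a one-value bottleneck, and the threshold rule (three times the largest error seen on normal data) is an assumption chosen only for illustration; the pressure readings are made up.

```python
def mse(x, y):
    """Mean square error between an input vector and its reconstruction."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


# Hypothetical normal head differences between three sensors (m).
OFFSETS = [0.0, -2.0, -5.0]


def reconstruct(x):
    """Stand-in for a trained AE: encode to one value, decode back."""
    code = sum(xi - oi for xi, oi in zip(x, OFFSETS)) / len(x)  # encoder
    return [code + oi for oi in OFFSETS]                         # decoder


def fit_threshold(normal_samples, margin=3.0):
    """Assumed rule: a multiple of the largest error on non-leak data."""
    return margin * max(mse(x, reconstruct(x)) for x in normal_samples)


def is_leak(x, threshold):
    return mse(x, reconstruct(x)) > threshold


normal = [[50.0, 48.1, 45.0], [52.0, 49.9, 47.1], [48.0, 46.0, 42.9]]
threshold = fit_threshold(normal)
leak_reading = [50.0, 46.5, 45.0]      # pressure drop at the second sensor
normal_reading = [51.0, 49.0, 46.05]
print(is_leak(leak_reading, threshold), is_leak(normal_reading, threshold))
```

The leak reading breaks the learned relationship between the sensors, so its reconstruction error is far above the threshold, while the new normal reading stays below it.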
Leakage localization can be defined as a classification problem, i.e., the leakage conditions are classified into different WDN partition zones (or clusters). There are various types of ML models for classification problems, such as the Artificial Neural Network, Support Vector Machine, Decision Tree, Random Forest (RF), etc. The systems and methods herein can employ the Random Forest (RF) model because 1) RF is an efficient classification algorithm, and 2) it needs only a few hyperparameters to be tuned. These properties help with efficiency and consistency during the evaluation process.
As with ML models, the more data used for model training, the better the ML performance. For the case of leak detection, the higher the percentage of nodes with historical failures, the more accurate the detection results.
A sensitivity study is conducted to evaluate the effects of leak size on the performance of the AI-based leak detection algorithm. The leak size is an important factor that influences the detection system performance. Conceptually, detecting a small leak is much more difficult than detecting a large leak, since a smaller leak has less influence on the status of the WSN and can be masked by noise such as water demand fluctuations. For the sensitivity study, the leak size is varied from 0.01 m to 0.12 m.
Small leaks tend to be classified as normal non-leaking situations (i.e., 0% correct detection), while normal non-leaking cases are all classified correctly (i.e., 100% correct detection). This gives an accuracy of around 50% for a balanced dataset with an equal number of data samples under leaking and non-leaking conditions. With increasing leak size, the AE model achieved higher leak detection accuracy. This is reasonable since the larger the leak size, the more disturbance it causes to the pressure distribution in the WSN, which allows its detection.
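The 50% figure follows directly from how accuracy is computed on a balanced dataset, as the short sketch below shows; the function name is hypothetical.

```python
def detection_accuracy(leak_correct_rate, normal_correct_rate, leak_fraction=0.5):
    """Overall accuracy as the class-weighted average of per-class rates."""
    return (leak_fraction * leak_correct_rate
            + (1.0 - leak_fraction) * normal_correct_rate)


# Small leaks all missed, normal cases all correct, balanced dataset.
small_leak_accuracy = detection_accuracy(0.0, 1.0)
print(small_leak_accuracy)
```

An overall accuracy near 50% on a balanced dataset therefore indicates that essentially no leaks of that size are being detected.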
Additionally, the influence of the compression ratio of the AE algorithm can be examined. The compression ratio is the size of the uncompressed data divided by the size of the compressed data when constructing the AE neural network, i.e., the input dimension divided by the dimension of the bottleneck layer. It is an important hyperparameter of the AE neural network. A large compression ratio can not only save physical data storage space but also force the AE model to learn the internal pattern of the input data. However, too much compression may lead to excessive information loss and decrease the detection accuracy.
The compression ratio is found to have a negative influence on the overall detection accuracy. As shown in
Water leaks lead to significant direct and indirect costs. According to the Cleveland Water Department, the average cost of treated water is $487.53 per million gallons. In 2021 alone, the total annual water production reached 73,559 million gallons. Considering that the aging water system has an average leak rate of around 30%, the direct cost attributed to water leaks amounted to $10,758,665 in 2021. Additionally, the United States as a whole loses an average of 6 billion gallons of water to leaks every day. At the same unit cost, the total direct cost is approximately $2,925,180 each day.
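The direct-cost figures above follow from simple multiplication, as reproduced in the sketch below. Applying the Cleveland unit cost to the nationwide leak volume is the assumption implied by the quoted daily figure; variable names are hypothetical.

```python
COST_PER_MILLION_GALLONS = 487.53        # USD, treated water
annual_production_mg = 73_559            # million gallons produced in 2021
leak_rate = 0.30                         # assumed average leak rate

# Annual direct cost of leaked water for the utility.
annual_leak_cost = annual_production_mg * leak_rate * COST_PER_MILLION_GALLONS

# Daily nationwide loss: 6 billion gallons = 6,000 million gallons.
daily_us_leak_cost = 6_000 * COST_PER_MILLION_GALLONS

print(f"{annual_leak_cost:,.2f} USD per year")
print(f"{daily_us_leak_cost:,.2f} USD per day")
```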
In order to estimate the economic savings of the proposed algorithm, it is assumed the leak size distribution follows a Weibull distribution as shown in Eq. 7, with the shape parameter (a) set to 1.5 and the scale parameter (b) to 20.
where a is the shape parameter, and b is the scale parameter.
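As a sketch of how such a distribution can be combined with size-dependent detection accuracy, the code below uses the standard two-parameter Weibull density and cumulative distribution with a = 1.5 and b = 20. The size bins and accuracy values are hypothetical placeholders, not results from this work, and the leak size is in the same (unstated) unit as the scale parameter.

```python
import math


def weibull_pdf(x, a=1.5, b=20.0):
    """Two-parameter Weibull density with shape a and scale b."""
    if x < 0:
        return 0.0
    return (a / b) * (x / b) ** (a - 1) * math.exp(-((x / b) ** a))


def weibull_cdf(x, a=1.5, b=20.0):
    """Probability that a leak is no larger than x."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / b) ** a))


def expected_detection_rate(bins):
    """bins: list of (lower size, upper size, detection accuracy in that bin)."""
    return sum((weibull_cdf(hi) - weibull_cdf(lo)) * acc for lo, hi, acc in bins)


# Hypothetical accuracy by leak-size bin (placeholders for illustration only).
bins = [(0.0, 10.0, 0.5), (10.0, 30.0, 0.9), (30.0, math.inf, 1.0)]
rate = expected_detection_rate(bins)
print(rate)
```

Multiplying the expected detection rate by the direct cost of leaked water gives a rough estimate of the recoverable savings under these assumptions.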
Upon applying the accuracy of developed leak detection algorithm to different sized leaks (
Besides the financial benefit of recouping lost revenue, the prevention of water leaks also helps mitigate health risks to the public and improves public perception. Contamination from external sources can enter the water system in the event of leaks and pipe failures, particularly when the pipe's internal pressure is lower than the pressure of the groundwater. Efficient leak detection can significantly reduce the chance of backflow, thereby minimizing the risks to public health.
The systems and methods described herein provide a solution to the intelligent water supply challenge by developing an innovative application framework based on AI-empowered leak detection and localization. The framework starts with a new clustering algorithm that guides the optimal sensor placement. This allows a small number of sensors to cover the conditions of the whole WSN, thereby maximizing the value of the sensor data.
The use of the AI algorithm for accurate leak detection provides an improvement over existing approaches. The novel ML model allows leaks to be detected without a large labeled dataset of leak scenarios, which is difficult to obtain. The unsupervised AE algorithm, as described herein, learns the patterns of non-leaking scenarios, making it a highly efficient and reliable approach adapted to the unique demands of leak detection. In addition, the ML leak detection method relies on changes in the water pressure patterns among multiple sensors (rather than pressure differences from a single sensor, as with many existing ML models). Therefore, it achieves more robust and reliable results.
In addition, the novel system partition algorithm facilitates a semi-supervised ML approach for leak localization. The significance of this approach is that it requires data from only a portion of the nodes with leak history to achieve accurate localization of leaks across the whole WSN. Moreover, this historical data can be obtained by creating man-made leak scenarios, such as through controlled hydrant openings, making it an effective solution under the practical constraints of WSN operation.
The systems and methods described herein can provide a user-friendly interface together with excellent extensibility and generality, which allows them to be integrated with data from an existing water SCADA system, hydraulic simulation and data query, as well as the key component of AI-based real-time leak detection and localization. Together, these present a cutting-edge solution to support intelligent water system monitoring and asset management.
The implications of the AI-empowered water leak detection system are vast, as it will empower public utilities to promptly identify leaks or other factors leading to pipe failures in real time. By supporting effective and timely maintenance measures, this system will recoup the economic value associated with the vast amount of non-revenue water and significantly reduce the risk of leak-related health issues.
In view of the foregoing structural and functional description, those skilled in the art will appreciate that portions of the invention may be embodied as a method, data processing system, or computer program product. Accordingly, these portions of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, portions of the invention may be a computer program product on a computer-usable storage medium having computer readable program code on the medium. Any suitable computer-readable medium may be utilized including, but not limited to, static and dynamic storage devices, hard disks, optical storage devices, and magnetic storage devices.
Certain embodiments of the invention have also been described herein with reference to block illustrations of methods, systems, and computer program products. It will be understood that blocks of the illustrations, and combinations of blocks in the illustrations, can be implemented by computer-executable instructions. These computer-executable instructions may be provided to one or more processors of a general purpose computer, special purpose computer, or other programmable data processing apparatus (or a combination of devices and circuits) to produce a machine, such that the instructions, which execute via the processor, implement the functions specified in the block or blocks.
These computer-executable instructions may also be stored in computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory result in an article of manufacture including instructions which implement the function specified in the flowchart block or blocks. The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart block or blocks.
It should further be understood that various aspects disclosed herein may be combined in different combinations than the combinations specifically presented in the description and accompanying drawings. It should also be understood that, depending on the example, certain acts or events of any of the processes or methods described herein may be performed in a different sequence, may be added, merged, or left out altogether (e.g., all described acts or events may not be necessary to carry out the techniques). In addition, while certain aspects of this disclosure are described as being performed by a single module or application for purposes of clarity, it should be understood that the techniques of this disclosure may be performed by a combination of units or modules associated with, for example, a local or distributed system.
What have been described above are examples. It is, of course, not possible to describe every conceivable combination of components or methods, but one of ordinary skill in the art will recognize that many further combinations and permutations are possible. Accordingly, the invention is intended to embrace all such alterations, modifications, and variations that fall within the scope of this application, including the appended claims. Where the disclosure or claims recite “a,” “an,” “a first,” or “another” element, or the equivalent thereof, it should be interpreted to include one or more than one such element, neither requiring nor excluding two or more such elements. As used herein, the term “includes” means includes but not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.
All references, publications, and patents cited in the present application are herein incorporated by reference in their entirety.
This application claims priority from U.S. Provisional Application No. 63/401,643, filed Aug. 27, 2022, the subject matter of which is incorporated herein by reference in its entirety.
This invention was made with government support under 1638320 by the National Science Foundation. The government has certain rights in the invention.
Number | Date | Country
---|---|---
63401643 | Aug 2022 | US