The invention relates to the technical field of pipeline detection, and particularly relates to an intelligent analysis system for inner detecting magnetic flux leakage (MFL) data in pipelines.
Pipeline transportation is widely used as a continuous, economical, efficient and green means of transportation. The design life of pipelines specified in the national standard is 20 years. As operating time increases, the pipeline condition deteriorates year by year, and potential dangers increase sharply owing to pipeline material problems, construction, corrosion and damage caused by external force. Once leakage occurs, it can cause not only atmospheric pollution but also a violent explosion. Therefore, safety inspection and maintenance need to be performed regularly on pipelines so as to ensure the safety of energy transportation and of the ecological environment.
Non-destructive testing (NDT) is widely applied as an important means of pipeline safety maintenance. At present, the main methods for pipeline detection comprise MFL detection, eddy current detection and ultrasonic detection. Among them, MFL detection is applied in nearly 90% of in-service pipelines; it is a relatively mature defect detection technology for ferromagnetic materials and the most extensively applied one in developed countries abroad. Currently, much analytical research exists on MFL data, including data preprocessing, detection, size inversion, data presentation and the like. However, existing research on the analysis of MFL data concentrates on individual local points and lacks a systematic view of data analysis; most theoretical methods and application technologies lack generality and portability and fail to combine intelligent technology effectively with the analysis of MFL data, so that a practical, feasible and widely portable data analysis system is difficult to form.
The invention provides an analysis software system for inner detecting MFL data in pipelines from the perspectives of surfaces and bodies, and provides, from the perspective of artificial intelligence, a data preprocessing method based on time-domain-like sparse sampling and KNN-softmax, a pipeline connecting component discovery method based on a combination of a selective search and a convolutional neural network (CNN), an abnormal candidate region search and identification method based on a Lagrange multiplication framework and multi-source MFL data fusion, a defect inversion method based on a random forest, and a pipeline defect evaluation method based on improved standard ASME B31G.
Based on the above technical problems, the invention provides an intelligent analysis system for inner detecting MFL data in pipelines, wherein the intelligent analysis system for inner detecting MFL data in pipelines comprises a complete data set building module, a discovery module, a quantization module and a solution module.
Originally-sampled MFL data is connected with the complete data set building module, the complete data set building module is connected with the discovery module through a complete MFL data set, the discovery module is connected with the quantization module, and the quantization module is connected with the solution module.
The complete data set building module is used for data missing reconstruction and noise reduction operation on original MFL data for inner detection, and the complete data set building method based on time-domain-like sparse sampling and KNN-softmax is adopted to build the complete MFL data set.
In the complete data set building module, the originally-sampled MFL data is used as multi-source data information, specifically comprising: axial data, radial data, circumferential data and α-direction data.
The discovery module is used for defect detection and comprises component detection and anomaly detection, wherein the component detection completes detection of welds and flanges of pipeline connecting components; for the discovery module, a pipeline connecting component discovery method based on a combination of a selective search and a convolutional neural network (CNN) is adopted to obtain the precise position of a weld; and the whole MFL signals are divided into u+1 patches according to the precise position of the weld, and one patch of MFL signals is taken to find out MFL signals with defects by the abnormal candidate region search and identification method based on a Lagrange multiplication framework and multi-source MFL data fusion.
The anomaly detection comprises: detection of defects, valves, meters and metal increment, and finally obtaining defect signals.
The quantization module completes mapping from the defect signals to physical characteristics, and finally gives the defect size, namely length, width and depth, by the defect quantization method based on a random forest.
The solution module extracts all defect length columns, depth columns and pipeline property parameters in defect information from the complete MFL data set, and finally gives the evaluation results, including maintenance indexes and recommendations for a single defect position, by using a pipeline solution improved based on the standard ASME B31G through a maintenance decision model, wherein the pipeline property parameters comprise minimum yield strength SMYS, minimum tensile strength SMTS, nominal outside diameter Dd, wall thickness t, and maximum allowable operating pressure MAOP.
A complete data set building method based on time-domain-like sparse sampling and KNN-softmax is adopted in the complete data set building module to obtain the complete MFL data set, and specifically comprises the following steps:
Step 1.1: collecting the original MFL detection data directly from a MFL detection tool of submarine pipelines, and performing secondary baseline correction on data, wherein the originally-sampled MFL data is used as multi-source data information, specifically comprising: axial data, radial data, circumferential data and α-direction data.
Step 1.1.1: performing primary baseline correction on the original MFL detection data, which is expressed as:
wherein, kc is the number of mileage count points; xi,j
Step 1.1.2: removing an over-limit value ±Ta in the data, and assigning the position value of the over-limit value to the median value s of all channels, which is expressed as:
x′i,j = s, if |xi,j| > Ta
Step 1.1.3: performing secondary baseline correction on data with the over-limit value removed:
wherein, kc is the number of mileage count points; xi
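The formulas of Steps 1.1.1 to 1.1.3 are not reproduced in this text, so the following is only a minimal sketch of the sequence under stated assumptions: each baseline correction is taken to be the removal of a channel's own mean over its mileage count points, and a sample is treated as over-limit when its absolute value exceeds Ta. The function names are hypothetical.

```python
from statistics import mean, median

def remove_channel_mean(data):
    """Baseline correction (assumed form): subtract each channel's mean."""
    corrected = []
    for channel in data:
        m = mean(channel)
        corrected.append([x - m for x in channel])
    return corrected

def secondary_baseline_correction(data, t_a):
    """Primary correction, over-limit removal, then a second correction.

    data: list of channels, each a list of samples along the mileage axis.
    t_a:  over-limit threshold Ta (assumed to apply to the absolute value).
    """
    primary = remove_channel_mean(data)
    # s is the median of all channels, as named in Step 1.1.2.
    s = median(x for channel in primary for x in channel)
    cleaned = [[s if abs(x) > t_a else x for x in channel]
               for channel in primary]
    # The second pass is no longer biased by the removed over-limit samples.
    return remove_channel_mean(cleaned)

result = secondary_baseline_correction([[10, 10, 10, 110], [5, 5, 5, 5]], t_a=50)
```

In this example the single outlier in the first channel no longer pulls the baseline of that channel upward after the second pass, which is the effect the secondary correction is meant to achieve.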
Step 1.2: performing time-domain-like sparse sampling anomaly detection treatment on data after secondary baseline correction.
Step 1.2.1: performing abnormal signal time-domain-like modeling on data after secondary baseline correction, namely corresponding the sampling points to time information.
Step 1.2.1.1: performing mathematical modeling on anomaly parts, wherein the modeling result is represented as:
wherein p(t)′ represents a voltage swell compensating signal of MFL detection in pipelines, f represents a signal sampling rate, t represents sampling time, t1,t2 represents sampling intervals, a represents power pipelines, n is a system fluctuation amplitude coefficient, and f(t)′ is voltage waveform change frequency.
Step 1.2.1.2: setting the variation of abnormal data of MFL detection by using the range as a collection unit, regarding the variance of pipeline system voltage data collected in each range as the data variation by using ke collected data as a range, and judging the degree of voltage signal fluctuation of MFL data. The specific calculation method comprises the steps:
wherein fi
Step 1.2.1.3: calculating the voltage state variation Δfi
Step 1.2.2: judging abnormal signals, if Δfi
Step 1.2.3: manually extracting the training sample features T=(X1, X2, . . . , X7, Xi
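Steps 1.2.1.2 to 1.2.2 describe a range-wise variance test whose formulas are truncated here. As a hedged illustration only, the sketch below splits a channel into ranges of ke samples, takes the variance of each range as the data variation, and flags a range when the change in variance from the previous range exceeds a threshold. The decision rule and the threshold value are assumptions, and the function names are hypothetical.

```python
from statistics import pvariance

def range_variances(signal, ke=100):
    """Variance of each complete range of ke samples (a trailing partial range is dropped)."""
    return [pvariance(signal[i:i + ke])
            for i in range(0, len(signal) - ke + 1, ke)]

def flag_abnormal_ranges(signal, ke, threshold):
    """Flag ranges whose variance changes by more than `threshold` (assumed rule)."""
    variances = range_variances(signal, ke)
    if not variances:
        return []
    flags = [False]  # the first range has no predecessor to compare with
    for previous, current in zip(variances, variances[1:]):
        flags.append(abs(current - previous) > threshold)
    return flags

signal = [0.0] * 4 + [0.0, 10.0, 0.0, 10.0] + [0.0] * 4
flags = flag_abnormal_ranges(signal, ke=4, threshold=1.0)
```

Both the entry into and the exit from the fluctuating range are flagged here, because the rule looks at the change of variance between neighbouring ranges.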
Step 1.3: performing missing interpolation treatment based on KNN-logistic regression on the MFL data of submarine pipelines.
Step 1.3.1: training and testing the KNN and softmax regression models.
Step 1.3.1.1: dividing the feature sample data T into two parts, wherein one part of the feature sample data XTrain is used for training the KNN model, and the other part of the feature sample data TTest is used for testing the KNN model.
Step 1.3.1.2: inputting XTrain into the KNN model, setting the value of K, and training the KNN model.
Step 1.3.1.3: inputting TTest into the trained KNN model for classification, calculating the discrimination error rate, if the error rate is less than a threshold, changing the training and testing samples by a V-fold cross-validation method, and continuing performing training; else, making K=K+1, continuing training the model, and stopping training when K is greater than the threshold M.
Step 1.3.1.4: (for the feature sample data Ti
Step 1.3.1.5: adding a softmax regression model at a node of each class, wherein a hypothesis function is expressed in the formula:
wherein, x is the sample input value, y is the sample output value, θ is the training model parameter, kf is the vector dimension, ie is category ie in the classification, and p(y=ie|x) represents the estimated probability value for category ie.
Step 1.3.1.6: inputting the training sample set Di
x is the sample input value; y is the sample output value; θ is the training model parameter, kf is the vector dimension; ie is category ie in the classification; je is sample input je in the classification; md is the number of samples; 1{·} is the indicative function, and if value in braces is the true value, the expression value is 1.
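The hypothesis function and the cost function of Steps 1.3.1.5 and 1.3.1.6 are missing from this text. The symbol definitions (θ, the indicator function 1{·}, md samples) match the standard softmax regression formulation, so the sketch below uses that standard form as an assumption: p(y=ie|x) is the normalized exponential of θie·x, and J(θ) is the mean negative log-likelihood. The function names are hypothetical.

```python
import math

def softmax_probabilities(theta, x):
    """Estimated probability of each category for input x.

    theta: one parameter vector per category; x: feature vector.
    """
    scores = [sum(t * xi for t, xi in zip(row, x)) for row in theta]
    top = max(scores)                      # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cost(theta, samples, labels):
    """Mean negative log-likelihood; the indicator 1{y = ie} selects the true category."""
    log_likelihood = 0.0
    for x, y in zip(samples, labels):
        log_likelihood += math.log(softmax_probabilities(theta, x)[y])
    return -log_likelihood / len(samples)

theta = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
probabilities = softmax_probabilities(theta, [1.0, 2.0])
loss = cost(theta, [[1.0, 2.0], [0.5, 0.5]], [0, 2])
```

With all parameters at zero every category is equally likely, so the cost equals ln 3 for three categories; training lowers J(θ) from that starting point.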
Step 1.3.2: calculating the loss function of the predicted result and setting the threshold to be p; if J(θ)>p, returning to Step 1.3.1.2, making K=K+1, and continuing training the model until J(θ)≤p; when K is greater than the threshold M, stopping training and outputting the output value y(i)′ after interpolation.
Step 1.3.3: inputting the data features and data sets to be interpolated into the trained model to realize interpolation of missing data so as to obtain the complete MFL data set, wherein because the originally-sampled MFL data is used as the multi-source data information, a complete multi-source MFL data set is obtained.
The discovery module adopts the pipeline connecting component discovery method based on a combination of a selective search and a convolutional neural network (CNN) to obtain the precise position of a weld, specifically comprising the following steps.
Step 2.1: extracting the MFL signal data of a pipeline: from a complete MFL data set, dividing a whole MFL signal matrix D into nε patches of the pipeline MFL signal matrix D1, D2, . . . , Dn
Step 2.2: color diagram of MFL signal conversion: setting the upper limit Amp of a signal amplitude and the lower limit Afloor of the signal amplitude, and converting the pipeline MFL signal matrices D1, D2, . . . , Dn
Step 2.2.1: setting the upper limit Amp of the signal amplitude and the lower limit Afloor of the signal amplitude,
Step 2.2.2: converting the pipeline MFL signal matrices D1, D2, . . . , Dn
wherein iεMn
Step 2.2.3: converting the gray matrices Gray1, Gray2, . . . , Grayn
wherein, rij is a component element of matrix R, gij is a component element of matrix G, and bij is a component element of matrix B.
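The conversion formulas of Step 2.2 are truncated in this text. As a rough illustration of the gray conversion only, the sketch below assumes that amplitudes are clipped to the interval [Afloor, Amp] and scaled linearly to the gray range 0 to 255. The mapping from gray values to the R, G and B matrices is not recoverable here and is not attempted. The function name is hypothetical.

```python
def to_gray(matrix, a_floor, a_top):
    """Clip each amplitude to [a_floor, a_top] and scale it linearly to 0..255."""
    span = a_top - a_floor
    gray = []
    for row in matrix:
        gray_row = []
        for value in row:
            clipped = min(max(value, a_floor), a_top)
            gray_row.append(round(255 * (clipped - a_floor) / span))
        gray.append(gray_row)
    return gray

gray = to_gray([[-5, 0, 4, 10, 20]], a_floor=0, a_top=10)
```

Clipping keeps isolated extreme amplitudes from compressing the contrast of the rest of the image.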
Step 2.3: selective search: for the color diagram Ck of each segment of pipeline, extracting mc candidate regions rk1, rk2, . . . rkm
Step 2.3.1: for the color diagram Ck of each segment of pipeline, using a division method to obtain a candidate region set R={rk1, rk2, . . . , rkw}.
Step 2.3.2: initializing a similarity set Sim=∅.
Step 2.3.3: calculating the similarities sim{rka,rkb} of all adjacent regions rka,rkb, according to the following formula:
Step 2.3.4: repeating Step 2.3.3 until the similarities of all adjacent regions are calculated, and updating the similarity set Sim according to the following formula:
Sim=Sim∪sim(rka,rkb)
Step 2.3.5: finding the maximum similarity sim{rka,rkb}=max(Sim) from Sim, and obtaining a merged region accordingly:
rka=rkc∪rkd
removing sim{rkc,rkd} from Sim.
Step 2.3.6: repeating Step 2.3.5 until Sim is empty so as to obtain mc merged regions rk1, rk2, . . . rkm, wherein these regions are candidate regions.
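The merging loop of Steps 2.3.2 to 2.3.6 can be sketched in one dimension along the pipeline axis. The similarity formula of Step 2.3.3 is missing from this text, so the sketch assumes a simple stand-in: two adjacent regions are more similar when their mean intensities are closer. Regions are hypothetical (start, end, mean) tuples, and the similarity set is recomputed on each pass instead of being updated incrementally.

```python
def merge_candidates(regions):
    """Greedy hierarchical merging of adjacent regions.

    regions: contiguous (start, end, mean) tuples ordered along the axis.
    Returns the initial regions plus every merged region as candidates.
    """
    def similarity(a, b):
        return -abs(a[2] - b[2])           # assumed similarity measure

    def merge(a, b):
        len_a, len_b = a[1] - a[0], b[1] - b[0]
        mean_ab = (a[2] * len_a + b[2] * len_b) / (len_a + len_b)
        return (a[0], b[1], mean_ab)

    current = list(regions)
    candidates = list(regions)
    while len(current) > 1:
        # Similarity set Sim over all adjacent pairs.
        sims = [similarity(current[i], current[i + 1])
                for i in range(len(current) - 1)]
        best = sims.index(max(sims))       # most similar adjacent pair
        merged = merge(current[best], current[best + 1])
        current[best:best + 2] = [merged]  # the pair is replaced by its union
        candidates.append(merged)
    return candidates

candidates = merge_candidates([(0, 10, 1.0), (10, 20, 1.2), (20, 30, 5.0)])
```

The two low-amplitude regions are merged first, and the high-amplitude region is absorbed last, so candidates at several scales are produced for the CNN to judge.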
Step 2.4: convolutional neural network: judging the extracted candidate regions by the convolutional neural network (CNN), and recording the position Loc1, Loc2, . . . , Locn and the score Soc1, Soc2, . . . , Socn of the weld judged by the convolutional neural network (CNN).
Step 2.5: non-maximum suppression: obtaining the precise position L1, L2, . . . , Lu of the weld from the position Loc1, Loc2, . . . , Locn and the score Soc1, Soc2, . . . , Socn recorded in Step 2.4 through the non-maximum suppression algorithm.
According to the precise position of the weld, the whole MFL signals are divided into u+1 patches and one patch of MFL signals is taken; the discovery module adopts an abnormal candidate region search and identification method based on a Lagrange multiplication framework and multi-source MFL data fusion to find out MFL signals with defects, specifically comprising the following steps.
Step 3.1: establishing a data reconstruction framework based on Lagrange multiplication.
Step 3.1.1: establishing a data reconstruction model
wherein P is an observed matrix, E is an error matrix, A is a low-rank matrix after reconstruction, ∥⋅∥1 represents the 1-norm of the matrix, ∥⋅∥* represents the nuclear norm of the matrix, and λ is the weight parameter.
Step 3.1.2: changing a constrained optimization model into an unconstrained optimization model,
wherein l represents the Lagrange function, <⋅> represents the inner product of the matrix, μ is a penalty factor, Y is the Lagrange multiplication matrix, and the unconstrained model minimization problem can be solved through an iterative process as follows:
Step 3.1.3: Iterative optimization, wherein the optimization model of matrix A
for the convenience of calculation, the nuclear norm minimization problem can be solved by a soft threshold operator; the soft threshold is calculated as $S_\tau(x)=\operatorname{sgn}(x)(|x|-\tau)_+$, wherein $y_+=\max(y,0)$, and the operator can be used in the optimization process as follows:
$USV^T$ is the singular value decomposition of the matrix Z, for $\forall Z\in\mathbb{R}^{m\times n}$, $U\in\mathbb{R}^{m\times r}$, and $V\in\mathbb{R}^{r\times n}$, where r is the rank of the matrix,
therefore, the optimization problem of the matrix A is transformed into
and similarly, the optimization problem of the matrix E is transformed into
Step 3.1.4: setting an iteration cut-off condition, wherein the cut-off condition is:
S is the weight matrix, and the S weight matrix is used, so that the iteration time can be greatly shortened, and the detection speed can be increased.
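The equations of Steps 3.1.1 to 3.1.3 are not reproduced in this text. The symbol definitions (observed matrix P, low-rank matrix A, error matrix E, multiplier Y, penalty factor μ, soft threshold operator) match the standard robust principal component analysis problem solved by the augmented Lagrange multiplier method, so the standard form is given below as an assumption. The iteration index k and the operator symbol $\mathcal{S}$ are notation introduced here, and the cut-off condition with the weight matrix of Step 3.1.4 is not included because it cannot be recovered.

```latex
% Step 3.1.1: constrained reconstruction model (standard form, assumed)
\min_{A,E}\ \|A\|_{*} + \lambda \|E\|_{1}
\quad \text{s.t.} \quad P = A + E

% Step 3.1.2: unconstrained (augmented Lagrangian) model
l(A,E,Y,\mu) = \|A\|_{*} + \lambda \|E\|_{1}
  + \langle Y,\, P - A - E \rangle
  + \frac{\mu}{2}\, \|P - A - E\|_{F}^{2}

% Step 3.1.3: alternating updates, with USV^{T} the singular value
% decomposition of P - E_{k} + Y_{k}/\mu
A_{k+1} = U\, \mathcal{S}_{1/\mu}(S)\, V^{T}
E_{k+1} = \mathcal{S}_{\lambda/\mu}\!\left(P - A_{k+1} + Y_{k}/\mu\right)
Y_{k+1} = Y_{k} + \mu \left(P - A_{k+1} - E_{k+1}\right)
```

In this form the anomaly information is carried by E: entries where the reconstructed low-rank matrix A deviates strongly from the observation P mark candidate abnormal regions.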
Step 3.2: abnormal candidate region search in pipelines based on multi-data fusion.
Step 3.2.1: performing abnormal region search on uniaxial data respectively under the data reconstruction framework based on Lagrange multiplication to obtain triaxial abnormal regions OX, OY, OZ.
Step 3.2.2: establishing a triaxial fusion optimization framework:
min(OX∪OY∪OZ), subject to OXi∪OYj∪OZk≠∅
Step 3.2.3: eliminating overlapping by a non-maximum suppression algorithm while considering the diversity of generation of candidate regions, merging windows which are close to each other, and using the maximum outer boundary of the two windows as the outer boundary of the new window, wherein the merging criterion is that two adjacent windows are merged if their transverse center distance is less than the minimum transverse length of the two windows.
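The merging criterion of Step 3.2.3 can be illustrated with a short sketch. Windows are represented as hypothetical (x0, y0, x1, y1) tuples, and the transverse direction is assumed to be the x axis; neither choice is fixed by the text.

```python
def should_merge(a, b):
    """Merge when the transverse center distance is below the shorter transverse length."""
    center_a = (a[0] + a[2]) / 2
    center_b = (b[0] + b[2]) / 2
    min_length = min(a[2] - a[0], b[2] - b[0])
    return abs(center_a - center_b) < min_length

def merge_windows(windows):
    """Merge close windows; the union keeps the maximum outer boundary of both."""
    merged = []
    for window in sorted(windows):
        if merged and should_merge(merged[-1], window):
            last = merged[-1]
            merged[-1] = (min(last[0], window[0]), min(last[1], window[1]),
                          max(last[2], window[2]), max(last[3], window[3]))
        else:
            merged.append(window)
    return merged

merged = merge_windows([(0, 0, 10, 5), (4, 1, 12, 6), (50, 0, 60, 5)])
```

The first two windows have centers 3 apart and a shorter transverse length of 8, so they are merged; the third window is far away and is kept unchanged.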
Step 3.3: anomaly identification of MFL in pipelines based on an evolvable model.
Step 3.3.1: extracting abnormal samples from a complete MFL data set, and establishing an anomaly identification model based on the convolutional neural network (CNN).
Step 3.3.2: for incorrectly-identified samples, adding new labels as new classification, going to Step 3.3.1, re-establishing the anomaly identification model, performing reclassification, and finding out the MFL signals with defects.
The quantization module adopts a defect quantization method based on a random forest to obtain the defect size, specifically comprising the following steps.
Step 4.1: collecting data: detecting the defect MFL signals, and extracting features of the MFL signals to obtain the feature values of the defect MFL signals, specifically as follows: finding out the peak-valley position and peak-valley value of an MFL signal of axial maximum channel according to the minimum point on the MFL signal of axial maximum channel; after judging and determining as single-peak and double-peak defects, extracting 10 waveform-related features, namely peak value of single-peak defect, maximum peak-valley difference of single-peak defect, valley width of double-peak defect, left peak-valley difference and right peak-valley difference of double-peak defect signals, peak-to-peak distance of double-peak defect signals, axial spacing between special points, area feature, surface energy feature, defect volume, and defect body energy.
The 10 features are specifically described as follows:
A. peak value of single-peak defect: Yc is the defect minimum valley value, and Yp-v is the maximum peak-valley difference. Since the defect MFL signals are affected by various factors such as detection environments of the inner detection tool, the baseline of data fluctuates greatly. Taking the peak-valley difference of defect data as a feature quantity can eliminate the influence of the signal baseline well and improve the reliability of quantitative analysis of defects;
B. maximum peak-valley difference of single-peak defect: expressed as: Yp-v=Yp−Yv, wherein Yp is the peak value of single-peak defect, Yv is the minimum valley value of defects, and Yp-v is the maximum peak-valley difference. Since the defect MFL signals are affected by various factors such as detection environments of the inner detection tool, the baseline of data fluctuates greatly. Taking the peak-valley difference of defect data as a feature quantity can eliminate the influence of the signal baseline well and improve the reliability of quantitative analysis of defects;
C. valley width of double-peak defect: formulated as: Xv-v=Xvr−Xvl, wherein Xv-v represents the valley width of an axial signal of defects, Xvr is the right valley position of the defects, and Xvl is the left valley position of the defects. The valley width of defect signals can reflect the axial distribution of the defect signals;
D. left peak-valley difference and right peak-valley difference of double-peak defect signals: formulated as: Ylp-lv=Ylp−Ylv, Yrp-rv=Yrp−Yrv, wherein Ylv is the left valley value of MFL signals, Yrv is the right valley value of the MFL signals, Ylp is the left peak value of double-peak signals, Yrp is the right peak value of the double-peak signals, Ylp-lv is the left peak-valley difference, and Yrp-rv is the right peak-valley difference;
E. peak-to-peak distance of double-peak defect signals: formulated as: Xp-p=Xpr−Xpl, wherein Xpr is the right-peak position, Xpl is the left-peak position, and Xp-p is the peak-to-peak distance of signals. A combination of the peak-to-peak distance and the peak-valley value of defect signals can roughly determine the shape of an abnormal data curve, which contributes to quantitative analysis of defect length and depth;
F. axial spacing between special points: in order to obtain the key feature quantity of defect length, the extraction method of special points comprises: setting the proportion m_RateA of rectification, and calculating the threshold according to X+(Y−X)*m_RateA, wherein X is the mean value of valley values and Y is the maximum peak value; the two points closest to the threshold in the MFL signal of axial maximum channel are the special points, and the spacing between special points is the key feature quantity for obtaining the defect length;
G. area feature: A valley value with a lower value is taken as the baseline, the area covered between data curves of two valleys and the baseline is taken and formulated as:
wherein Sa represents the waveform area of defects; x(t) represents the signal data point of defects; min[x(t)] represents the minimum valley value of defects; N1 represents the left valley position of defects; N2 represents the right valley position of defects;
H. surface energy feature: the energy of a data curve between two valleys is obtained and formulated as:
wherein, Sc is the defect waveform surface energy;
I. defect volume: the defect volume is obtained by summing the defect areas within a defect channel range, and formulated as:
wherein Va represents the defect volume; n1 represents the starting channel determined by the position of a direction signal at a special point; n2 represents the termination channel determined by the position of a circumferential signal at a special point; and Sa(t) represents the single-channel axial defect area; and
J. defect body energy: the defect body energy is obtained by summing the defect surface energy within the defect range, and formulated as:
wherein, Ve represents the defect body energy; and Sc(t) represents the surface energy of single-channel axial defect signals.
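The formulas for features G to J are missing from this text. The sketch below follows the verbal definitions under stated assumptions: the area is the sum of the signal above the lower valley between the left and right valley positions, the surface energy is the sum of squares of the same quantity, and the volume and body energy sum these over the channels of the defect. The function and key names are hypothetical.

```python
def waveform_features(signal, left_valley, right_valley):
    """Single-channel features between the two valley positions (inclusive)."""
    segment = signal[left_valley:right_valley + 1]
    baseline = min(segment)                # the lower valley value is the baseline
    return {
        "peak_valley_difference": max(segment) - baseline,
        "valley_width": right_valley - left_valley,
        "area": sum(x - baseline for x in segment),
        "energy": sum((x - baseline) ** 2 for x in segment),  # assumed form
    }

def body_features(channels, first_channel, last_channel, left_valley, right_valley):
    """Sum the single-channel area and energy over the defect channel range."""
    selected = channels[first_channel:last_channel + 1]
    per_channel = [waveform_features(c, left_valley, right_valley) for c in selected]
    return {
        "volume": sum(f["area"] for f in per_channel),
        "body_energy": sum(f["energy"] for f in per_channel),
    }

signal = [2.0, 1.0, 3.0, 5.0, 3.0, 1.5, 2.0]
features = waveform_features(signal, left_valley=1, right_valley=5)
body = body_features([signal, signal], 0, 1, 1, 5)
```

Because every quantity is measured from the valley baseline, a constant offset of the raw signal leaves the features unchanged, which is the property features A and B rely on.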
Step 4.2: using the feature value of the defect MFL signal as a sample; using the manually-measured defect size as a label, wherein the defect size includes the depth, width and length of a defect; manually selecting the initial training set and the testing set.
Step 4.3: training the network; inputting the training set into an initial random forest network.
Step 4.4: adjusting the network; inspecting the results of the random forest regression network through the testing set, and obtaining a final network by adjusting parameters.
Step 4.4.1: selecting me defect samples by bootstrapping, namely random sampling with replacement, from the Mh×Nh dimension of original MFL signal feature defect samples, with me≤Mh, performing sampling Tc times in total, and generating Tc training sets.
Step 4.4.2: for the Tc training sets, training Tc regression tree models, respectively.
Step 4.4.3: for a single regression tree model, selecting ne features from a MFL defect signal feature set, wherein ne≤v; then performing division each time based on the information gain ratio gR(D,A)=g(D,A)/HA(D),
wherein HA(D) in the formula represents the entropy of feature A, and g(D,A) represents information gain; selecting the feature with the maximum information gain ratio for division; initially, setting the maximum feature number, max_features, of the parameters as None, that is, without limiting the feature number selected in the network.
Step 4.4.4: enabling every tree to keep dividing in this way and, in order to prevent overfitting in the process of division, pruning the regression tree through consideration of the complexity of the regression tree. Pruning is performed by minimizing the loss function Cα(T)=C(T)+α|T|, wherein C(T) represents the model's prediction error for the defect size, namely, the degree of fitting, |T| represents model complexity, and α is used to regulate the complexity of the regression tree. The prediction error of the loss function is taken as the value at POF 90% position by using the international POF standards for sea oil transportation.
Step 4.4.5: for model parameter tuning optimization, finding out the optimal parameters by grid search with K-fold cross-validation (GridSearchCV), wherein the optimal parameters comprise the random forest framework parameters, namely the out-of-bag sample evaluation score eonb and the maximum number of iterations, as well as the tree model parameters, namely the maximum feature number max_features, the maximum depth, the minimum number of samples required for inner node subdivision and the minimum number of samples of leaf nodes.
Step 4.4.6: forming the random forest by a plurality of generated decision trees, for the regression problem network established from defect feature samples, the finally-predicted defect size is determined by the mean value of the predicted values of a plurality of trees.
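Steps 4.4.1, 4.4.2 and 4.4.6 (bootstrap sampling, one tree per training set, prediction by the mean of the trees) can be sketched with the standard library alone. To stay short, each tree is reduced to a single split on one feature chosen by squared error, which is a swapped-in simplification and not the information gain ratio criterion of Step 4.4.3. All names are hypothetical.

```python
import random
from statistics import mean

def fit_stump(samples):
    """Fit a one-split regression tree on (feature, target) pairs."""
    samples = sorted(samples)
    best = None
    for i in range(1, len(samples)):
        if samples[i][0] == samples[i - 1][0]:
            continue                       # no split between equal feature values
        left = [y for _, y in samples[:i]]
        right = [y for _, y in samples[i:]]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            split = (samples[i - 1][0] + samples[i][0]) / 2
            best = (error, split, left_mean, right_mean)
    if best is None:                       # every feature value identical
        constant = mean(y for _, y in samples)
        return lambda x: constant
    _, split, left_mean, right_mean = best
    return lambda x: left_mean if x < split else right_mean

def fit_forest(samples, n_trees, sample_size, seed=0):
    """Draw n_trees bootstrap sets (with replacement) and fit one tree on each."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(samples) for _ in range(sample_size)])
            for _ in range(n_trees)]

def predict(forest, x):
    """The forest prediction is the mean of the individual tree predictions."""
    return mean(tree(x) for tree in forest)

training = [(x, 1.0 if x < 5 else 3.0) for x in range(10)]
forest = fit_forest(training, n_trees=25, sample_size=10)
```

Averaging over trees trained on different bootstrap sets is what reduces the variance of the size estimate compared with a single tree.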
Step 4.5: inputting the data to be tested in the testing set into the random forest network adjusted according to Step 4.4, and outputting the predicted defect size, wherein the output corresponds to the quantity being tested: if the data to be tested is the depth in the defect size, the output is the predicted depth; if it is the width, the output is the predicted width; and if it is the length, the output is the predicted length. The predicted depth is assessed by the value at the 80% position of the errors ranked by absolute value according to the international POF standards for oil pipelines, with the formula POF=sort(|ye−ỹe|)×80%, wherein ye and ỹe are the design depth and the predicted depth, respectively.
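The POF figure of Step 4.5 can be read as a percentile of the sorted absolute errors. The sketch below takes the error at the 80% position; the exact index convention (rounding up to the next rank) is an assumption, and the function name is hypothetical.

```python
import math

def pof_error(design, predicted, level=0.8):
    """Absolute error found at the `level` position of the sorted absolute errors."""
    errors = sorted(abs(y - y_hat) for y, y_hat in zip(design, predicted))
    index = max(math.ceil(level * len(errors)) - 1, 0)
    return errors[index]

design_depth = [10] * 10
predicted_depth = [10 + k for k in range(10)]   # errors 0, 1, ..., 9
error_80 = pof_error(design_depth, predicted_depth)
error_90 = pof_error(design_depth, predicted_depth, level=0.9)
```

A value of 7 at the 80% level means that 80% of the predictions in this example deviate from the design depth by 7 or less.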
The solution module adopts a pipeline solution improved based on the standard ASME B31G, imports the maintenance decision model and outputs the evaluation results, specifically comprising the following steps.
Step 5.1: extracting all defect length columns, depth columns and pipeline property parameters in defect information from a complete MFL data set, wherein the pipeline property parameters comprise minimum yield strength SMYS, minimum tensile strength SMTS, nominal outside diameter Dd, wall thickness ta and maximum allowable operating pressure MAOP.
Step 5.2: calculating the value
of rheological stress (flow stress), wherein SMYS is the minimum yield strength of the pipe in MPa, and SMTS is the minimum tensile strength in MPa.
Step 5.3: calculating the predicted failure pressure
of pipelines, when z≤20, the length expansion coefficient
when z>20, the length expansion coefficient Lθ=(ηz+λn)
the metal loss area
in a corrosion area, and the original area Aarea0=taL, wherein d is the defect depth in mm; ta is the pipeline wall thickness in mm; Dd is the nominal outside diameter in mm.
Step 5.4: calculating the maximum failure pressure
of the pipeline, reorganizing and getting:
when z≤20, θa=⅔; when z>20, θa=1.
Step 5.5: calculating the maintenance index
wherein
P is the maximum allowable design pressure; if the maintenance index ERF is less than 1, it indicates that the defect is acceptable; if ERF is greater than or equal to 1, the defect is unacceptable, and then the pipe should be maintained or replaced.
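The improved formulas of Steps 5.2 to 5.5 are not reproduced in this text and are not reconstructed here. For orientation only, the sketch below implements the original ASME B31G Level 1 form that the invention modifies: flow stress 1.1·SMYS, z=L²/(Dd·ta), and the two branches at z=20. Taking ERF as MAOP divided by the safe pressure, and the default safety factor of 1.0, are assumptions; the function names are hypothetical.

```python
import math

def b31g_failure_pressure(d, length, t, diameter, smys):
    """Original ASME B31G failure pressure (lengths in mm, stresses in MPa)."""
    flow_stress = 1.1 * smys               # original B31G flow stress, not the improved one
    z = length ** 2 / (diameter * t)
    if z <= 20:
        bulging = math.sqrt(1 + 0.8 * z)   # Folias bulging factor
        ratio = (1 - (2 / 3) * d / t) / (1 - (2 / 3) * d / (t * bulging))
    else:
        ratio = 1 - d / t
    return 2 * t * flow_stress / diameter * ratio

def estimated_repair_factor(maop, failure_pressure, safety_factor=1.0):
    """ERF, assumed here to be MAOP divided by the safe operating pressure."""
    return maop / (failure_pressure / safety_factor)

intact = b31g_failure_pressure(d=0, length=100, t=10, diameter=500, smys=360)
long_defect = b31g_failure_pressure(d=5, length=500, t=10, diameter=500, smys=360)
erf = estimated_repair_factor(maop=8.0, failure_pressure=long_defect)
```

A defect-free pipe returns the plain hoop-stress limit, and a long defect half-way through the wall halves it; with MAOP above that value the ERF exceeds 1 and the defect is unacceptable under the criterion of Step 5.5.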
Step 5.6: importing the maintenance decision model, conducting qualitative and quantitative analysis based on expert experience and a life prediction model, then evaluating the severity of pipeline corrosion, formulating maintenance rules, and outputting the evaluation results according to the maintenance rules, comprising the maintenance index and maintenance recommendations, wherein:
rule 1: a maximum depth of wall thickness loss at the defect greater than or equal to 80% is considered major corrosion, and the recommendation is that the pipe needs to be maintained or replaced immediately;
rule 2: an ERF at the defect greater than or equal to 1 is considered severe corrosion, and the recommendation is that the pipe needs to be maintained immediately;
rule 3: an ERF at the defect greater than or equal to 0.95 and less than 1.0 is considered general corrosion, and the recommendation is that the defect can be observed for 1-3 months;
rule 4: a maximum depth at the defect greater than or equal to 20% and less than 40% is considered minor corrosion, and the recommendation is that the defect can be observed regularly without treatment.
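The four maintenance rules of Step 5.6 can be written as a small decision function. The thresholds are taken from the rules as stated; checking them in the order rule 1 to rule 4 and returning an "unclassified" result for defects that match none of them are assumptions, and the function name is hypothetical.

```python
def maintenance_recommendation(depth_percent, erf):
    """Return (severity, recommendation) for one defect.

    depth_percent: maximum wall thickness loss at the defect, in percent.
    erf:           maintenance index ERF of the defect.
    """
    if depth_percent >= 80:                          # rule 1
        return "major corrosion", "maintain or replace the pipe immediately"
    if erf >= 1.0:                                   # rule 2
        return "severe corrosion", "maintain the pipe immediately"
    if 0.95 <= erf < 1.0:                            # rule 3
        return "general corrosion", "observe the defect for 1-3 months"
    if 20 <= depth_percent < 40:                     # rule 4
        return "minor corrosion", "observe regularly without treatment"
    return "unclassified", "not covered by rules 1-4"

severity, advice = maintenance_recommendation(depth_percent=35, erf=0.97)
```

A defect with 35% depth and ERF 0.97 satisfies both rule 3 and rule 4; with the assumed ordering the more severe classification, general corrosion, is reported.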
The intelligent analysis system has the following beneficial technical effects:
1. Compared with a general baseline correction algorithm, the complete data set building module proposes a secondary baseline correction algorithm, which reduces the influence of abnormal data on the overall base value and improves the accuracy of baseline correction. Also, an algorithm that adds logistic regression in each KNN box is adopted to realize the interpolation of missing data. The method is applicable to different types of missing data, and has a powerful anti-interference ability against the uncertainty of actual engineering data;
2. In the discovery module, a selective search algorithm is introduced to generate candidate regions, which is different from the general weld detection method, so that the speed and the accuracy of generating the candidate regions are increased; the candidate regions are classified using a convolutional neural network (CNN) algorithm, so that the robustness of the weld detection algorithm to signal noise is increased, and the classification accuracy is improved;
3. In the discovery module, multi-source MFL data is adopted for reconstruction, anomaly detection is realized by analyzing the deviation between reconstructed data and source data, and a novel weight matrix is applied in condition calculations so as to increase the algorithm speed. The experimental results show that the method is good in effects of anomaly detection;
4. In the quantization module, different from a general feature extraction method, according to the characteristic of unstable sudden changes in MFL signals, the invention proposes a feature extraction method based on MFL signal waveform and statistics, so that the model identification effect is enhanced; an iterative loss function of the random forest is customized by using POF standards for offshore oil pipelines, making the algorithm highly adaptable in the field and highly accurate in defect quantization results. The method disclosed by the invention has been applied to the practical inversion of engineering pipelines, with a good effect of defect size quantification;
5. In the solution module, the invention is based on practical engineering applications. The original ASME B31G method is overly conservative, and this conservatism causes large maintenance costs through frequent maintenance. Compared with the original ASME B31G method, the method improves the calculation of rheological stress, thereby increasing the calculated failure pressure and reducing the conservatism, so that such maintenance costs are avoided; and
6. The invention proposes an intelligent analysis system and method for detecting MFL data in pipelines. Compared with the general analysis method of MFL data, the invention proposes an intelligent analysis process for detecting MFL data in pipelines from the overall perspective. The process sequence comprises a complete data set building module, a discovery module, a quantization module and a solution module. The process realizes the preprocessing of original MFL data detected in pipelines, detection of connecting components and anomaly detection which comprises: detection of defects, valves, meters and metal increment, defect size inversion and final maintenance decision.
The invention will be further described below in combination with the drawings and embodiments.
The invention provides an intelligent analysis software system for inner detecting MFL data in pipelines, proposes an analysis system for inner detecting MFL data from the overall perspective of non-destructive testing evaluation, and invents a complete data set building method based on time-domain-like sparse sampling and KNN-softmax from the perspective of intelligence, a pipeline connecting component discovery method based on a combination of a selective search and a convolutional neural network (CNN), an abnormal candidate region search and identification method based on a Lagrange multiplication framework and multi-source MFL data fusion, a defect quantization method based on a random forest and a pipeline solution improved based on standard ASME B31G. The safe operation and maintenance of pipelines are realized.
The block diagram of the intelligent analysis software system of MFL data of the invention is as shown in
The intelligent analysis system for inner detecting MFL data in pipelines proposed by the invention, as shown in
The flow chart of data preprocessing based on time-domain-like sparse sampling and KNN-softmax is as shown in
Step 1.1: collecting the original MFL detection data directly from an MFL detection tool of submarine pipelines, and performing secondary baseline correction on the data, wherein the originally-sampled MFL data is used as multi-source data information, specifically comprising: axial data, radial data, circumferential data and α-direction data.
Step 1.1.1: performing primary baseline correction on the original MFL detection data, which is expressed as:
wherein, ke is the number of mileage count points; xi
Step 1.1.2: removing an over-limit value ±Ta in the data, and assigning the position value of the over-limit value to the median value s of all channels, which is expressed as:
xij′=s, if xi
Step 1.1.3: performing secondary baseline correction on data with the over-limit value removed:
wherein, ke is the number of mileage count points; xi
Step 1.2: performing time-domain-like sparse sampling anomaly detection treatment on data after secondary baseline correction.
Step 1.2.1: performing abnormal signal time-domain-like modeling on data after secondary baseline correction, namely mapping the sampling points to time information.
Step 1.2.1.1: performing mathematical modeling on anomaly parts, wherein the modeling result is represented as:
f(t)=p(t)*sin(2πnft)
wherein
wherein p(t)′ represents a voltage swell compensating signal of MFL detection in pipelines, f represents the signal sampling rate, t represents the sampling time, t1 and t2 represent the sampling interval, a represents power pipelines, n is a system fluctuation amplitude coefficient, and f(t) is the voltage waveform change frequency.
Step 1.2.1.2: setting the variation of abnormal MFL detection data by using the range as the collection unit, namely taking every ke=100 collected data points as one range, regarding the variance of the pipeline system voltage data collected in each range as the data variation, and judging the degree of voltage signal fluctuation of the MFL data. The specific calculation is as follows:
wherein fi
Step 1.2.1.3: calculating the voltage state variation Δfi
Step 1.2.2: judging abnormal signals, if Δfi
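The range-based variation of Steps 1.2.1.2 to 1.2.2 can be sketched as follows. This is a minimal illustration only: the function names, the definition of the variation as the absolute difference between the variances of consecutive ranges, and the threshold value are assumptions, since the corresponding formulas are not reproduced in the text.

```python
from statistics import pvariance

def range_variances(signal, ke=100):
    """Variance of each consecutive range of ke samples (an incomplete tail is dropped)."""
    n_ranges = len(signal) // ke
    return [pvariance(signal[i * ke:(i + 1) * ke]) for i in range(n_ranges)]

def flag_abnormal_ranges(signal, ke=100, threshold=1.0):
    """Return indices of ranges whose variance change exceeds `threshold`.

    Assumption: the variation is the absolute difference between the
    variances of consecutive ranges, and `threshold` is a placeholder value.
    """
    variances = range_variances(signal, ke)
    flagged = []
    for i in range(1, len(variances)):
        if abs(variances[i] - variances[i - 1]) > threshold:
            flagged.append(i)
    return flagged

# Flat signal with an oscillating burst in the third range (samples 200..299).
signal = [0.0] * 400
for t in range(200, 300):
    signal[t] = 5.0 if t % 2 == 0 else -5.0
```

In this example the ranges entering and leaving the burst are both flagged, because the variance changes at each boundary.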
Step 1.2.3: manually extracting the training sample features T=(X1, X2, . . . , X7, X8), wherein a total of 8 features are extracted, which are left valley value, right valley value, valley width, peak value, left peak-valley difference, right peak-valley difference, differential left peak value and differential right peak value.
Manually extracting the testing sample features T′=(X1′, X2′, . . . , X7′, X8′), wherein a total of 8 features are extracted, which are left valley value, right valley value, valley width, peak value, left peak-valley difference, right peak-valley difference, differential left peak value and differential right peak value.
Manually extracting the features T″=(X1″, X2″, . . . , X7″, X8″) of data to be interpolated, wherein a total of 8 features are extracted, which are left valley value, right valley value, valley width, peak value, left peak-valley difference, right peak-valley difference, differential left peak value and differential right peak value.
Step 1.3: performing missing interpolation treatment based on KNN-softmax regression on the MFL data of submarine pipelines.
Step 1.3.1: training and testing the KNN and softmax regression models.
Step 1.3.1.1: dividing the feature sample data T into two parts, wherein one part of the feature sample data XTrain is used for training the KNN model, and the other part of the feature sample data TTest is used for testing the KNN model.
Step 1.3.1.2: inputting XTrain into the KNN model, setting the initial value of K to 5, and training the KNN model.
Step 1.3.1.3: inputting TTest into the trained KNN model for classification, calculating the discrimination error rate, if the error rate is less than a threshold, changing the training and testing samples by a V-fold cross-validation method, and continuing performing training; else, making K=K+1, continuing training the model, and stopping training when K is greater than the threshold M.
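The search over K in Steps 1.3.1.2 and 1.3.1.3 might be sketched as below. This is a simplified illustration under stated assumptions: the function names, the Euclidean distance, the interleaved fold layout, the error threshold of 0.1 and the upper bound k_max (standing in for the threshold M) are all hypothetical choices, not values given in the text.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_error_rate(samples, k, v=5):
    """Misclassification rate over v folds (fold i holds every v-th sample)."""
    errors = 0
    for fold in range(v):
        test = samples[fold::v]
        train = [s for i, s in enumerate(samples) if i % v != fold]
        errors += sum(knn_predict(train, x, k) != y for x, y in test)
    return errors / len(samples)

def select_k(samples, k_start=5, k_max=15, error_threshold=0.1, v=5):
    """Increase K from k_start until the cross-validated error drops below
    the threshold or K exceeds k_max; return the best (K, error) seen."""
    best_k, best_err = k_start, float("inf")
    for k in range(k_start, k_max + 1):
        err = cv_error_rate(samples, k, v)
        if err < best_err:
            best_k, best_err = k, err
        if err < error_threshold:
            break
    return best_k, best_err
```

Each sample is a (feature_tuple, label) pair; with the 8 features of Step 1.2.3 the feature tuple would have length 8.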
Step 1.3.1.4: (for the feature sample data Ti
Step 1.3.1.5: adding a softmax regression model at a node of each class, wherein a hypothesis function is expressed in the formula:
wherein, x is the sample input value, y is the sample output value, θ is the training model parameter, kf is the vector dimension, ie is category ie in the classification and p(y=ie|x) represents the estimated probability value for category ie.
Step 1.3.1.6: inputting the training sample set Di
wherein x is the sample input value; y is the sample output value; θ is the training model parameter; kf is the vector dimension; ie is category ie in the classification; je is sample input je in the classification; md is the number of samples; 1{⋅} is the indicator function, and if the expression in braces is true, its value is 1.
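For reference, the textbook softmax regression hypothesis and cost function can be sketched as follows. This assumes the model takes the standard form (class probabilities proportional to the exponential of θ applied to x, and a mean negative log-likelihood cost without regularisation); the function names are hypothetical and the formulas of the invention itself are not reproduced in the text.

```python
import math

def softmax_probs(theta, x):
    """Estimated p(y = i | x) for each class i; theta[i] is the parameter vector of class i."""
    scores = [sum(t * xi for t, xi in zip(row, x)) for row in theta]
    top = max(scores)                       # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_cost(theta, xs, ys):
    """Mean negative log-likelihood J(theta) over the samples (assumed: no regularisation)."""
    loss = 0.0
    for x, y in zip(xs, ys):
        # The indicator 1{y = i} selects the probability of the true class only.
        loss -= math.log(softmax_probs(theta, x)[y])
    return loss / len(xs)
```

With all parameters equal to zero, every class receives the same probability, so for two classes the cost equals ln 2.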
Step 1.3.2: calculating the loss function J(θ) of the predicted result, and setting the threshold P to 0.5; if J(θ)>P, returning to Step 1.3.1.2, making K=K+1, and continuing training the model until J(θ)≤P; when K is greater than the threshold M, stopping training; and outputting the output value y(i)′ after interpolation.
Step 1.3.3: inputting the data features and data sets to be interpolated into the trained model to realize interpolation of missing data so as to obtain the complete MFL data set, wherein because the originally-sampled MFL data is used as the multi-source data information, a complete multi-source MFL data set is obtained.
Simulation results of Step 1:
The discovery module adopts the pipeline connecting component discovery method based on a combination of a selective search and a convolutional neural network (CNN) to obtain the precise position of a weld, specifically comprising the following steps. The detection flow of pipeline connecting components based on the combination of the selective search and the convolutional neural network (CNN) of the invention is as shown in
Step 2.1: extracting the MFL signal data of a pipeline: from a complete MFL data set, dividing a whole MFL signal matrix D into ne patches of the pipeline MFL signal matrix D1, D2, . . . , Dn
Step 2.2: color diagram of MFL signal conversion: setting the upper limit Atop of a signal amplitude and the lower limit Afloor of the signal amplitude, and converting the pipeline MFL signal matrices D1, D2, . . . , Dn
Step 2.2.1: setting the upper limit Atop of the signal amplitude and the lower limit Afloor of the signal amplitude.
Step 2.2.2: converting the pipeline MFL signal matrices D1, D2, . . . , Dn
wherein i∈Mn
Step 2.2.3: converting the gray matrices Gray1, Gray2, . . . , Grayn
wherein c=255; rij is a component element of matrix R; gij is a component element of matrix G; bij is a component element of matrix B.
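The gray conversion of Step 2.2.2 might look like the sketch below. It assumes a linear mapping of the clipped amplitude onto the range 0 to 255, which is a common choice but is not confirmed by the text, whose conversion formula is not reproduced; the function name and signature are hypothetical. The subsequent gray-to-color mapping of Step 2.2.3 is not sketched because its formula is likewise unavailable.

```python
def to_gray(matrix, a_floor, a_top, levels=255):
    """Clip each amplitude to [a_floor, a_top] and scale it linearly to 0..levels.

    Assumption: linear scaling; the conversion formula of the invention
    is not reproduced in the text.
    """
    span = a_top - a_floor
    gray = []
    for row in matrix:
        gray.append([round((min(max(v, a_floor), a_top) - a_floor) / span * levels)
                     for v in row])
    return gray
```

Values below Afloor map to 0 and values above Atop map to 255, so outliers do not compress the contrast of the rest of the patch.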
Step 2.3: selective search: for the color diagram Ck of each segment of pipeline, extracting w candidate regions rk1, rk2, . . . , rkw by selective search.
Step 2.3.1: for the color diagram Ck of each segment of pipeline, using a division method to obtain a candidate region set Rk={rk1, rk2, . . . , rkw}.
Step 2.3.2: initializing a similarity set Sim=ϕ.
Step 2.3.3: calculating the similarities sim{rka,rkb} of all adjacent regions rka,rkb according to the following formula.
Step 2.3.4: repeating Step 2.3.3 until the similarities of all adjacent regions are calculated, and updating the similarity set Sim according to the following formula:
Sim=Sim∪sim(rka,rkb)
Step 2.3.5: finding the maximum similarity sim{rkc,rkd}=max(Sim) from Sim, and obtaining the merged region rkc∪rkd accordingly; removing sim{rkc,rkd} from Sim.
Step 2.3.6: repeating Step 2.3.5 until Sim is empty so as to obtain mc merged regions rk1, rk2, . . . rkm
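The greedy merging of Steps 2.3.2 to 2.3.6 can be illustrated in one dimension as follows. This is a simplified sketch, not the method of the invention: regions are taken as intervals along the pipeline axis, adjacency means consecutive intervals, and the similarity 1/(1+|difference of mean amplitudes|) is an assumed stand-in for the similarity formula, which is not reproduced in the text.

```python
def selective_search_1d(regions):
    """Greedy hierarchical merging of adjacent regions.

    regions: list of (start, end, mean_value) tuples sorted by start.
    Returns the initial regions plus every merged region, in creation order.
    Assumption: similarity = 1 / (1 + |mean difference|).
    """
    regions = list(regions)
    candidates = list(regions)
    while len(regions) > 1:
        # Similarities of all currently adjacent pairs (the set Sim).
        sims = [1.0 / (1.0 + abs(regions[i][2] - regions[i + 1][2]))
                for i in range(len(regions) - 1)]
        best = max(range(len(sims)), key=sims.__getitem__)
        (s1, e1, m1), (s2, e2, m2) = regions[best], regions[best + 1]
        n1, n2 = e1 - s1, e2 - s2
        merged = (s1, e2, (m1 * n1 + m2 * n2) / (n1 + n2))  # length-weighted mean
        regions[best:best + 2] = [merged]                   # replace the pair by the merged region
        candidates.append(merged)
    return candidates
```

Rebuilding the similarity list after each merge plays the role of removing the used similarity from Sim and adding the similarities of the new region; the loop ends when only one region remains.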
Step 2.4: convolution neural network: candidate region identification.
Step 2.4.1: building a convolutional neural network (CNN) with input of 72×72, and an intermediate layer of the convolutional neural network (CNN) comprises 4 convolutional layers, 4 down-sampling layers and 1 fully connected layer, wherein each convolutional layer is followed by a down-sampling layer used to evaluate local weighted mean as secondary feature extraction.
Step 2.4.2: extracting P weld color diagrams of size N1×N1 from historical data as samples of the convolutional neural network (CNN), wherein a random 80% of the samples are used as training samples, and the remaining 20% are used as testing samples.
Step 2.4.3: training the network 500 times, wherein the network with the highest testing success rate is used as the final network Net.
Step 2.4.4: inputting the candidate regions rk1,rk2, . . . , rkw, into the trained convolutional neural network (CNN) respectively for discrimination, for the region which is judged to be the weld, recording the position Loc and the network score Soc of the region, and finally, obtaining w positions Loc1, Loc2, . . . , Locw and scores Soc1, Soc2, . . . , Socw.
Step 2.5: Non-maximum suppression: obtaining the precise position L1, L2, . . . , Ln of the weld according to the position Loc1, Loc2, . . . , Locw and the score Soc1, Soc2, . . . , Socw of the weld seam based on the non-maximum suppression algorithm, wherein simulation results of Step 2 are as shown in
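The non-maximum suppression of Step 2.5 might be sketched in one dimension as below. The sketch assumes that each candidate weld is reduced to a single axial position and that two candidates closer than a minimum gap are treated as the same weld; the function name and the min_gap parameter are hypothetical, and an overlap-ratio criterion on two-dimensional regions would be an equally plausible reading.

```python
def non_max_suppression(positions, scores, min_gap):
    """Keep the highest-scoring positions that are at least `min_gap` apart."""
    order = sorted(range(len(positions)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Accept a candidate only if no better-scoring kept candidate is nearby.
        if all(abs(positions[i] - positions[j]) >= min_gap for j in kept):
            kept.append(i)
    return sorted(positions[i] for i in kept)
```

For example, candidates at 98, 100 and 103 with scores 0.7, 0.9 and 0.8 collapse to the single position 100 when min_gap is 20.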
According to the precise position of the weld, the whole MFL signals are divided into u+1 patches, and one patch of MFL signals is taken. The discovery module adopts an abnormal candidate region search and identification method based on a Lagrange multiplication framework and multi-source MFL data fusion to find out MFL signals with defects, as shown in
Step 3.1: establishing a data reconstruction framework based on Lagrange multiplication, wherein the search flow of abnormal regions based on Lagrange multiplication of the invention is as shown in
Step 3.1.1: establishing a data reconstruction model:
Step 3.1.2: changing a constrained optimization model into an unconstrained optimization model,
wherein the unconstrained model minimization problem can be solved through an iterative process as follows:
Step 3.1.3: iterative optimization, wherein the optimization model of matrix A is:
for the convenience of calculation, the nuclear norm minimization problem can be solved by a soft threshold operator; the soft threshold of x with threshold τ is calculated as sgn(x)(|x|−τ)+, wherein (y)+=max(y,0), and the operator can be used in the optimization process as follows:
therefore, the optimization problem of the matrix A is transformed into
and similarly, the optimization problem of the matrix E is transformed into
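The soft threshold operator used in Step 3.1.3 can be sketched as follows. The element-wise matrix form is an illustrative helper with a hypothetical name; how the operator enters the updates of the matrices A and E is not shown here because those formulas are not reproduced in the text.

```python
def soft_threshold(x, tau):
    """sgn(x) * max(|x| - tau, 0) for a scalar x and a threshold tau >= 0."""
    magnitude = max(abs(x) - tau, 0.0)
    return magnitude if x >= 0 else -magnitude

def soft_threshold_matrix(matrix, tau):
    """Apply the scalar operator to every element of a list-of-lists matrix."""
    return [[soft_threshold(v, tau) for v in row] for row in matrix]
```

Entries whose magnitude is below τ are set to zero and the remaining entries are shrunk toward zero by τ, which is what promotes sparsity in the reconstructed anomaly component.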
Step 3.1.4: setting an iteration cut-off condition, wherein the cut-off condition is:
wherein S is the weight matrix, and the application of the weight matrix S can greatly reduce the iteration time, so that the detection speed can be increased; the matrix S of the invention is set as follows:
Step 3.2: abnormal candidate region search in pipelines based on multi-data fusion, wherein the recommendation and identification framework for abnormal candidate regions based on multi-source MFL data fusion is as shown in
Step 3.2.1: performing abnormal region search on uniaxial data respectively under the data reconstruction framework based on Lagrange multiplication so as to obtain triaxial abnormal regions, which are respectively OX, OY, OZ.
Step 3.2.2: establishing a triaxial fusion optimization framework:
min(OX∪OY∪OZ),subject to OXi∪OYj∪OZk=Ø
Step 3.2.3: eliminating overlapping by a non-maximum suppression algorithm while considering the diversity of the generated candidate regions, merging windows which are close to each other, and using the maximum outer boundary of the two windows as the outer boundary of the new window, wherein the merging criterion is that adjacent windows are merged if their transverse center distance is less than the minimum transverse length of the adjacent windows.
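The window merging criterion of Step 3.2.3 can be sketched as follows. The window representation (x_min, x_max, y_min, y_max), with x taken as the transverse direction, and the function names are assumptions; for brevity the sketch tests every pair of windows against the transverse criterion only, without a separate adjacency check.

```python
def should_merge(a, b):
    """True if the transverse center distance is less than the smaller transverse length."""
    center_distance = abs((a[0] + a[1]) / 2 - (b[0] + b[1]) / 2)
    return center_distance < min(a[1] - a[0], b[1] - b[0])

def merge_windows(windows):
    """Repeatedly merge qualifying pairs into their common outer boundary."""
    windows = list(windows)
    merged_any = True
    while merged_any:
        merged_any = False
        for i in range(len(windows)):
            for j in range(i + 1, len(windows)):
                if should_merge(windows[i], windows[j]):
                    a, b = windows[i], windows[j]
                    # The new window takes the maximum outer boundary of both.
                    windows[i] = (min(a[0], b[0]), max(a[1], b[1]),
                                  min(a[2], b[2]), max(a[3], b[3]))
                    del windows[j]
                    merged_any = True
                    break
            if merged_any:
                break
    return windows
```

The scan restarts after every merge because a newly enlarged window may now satisfy the criterion with a window it previously did not.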
Step 3.3: anomaly identification of MFL in pipelines based on an evolvable model.
Step 3.3.1: extracting abnormal samples from a complete MFL data set, and establishing an anomaly identification model based on the convolutional neural network (CNN).
Step 3.3.2: for those incorrectly-identified samples, adding new labels, and reinputting the newly-labeled samples into the model for training, wherein along with the increase of transition data, the identification model evolves gradually. The simulation results of Step 3 are as shown in
The quantization module adopts a defect quantization method based on a random forest to obtain the defect size, specifically comprising the following steps.
Step 4.1: collecting data; detecting the defect MFL signals, and extracting features of the MFL signals to obtain the feature values of the defect MFL signals, specifically as follows.
Finding out the peak-valley positions and peak-valley values of the MFL signal of the axial maximum channel according to the minimum points on that signal; after determining whether the defect is a single-peak or a double-peak defect, extracting 10 waveform-related features, namely peak value of single-peak defect, maximum peak-valley difference of single-peak defect, valley width of double-peak defect, left peak-valley difference and right peak-valley difference of double-peak defect signals, peak-to-peak distance of double-peak defect signals, axial spacing between special points, area feature, surface energy feature, defect volume, and defect body energy.
The 10 features are specifically described as follows.
A. peak value of single-peak defect: Yv is the defect minimum valley value, and Yp-v is the maximum peak-valley difference. Since the defect MFL signals are affected by various factors such as detection environments of the inner detection tool, the baseline of data fluctuates greatly. Taking the peak-valley difference of defect data as a feature quantity can eliminate the influence of the signal baseline well and improve the reliability of quantitative analysis of defects.
B. maximum peak-valley difference of single-peak defect: expression is: Yp-v=Yp−Yv, wherein Yp is the peak value of single-peak defect, Yv is the minimum valley value of defects, and Yp-v is the maximum peak-valley difference. Since the defect MFL signals are affected by various factors such as detection environments of the inner detection tool, the baseline of data fluctuates greatly. Taking the peak-valley difference of defect data as a feature quantity can eliminate the influence of the signal baseline well and improve the reliability of quantitative analysis of defects.
C. valley width of double-peak defect: formulated as: Xv-v=Xvr−Xvl, wherein Xv-v represents the valley width of an axial signal of defects, Xvr is the right valley position of the defects, and Xvl is the left valley position of the defects. The valley width of defect signals can reflect the axial distribution of the defect signals.
D. left peak-valley difference and right peak-valley difference of double-peak defect signals: formulated as: Ylp-lv=Ylp−Ylv, Yrp-rv=Yrp−Yrv, wherein Ylv is the left valley value of MFL signals, Yrv is the right valley value of the MFL signals, Ylp is the left peak value of double-peak signals, Yrp is the right peak value of the double-peak signals, Ylp-lv is the left peak-valley difference, and Yrp-rv is the right peak-valley difference.
E. peak-to-peak distance of double-peak defect signals: formulated as: Xp-p=Xpr−Xpl, wherein Xpr is the right-peak position, Xpl is the left-peak position, and Xp-p is the peak-to-peak distance of signals. A combination of the peak-to-peak distance and the peak-valley value of defect signals can roughly determine the shape of an abnormal data curve, which contributes to quantitative analysis of defect length and depth.
F. axial spacing between special points: in order to obtain the key feature quantity of defect length, the extraction method of special points comprises: setting the proportion m_RateA of rectification, and calculating the threshold according to X+(Y−X)*m_RateA, wherein X is the mean value of valley values, Y is the maximum peak value, the two points closest to the threshold in the MFL signal of the axial maximum channel are the special points, and the spacing between the special points is the key feature quantity for obtaining the defect length.
G. area feature: A valley value with a lower value is taken as the baseline, the area covered between data curves of two valleys and the baseline is taken and formulated as:
wherein Sa represents the waveform area of defects; x(t) represents the signal data point of defects; min[x(t)] represents the minimum valley value of defects; N1 represents the left valley position of defects; N2 represents the right valley position of defects.
H. surface energy feature: the energy of a data curve between two valleys is obtained and formulated as:
wherein, Sa is the defect waveform surface energy.
I. defect volume: The defect volume is obtained by summing the defect areas within a defect channel range, and formulated as:
wherein Va represents the defect volume; n1 represents the starting channel determined by the position of a circumferential signal at a special point; n2 represents the termination channel determined by the position of a circumferential signal at a special point; and Sa(i) represents the single-channel axial defect area.
J. defect body energy: The defect body energy is obtained by summing the defect surface energy within the defect range, and formulated as:
wherein, Va represents the defect body energy; and Sa(t) represents the surface energy of single-channel axial defect signals.
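Several of the features A to J above can be sketched as follows. The special-point threshold follows the expression X+(Y−X)*m_RateA given in feature F. The area, energy and volume functions assume discrete sums over the samples between the two valleys, and the energy is assumed to be the sum of squared baseline-shifted samples; these are assumptions, since the corresponding formulas are not reproduced in the text, and all function names are hypothetical.

```python
def special_point_threshold(valley_mean, max_peak, m_rate_a):
    """Threshold X + (Y - X) * m_RateA from feature F."""
    return valley_mean + (max_peak - valley_mean) * m_rate_a

def waveform_area(x, n1, n2):
    """Area between the curve and the lower-valley baseline over samples n1..n2."""
    baseline = min(x[n1], x[n2])          # the valley with the lower value
    return sum(v - baseline for v in x[n1:n2 + 1])

def waveform_energy(x, n1, n2):
    """Assumed energy: sum of squared baseline-shifted samples over n1..n2."""
    baseline = min(x[n1], x[n2])
    return sum((v - baseline) ** 2 for v in x[n1:n2 + 1])

def defect_volume(channels, c1, c2):
    """Sum of single-channel areas over channels c1..c2.

    channels: list of (signal, left_valley_index, right_valley_index).
    """
    return sum(waveform_area(sig, left, right)
               for sig, left, right in channels[c1:c2 + 1])
```

The defect body energy of feature J would follow the same pattern as defect_volume, with waveform_energy in place of waveform_area.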
Step 4.2: using the feature value of the defect MFL signal as a sample; using the manually-measured defect size as a label, wherein the defect size includes the depth, width and length of a defect; manually selecting the initial training set and the testing set.
Step 4.3: training the network; inputting the training set into an initial random forest network.
Step 4.4: adjusting the network; inspecting the results of the random forest regression network through the testing set, and obtaining a final network by adjusting parameters, wherein the specific practice is: inputting Mh=666, Nh=6, setting the parameters m1=sqrt( ), Tf=56, specifically, initially setting the parameter to nf=nf/3, and setting the maximum feature number, max_features, to be None.
Step 4.4.1: selecting me defect samples by a bootstrapping method, namely random sampling with replacement, from the Mh×Nh-dimensional original MFL signal feature defect samples, with me≤Mh, performing sampling Tc times in total, and generating Tc training sets.
Step 4.4.2: for the Tc training sets, training Tc regression tree models, respectively.
Step 4.4.3: for a single regression tree model, selecting ne features from an MFL defect signal feature set, wherein ne≤Nh; then performing division each time based on the information gain ratio g(D,A)/HA(D),
wherein HA(D) in the formula represents the entropy of feature A, and g(D,A) represents information gain; selecting the feature with the maximum information gain ratio for division; initially, setting the maximum feature number, max_features, of the parameters as None, that is, without limiting the feature number selected in the network.
Step 4.4.4: every tree keeps dividing in this way; in order to prevent overfitting in the process of division, the regression tree is pruned in consideration of its complexity. Pruning is performed by minimizing the loss function Cα(T)=C(T)+α|T|, wherein C(T) represents the model's prediction error for the defect size, namely, the degree of fitting, |T| represents model complexity, and α is used to regulate the complexity of the regression tree. The prediction error of the loss function is taken as the value at the POF 90% position by using the international POF standards for sea oil transportation. Initially, the maximum tree depth, max_depth, is set to 5.
Step 4.4.5: for model parameter tuning optimization, finding out the optimal parameters by GridSearchCV and K-fold cross-validation, wherein the optimal parameters comprise the random forest framework parameters, namely the out-of-bag sample evaluation score eonb and the maximum number of iterations, as well as the tree model parameters, namely the maximum feature number max_features, the maximum depth, the minimum number of samples required for inner node subdivision and the minimum number of samples of leaf nodes.
Step 4.4.6: forming the random forest from the plurality of generated decision trees, wherein for the regression problem network established from defect feature samples, the finally-predicted defect size is determined by the mean value of the predicted values of the plurality of trees.
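The bootstrap sampling, random feature selection and mean aggregation of Steps 4.4.1 to 4.4.6 can be illustrated with the toy forest below. It is deliberately reduced and is not the network of the invention: each tree is a single-split regression stump, the split is chosen by squared error rather than by the information gain ratio, no pruning is performed, and the names and default parameter values are hypothetical.

```python
import random
from statistics import mean

def fit_stump(rows, targets, feature_ids):
    """Best single-split stump over the given features (assumed criterion: squared error)."""
    best = None
    for f in feature_ids:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold, lm, rm)
    if best is None:                        # no valid split: always predict the mean
        m = mean(targets)
        return (0, float("inf"), m, m)
    return best[1:]

def fit_forest(rows, targets, n_trees=25, n_features=1, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]        # sampling with replacement
        feats = rng.sample(range(len(rows[0])), n_features)   # random feature subset
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Predicted size: mean of the individual tree predictions."""
    return mean(lm if row[f] <= thr else rm for f, thr, lm, rm in forest)
```

In practice a library implementation with full-depth trees and the tuned parameters of Step 4.4.5 would replace the stumps; the sketch only shows how bagging and averaging fit together.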
Step 4.5: inputting the data to be tested into the random forest network adjusted according to Step 4.4, and outputting the predicted defect size, wherein if the data to be tested is the depth in the defect size, the output size is the predicted depth; if the data to be tested is the width, the output size is the predicted width; and if the data to be tested is the length, the output size is the predicted length. The predicted depth is evaluated by the value at the 80% position ranked by the absolute value of error according to the international POF standards for oil pipelines, and the formula is: POF80=sort(|yc−ỹc|)×80%, wherein yc and ỹc are the design depth and the predicted depth, respectively. The condition that the intergenerational loss function in iteration np is no longer reduced is used as the termination condition of seeking optimum parameters, and the maximum number of iterations n_estimators in the final output of the network is 172.
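The 80% position error measure of Step 4.5 can be sketched as follows. The function name and the indexing convention (the smallest error that is greater than or equal to 80% of all absolute errors) are assumptions; other percentile conventions would give slightly different values on small samples.

```python
import math

def pof_error(actual, predicted, fraction=0.8):
    """Absolute error at the `fraction` position of the sorted absolute errors."""
    errors = sorted(abs(a - p) for a, p in zip(actual, predicted))
    # Assumed convention: 1-based rank ceil(fraction * n), converted to a 0-based index.
    index = max(math.ceil(fraction * len(errors)) - 1, 0)
    return errors[index]
```

With ten samples whose absolute errors are 1, 2, and so on up to 10, the function returns 8, meaning that 80% of the predictions deviate from the design depth by at most 8 units.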
The simulation results of Step 4 and a performance comparison of the invention with the traditional defect inversion algorithm are as shown in Table 1:
Table 1 and
It can be seen from
Step 5.1: extracting all defect length columns, depth columns and pipeline property parameters in defect information from a complete MFL data set, wherein the pipeline property parameters comprise minimum yield strength SMYS, minimum tensile strength SMTS, nominal outside diameter Dd, wall thickness ta and maximum allowable operating pressure MAOP.
Step 5.2: calculating the value
of rheological stress, wherein SMYS is the minimum yield strength of the pipe in MPa, and SMTS is the minimum tensile strength in MPa.
Step 5.3: calculating the predicted failure pressure
of pipelines, when z≤20, the length expansion coefficient.
when z>20, the length expansion coefficient L0=(ηz+λa),
the metal loss area
in a corrosion area, and the original area Aareab=taL, wherein d is the defect depth in mm; ta is the pipeline wall thickness in mm; Dd is the nominal outside diameter in mm.
Step 5.4: calculating the maximum failure pressure
of the pipeline, reorganizing and getting:
when z≤20, θa=⅔; when z>20, θa=1.
Step 5.5: calculating the maintenance index
wherein
P is the maximum allowable design pressure; if the maintenance index ERF is less than 1, it indicates that the defect is acceptable; if ERF is greater than or equal to 1, the defect is unacceptable, and then the pipe should be maintained or replaced.
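For orientation, the failure pressure and maintenance index of Steps 5.3 to 5.5 can be sketched in the form of the original ASME B31G method. This is a sketch under stated assumptions, not the improved method of the invention: the flow stress is passed in as a parameter because the improved rheological stress formula is not reproduced in the text, z is taken as L²/(D·t), the bulging factor is taken as sqrt(1+0.8z), and the maintenance index is assumed to be the ratio of the operating pressure to the failure pressure. All function and parameter names are hypothetical.

```python
import math

def b31g_failure_pressure(d, t, length, diameter, flow_stress):
    """Failure pressure estimate in the form of the original ASME B31G method.

    d: defect depth, t: wall thickness, length: defect length,
    diameter: nominal outside diameter (all in mm); flow_stress in MPa.
    """
    z = length ** 2 / (diameter * t)
    if z <= 20:
        m = math.sqrt(1 + 0.8 * z)                       # bulging factor
        ratio = (1 - (2 / 3) * d / t) / (1 - (2 / 3) * d / t / m)
    else:
        ratio = 1 - d / t
    return 2 * flow_stress * t / diameter * ratio

def maintenance_index(pressure, failure_pressure):
    """Assumed ERF: operating pressure divided by predicted failure pressure."""
    return pressure / failure_pressure
```

The two branches correspond to the coefficients θa=⅔ for z≤20 and θa=1 for z>20 in Step 5.4; an index below 1 indicates an acceptable defect.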
Step 5.6: importing the maintenance decision model, conducting qualitative and quantitative analysis based on expert experiences and a life prediction model, then evaluating the severity of pipeline corrosion, formulating maintenance rules, and outputting the evaluation results according to the maintenance rules, comprising the maintenance index and maintenance recommendations, wherein:
rule 1: if the maximum depth of wall thickness loss at the defect is greater than or equal to 80%, the defect is considered as major corrosion, and the maintenance recommendation is that the pipe needs to be maintained or replaced immediately;
rule 2: if the ERF at the defect is greater than or equal to 1, the defect is considered as severe corrosion, and the maintenance recommendation is that the pipe needs to be maintained immediately;
rule 3: if the ERF at the defect is greater than or equal to 0.95 and less than 1.0, the defect is considered as general corrosion, and the maintenance recommendation is that the defect can be observed for 1-3 months;
rule 4: if the maximum depth at the defect is greater than or equal to 20% and less than 40%, the defect is considered as minor corrosion, and the maintenance recommendation is that the defect can be observed regularly without treatment.
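The four maintenance rules of Step 5.6 can be expressed as a small decision function. The order in which the rules are tested (rule 1 first, rule 4 last), the fallback for defects that match no rule, and the function name are assumptions, since the text does not state a precedence.

```python
def maintenance_recommendation(depth_ratio, erf):
    """Return (severity, recommendation) for one defect.

    depth_ratio: maximum wall thickness loss as a fraction of wall thickness (0..1).
    erf: maintenance index of the defect.
    Assumption: rules are evaluated in the order 1, 2, 3, 4.
    """
    if depth_ratio >= 0.80:
        return ("major corrosion", "maintain or replace immediately")
    if erf >= 1.0:
        return ("severe corrosion", "maintain immediately")
    if 0.95 <= erf < 1.0:
        return ("general corrosion", "observe for 1-3 months")
    if 0.20 <= depth_ratio < 0.40:
        return ("minor corrosion", "observe regularly without treatment")
    return ("no rule matched", "no recommendation in the stated rules")
```

Defects with a depth between 40% and 80% and an ERF below 0.95 fall outside the four stated rules, which is why the sketch returns an explicit fallback instead of guessing a category.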
For simulation results of Step 5, the curve as shown in
Number | Date | Country | Kind |
---|---|---|---|
201811633698.5 | Dec 2018 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2019/074907 | 2/13/2019 | WO | 00 |