The present disclosure relates to a data analysis apparatus, data analysis program, and data analysis method.
To maintain and manage enormous numbers of facilities, a person in charge of maintenance work of the facilities executes an analytical process that calculates a rank of the degradation level (hereinafter, degradation rank) of each facility from raw data, such as the specifications and inspection records of each facility, stored in a database.
Analytical schemes are being standardized for each type of facility. However, no method has yet been established for determining the value of a parameter included in the mathematical expression of the analytical process, such as a correction coefficient that adjusts the degradation level in consideration of the reliability level of a facility. Therefore, to determine an analytical scheme, the person in charge of maintenance work must adjust such a parameter by repeating the analytical process while changing the parameter little by little, without changing the raw data.
Since the amount of raw data is enormous, re-executing the analytical process on all the raw data every time the parameter is changed takes a long time.
To address this problem, Patent Literature 1 discloses a technology that stores, in advance, the sets of parameters with which the analytical process has been executed in the past together with intermediate data as the analytical results. When the analytical process is to be executed with a changed parameter that has already been used in the past, the intermediate data corresponding to that parameter is referred to and the analytical process is omitted.
Patent Literature 1: JP 4980395
In the conventional technology, unless the parameters in a set match completely, the analytical process must be re-executed for all the raw data. Therefore, an analytical process with a changed parameter that has not been used in the past must be re-executed in full, which takes time.
An object of the present disclosure is to provide an apparatus that, with a small amount of calculation, extracts as much raw data not requiring re-execution as possible, even for an analytical process with a changed parameter that has not been used in the past.
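To make the limitation of the conventional technology concrete, the following Python sketch models it as a cache keyed on the exact parameter set. The class `ExactMatchCache` and the function `rank_all` are hypothetical illustrations, not the method of Patent Literature 1; they only show that any parameter set not seen before, however close to a past one, forces a full re-run.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, float]      # (x1: years of aging, x2: reliability level)
Params = Tuple[float, ...]     # parameter set, e.g. (p1,)


class ExactMatchCache:
    """Hypothetical model of the conventional scheme: a stored result is
    reused only when the parameter set matches a past run exactly."""

    def __init__(self, analyze: Callable[[List[Row], Params], List[int]]) -> None:
        self.analyze = analyze
        self.store: Dict[Params, List[int]] = {}
        self.full_runs = 0     # how often the whole raw data was re-analyzed

    def run(self, rows: List[Row], params: Params) -> List[int]:
        if params in self.store:          # exact match: analysis omitted
            return self.store[params]
        self.full_runs += 1               # any other parameter: full re-run
        result = self.analyze(rows, params)
        self.store[params] = result
        return result


def rank_all(rows: List[Row], params: Params) -> List[int]:
    """Degradation rank of every row with g = x1 + p1 / x2."""
    (p1,) = params
    ranks = []
    for x1, x2 in rows:
        g = x1 + p1 / x2
        ranks.append(1 if g < 1 else 2 if g < 10 else 3)
    return ranks


cache = ExactMatchCache(rank_all)
rows = [(0.2, 2.0), (5.0, 1.0), (12.0, 0.5)]
cache.run(rows, (0.9,))     # first run: full analysis
cache.run(rows, (0.9,))     # same parameter: result reused
cache.run(rows, (0.91,))    # slightly different parameter: full analysis again
```

After these three calls, `cache.full_runs` is 2: the small change from 0.9 to 0.91 gains nothing from the stored result.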
A data analysis apparatus according to the present disclosure includes:
Because the data analysis apparatus according to the present disclosure uses summary information, it can extract, with a small amount of calculation, as much raw data not requiring re-execution as possible, even for an analytical process with a changed parameter that has not been used in the past.
In the description of the embodiments and in the drawings, identical or corresponding components are given the same reference characters. Description of components given the same reference characters is omitted or simplified as appropriate. In the following embodiments, “unit” may be read as “circuit”, “step”, “procedure”, “process”, or “circuitry” as appropriate.
In the following embodiments, an application program is referred to as an application. Also, in the following embodiments, the analytical process means degradation rank calculation. To make this explicit, the expression “analytical process (degradation rank calculation)” may be used.
Embodiment 1 is described below with reference to
The data analysis apparatus 100 is as follows. The definition interpreting unit 110 receives an analytical process definition 201 from an analytical application 200 outside of the data analysis apparatus 100, and stores the analytical process definition 201 in the data storage unit 130. A change managing unit 122 described below receives an analysis instruction 202 from the analytical application 200. The analysis instruction 202 includes details of a change of a parameter of the analytical process definition 201.
The analytical processing unit 120 includes a summary generating unit 121, the change managing unit 122, a group extracting unit 123, and an analysis executing unit 124. The group extracting unit 123 is an extracting unit. The analysis executing unit 124 is a calculating unit.
When the analysis instruction 202 received from the analytical application 200 includes an instruction for generating summary information described below, the summary generating unit 121 uses the analytical result of the analysis executing unit 124 and the information stored in the data storage unit 130 to generate summary information 121A of raw data. The summary generating unit 121 causes the generated summary information 121A to be stored in the data storage unit 130.
The change managing unit 122 receives the analysis instruction 202 from the analytical application 200. The analysis instruction 202 includes details of change of a parameter.
Using the summary information 121A stored in the data storage unit 130, the group extracting unit 123 extracts, from the summary information 121A, the groups requiring re-execution of the analytical process by the analysis executing unit 124. Also using the summary information 121A, the group extracting unit 123 extracts the groups not requiring re-execution of the analytical process.
The analysis executing unit 124 executes the analytical process, with the parameter after change, on the raw data that the group extracting unit 123 reports as requiring re-execution, and takes the result of that analytical process as the final result. For raw data not reported as requiring re-execution, the analysis executing unit 124 takes, as the final result, the result calculated for the representative point of the group to which that raw data belongs.
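The final-result assembly described above can be sketched in Python as follows. This is an illustration under assumptions: `assemble_final_ranks`, `recompute_rank`, and the layout of `group_of_row`, `group_rank`, and `rerun_rows` are hypothetical, and the rank formula is the example analytical process definition (g = x1 + p1/x2 with stage limits 1 and 10).

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[float, float]  # (x1, x2)


def recompute_rank(x1: float, x2: float, p1: float) -> int:
    """Degradation rank f(g) with g = x1 + p1 / x2 and stage limits 1 and 10."""
    g = x1 + p1 / x2
    return 1 if g < 1 else (2 if g < 10 else 3)


def assemble_final_ranks(rows: List[Row],
                         group_of_row: List[int],
                         group_rank: Dict[int, int],
                         rerun_rows: Set[int],
                         p1_new: float) -> List[int]:
    """Hypothetical final-result assembly of the analysis executing unit 124.

    group_of_row: group id of each row in the summary information
    group_rank:   rank already computed for each group's representative point
    rerun_rows:   row indices reported as requiring re-execution
    """
    final = []
    for i, (x1, x2) in enumerate(rows):
        if i in rerun_rows:
            final.append(recompute_rank(x1, x2, p1_new))   # re-executed row
        else:
            final.append(group_rank[group_of_row[i]])     # representative result reused
    return final


rows = [(0.2, 2.0), (0.3, 2.5), (0.95, 2.0)]
print(assemble_final_ranks(rows, [0, 0, 1], {0: 1}, {2}, p1_new=0.6))  # [1, 1, 2]
```

Only the third row is recalculated (g = 0.95 + 0.6/2.0 = 1.25, rank 2); the first two rows take the rank of their group's representative point without any recalculation.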
The data storage unit 130 has stored therein (1) analytical process definition 201, (2) raw data group 131G, (3) analytical characteristic 110A, and (4) summary information 121A.
In
The data storage unit 130 has the analytical process definition 201 stored therein.
As illustrated in
y=g(x1, x2, p1)=x1+p1/x2.
y is a degradation level of a facility. x1 represents years of aging, and x2 represents reliability level. p1 is a parameter indicating a weight of the reliability level x2. f(g(x1, x2, p1)) indicates a rank of the degradation level of the facility. The degradation level g of the facility monotonically increases with respect to the years of aging x1 and monotonically decreases with respect to the reliability level x2. The function g is a mathematical expression taking numerical values x1 and x2 of each row of the raw data group 131G to of
The function f(y) is
f(y)=1(y<1), 2(1≤y<10), 3(10≤y).
That is, the degradation rank f(y) of the facility is determined stepwise in a manner such that the degradation rank is 1 when the degradation level g, as an interim result of analytical process, is smaller than 1, the degradation rank is 2 when the degradation level g is 1 or larger and smaller than 10, and the degradation rank is 3 when the degradation level g is 10 or larger.
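The interim result g and the stepwise final result f can be written down directly. The Python sketch below assumes x2 > 0 (otherwise p1/x2 is undefined); the function names are illustrative only.

```python
def degradation_level(x1: float, x2: float, p1: float) -> float:
    """Interim result g(x1, x2, p1) = x1 + p1 / x2.

    x1: years of aging, x2: reliability level (assumed > 0), p1: weight of x2.
    """
    return x1 + p1 / x2


def degradation_rank(y: float) -> int:
    """Final result f(y): 1 if y < 1, 2 if 1 <= y < 10, 3 if y >= 10."""
    if y < 1:
        return 1
    if y < 10:
        return 2
    return 3


# Example rows of (x1, x2) evaluated with p1 = 0.9
for x1, x2 in [(0.2, 2.0), (5.0, 1.0), (12.0, 0.5)]:
    g = degradation_level(x1, x2, 0.9)
    print(x1, x2, g, degradation_rank(g))
```

Note that the limits fall on the upper stage: g = 1 gives rank 2 and g = 10 gives rank 3, matching the inequalities 1 ≤ y < 10 and 10 ≤ y.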
The data storage unit 130 has the analytical characteristic 110A stored therein. The analytical characteristic 110A has registered thereon a list of information about stages (degradation ranks) for determining the final result of analytical process and interim results (degradation levels g) as inputs of the final results (degradation ranks f).
The data storage unit 130 has stored therein the summary information 121A described below in
In
In
Description is further specifically made. As illustrated in
g=x1+p1/x2.
p1 is a positive number. Thus, g decreases with a decrease in the years of aging x1 and an increase in the reliability level x2. Also, g increases with an increase in the years of aging x1 and a decrease in the reliability level x2.
In
1=x1+0.9/x2
The upper side of this graph represents the region where the degradation level g is smaller than 1. The graph of the upper limit of degradation rank 2 (p1=0.9, degradation level g=10) is similar to the graph of the lower limit of degradation rank 2 (p1=0.9, degradation level g=1), and is a graph of (x1, x2) satisfying
10=x1+0.9/x2.
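Because g = x1 + p1/x2, each boundary curve c = x1 + p1/x2 can be solved for x2 explicitly: x2 = p1/(c − x1), which gives a positive x2 only for x1 < c when p1 > 0. The Python sketch below (the function name `boundary_x2` is hypothetical) evaluates points on the lower-limit curve (c = 1) and the upper-limit curve (c = 10) for p1 = 0.9 and recomputes g to confirm that it equals c up to rounding.

```python
def boundary_x2(x1: float, c: float, p1: float) -> float:
    """Reliability level x2 on the curve c = x1 + p1 / x2 (requires x1 < c, p1 > 0)."""
    if x1 >= c:
        raise ValueError("no positive x2 on this boundary for x1 >= c")
    return p1 / (c - x1)


p1 = 0.9
for c in (1.0, 10.0):              # lower and upper limits of degradation rank 2
    for x1 in (0.0, 0.5):
        x2 = boundary_x2(x1, c, p1)
        g = x1 + p1 / x2           # reproduces c up to floating-point rounding
        print(f"c={c}: x1={x1}, x2={x2:.4f}, g={g:.4f}")
```

For a fixed x1, any x2 larger than the value on the g = 1 curve gives g < 1, which is consistent with g decreasing as the reliability level x2 increases.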
The summary information is information about the final result of evaluation of an evaluation target determined stepwise from the interim result of evaluation of the evaluation target calculated by using raw data indicating an attribute of the evaluation target and a parameter.
The summary information is a set of a plurality of interim results having the same final result, and holds, as representative data, the pieces of raw data from which two of those interim results are calculated. Here, the evaluation target is a target for which a defined evaluation item is evaluated. In Embodiment 1, an example of the evaluation target is a facility, and an example of the defined evaluation item is degradation. Also, an example of the interim result is a degradation level, and an example of the final result is a degradation rank. Specifically, the summary information is as follows.
The summary information 121A is information about the degradation rank of a facility, which is determined from the degradation level of the facility calculated by using raw data indicating an attribute of the facility and a parameter. The summary information 121A is a set of a plurality of degradation levels having the same degradation rank, and holds, as representative data, the pieces of raw data from which two of those degradation levels are calculated. In
As illustrated in
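One possible in-memory layout for the summary information 121A is sketched below in Python. The class and field names are hypothetical; the sketch assumes, as in Embodiment 1, that each group stores two representative points (the ones giving the smallest and the largest degradation level in the group) together with the indices of its member rows in the raw data group 131G.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # (x1, x2)


@dataclass
class Group:
    """One group of the summary information 121A (hypothetical layout)."""
    rank: int          # degradation rank shared by all members
    rep_min: Point     # representative data yielding the smallest degradation level
    rep_max: Point     # representative data yielding the largest degradation level
    members: List[int] = field(default_factory=list)  # row indices in raw data group 131G


@dataclass
class SummaryInfo:
    p1: float                                     # parameter used when the summary was built
    groups: List[Group] = field(default_factory=list)

    def rows_covered(self) -> int:
        """Total number of raw data rows represented by the groups."""
        return sum(len(g.members) for g in self.groups)


summary = SummaryInfo(p1=0.9, groups=[
    Group(rank=1, rep_min=(0.1, 2.5), rep_max=(0.2, 2.0), members=[0, 2]),
    Group(rank=2, rep_min=(5.0, 2.0), rep_max=(6.0, 1.0), members=[1, 3]),
])
```

Storing the member indices lets a group that is later judged unstable be expanded back into its raw data rows for re-execution.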
The operation of the data analysis apparatus 100 is described. In the drawings of flowcharts described below, what is indicated in parentheses at each step is a subject of operation. The operation procedure of the data analysis apparatus 100 is equivalent to a data analysis method. A program achieving the operation of the data analysis apparatus 100 is equivalent to a data analysis program 101.
The definition interpreting unit 110 receives the analytical process definition 201 from the analytical application 200.
The definition interpreting unit 110 interprets the received analytical process definition 201.
The definition interpreting unit 110 causes the interpreted analytical process definition 201 to be stored in the data storage unit 130. Also, the definition interpreting unit 110 determines whether the analytical characteristic 110A of the function described in
The change managing unit 122 receives, from the analytical application 200, the analysis instruction 202 including details of change of a parameter of the analytical process definition 201. The change managing unit 122 extracts the details of change of the parameter from the analysis instruction 202, and notifies the summary generating unit 121 of it.
When the parameter p1 is changed, the group extracting unit 123 uses the summary information 121A to extract, from the facilities corresponding to the plurality of degradation levels g included in the summary information 121A, the facilities requiring recalculation of the degradation level g. When the degradation level of a facility is calculated by the analysis executing unit 124, the summary generating unit 121 generates new summary information different from the summary information 121A used at step S202. The summary generating unit 121 generates summary information for each of the plurality of degradation ranks. Using the summary information generated for each degradation rank, the group extracting unit 123 extracts the facilities requiring recalculation of the degradation level.
Description is specifically made below.
With the use of the summary information 121A stored in the data storage unit 130, the group extracting unit 123 extracts a group not requiring re-execution of analytical process (degradation rank calculation) from the summary information 121A, and calculates the analytical result for the representative point of each group with the parameter p1 after change. Details of the present step are described further below.
For the facility extracted by the group extracting unit 123, the analysis executing unit 124 recalculates its degradation level by using the raw data of the extracted facility and the parameter after change. Description is specifically made below.
The analysis executing unit 124 executes analytical process (degradation rank calculation) on raw data (at least any row in
The summary generating unit 121 determines whether the analysis instruction 202 from the analytical application 200 includes an instruction for generating the summary information 121A. When the determination result is YES, the summary generating unit 121 corrects the summary information 121A used at step S202 by using the analytical result of the analysis executing unit 124 and the information stored in the data storage unit 130, and causes the corrected summary information 121A to be stored in the data storage unit 130. That is, the summary generating unit 121 corrects the summary information 121A so that it corresponds to the parameter p1 after change. The correction of the summary information 121A includes re-generation of the summary information 121A. Details of step S204 are described further below in
The group extracting unit 123 refers to the data storage unit 130 to determine whether the summary information 121A has been registered. When the determination result is YES, the process proceeds to step S302. When the determination result is NO, the process proceeds to step S304.
With reference to the analytical characteristic 110A and the analytical process definition 201, the group extracting unit 123 determines whether the analytical process definition 201 in the data storage unit 130 has the analytical characteristic 110A required for the summary information 121A to be usable. When the determination result is YES, the process proceeds to step S303. When the determination result is NO, the process proceeds to step S304.
The summary information 121A is taken as a set of one or more interim results (degradation levels). By using representative data and the parameter after change, the group extracting unit 123 calculates one or more interim results (degradation levels) of evaluation targets (facilities) having the representative data. Then, the group extracting unit 123 determines the final results (degradation ranks) of the calculated one or more interim results (degradation levels), and determines whether the determined one or more final results (degradation ranks) match. When the one or more final results (degradation ranks) match, the group extracting unit 123 extracts each of the evaluation targets (facilities) corresponding to the one or more interim results (degradation levels) included in the summary information 121A as an evaluation target (facility) not requiring recalculation of an interim result. Description is specifically made below.
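The per-group check of step S303 can be sketched in Python as follows, under the assumptions of Embodiment 1: g = x1 + p1/x2 with p1 > 0, so g increases with x1 and decreases with x2, and `rep_min` and `rep_max` are the points giving the smallest and largest g in the group. The function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]  # (x1, x2)


def rank_of(x1: float, x2: float, p1: float) -> int:
    """Final result f(g) for the interim result g = x1 + p1 / x2."""
    g = x1 + p1 / x2
    return 1 if g < 1 else (2 if g < 10 else 3)


def group_needs_rerun(rep_min: Point, rep_max: Point, p1_new: float) -> bool:
    """Return False when both representative points give the same rank.

    Assumes p1_new > 0, so every member's g lies between the values at rep_min
    and rep_max; since f is non-decreasing, equal ranks at both ends imply the
    same rank for every member of the group.
    """
    return rank_of(*rep_min, p1_new) != rank_of(*rep_max, p1_new)
```

With representative points (0.1, 2.5) and (0.3, 2.0), changing p1 to 0.6 keeps both ends in rank 1 (g = 0.34 and 0.6), so no member is recalculated; changing p1 to 1.6 moves the upper end to g = 1.1, rank 2, so the group is flagged for re-execution.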
The group extracting unit 123 determines whether re-execution is required for each group included in the summary information 121A (
In a graph of a lower limit (degradation level=1 when p1=0.6) of degradation rank 2 of
In
The group extracting unit 123 determines that re-execution of degradation rank calculation is required for the whole raw data, and step S202 ends.
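The branching of steps S301 to S304 can be summarized in a short Python sketch. `rows_to_rerun` and its arguments are hypothetical: the summary is assumed to be a list of groups, each a dict with a `"members"` list of row indices, and the per-group test of step S303 is passed in as a function.

```python
from typing import Callable, Optional, Sequence, Set


def rows_to_rerun(num_rows: int,
                  summary: Optional[Sequence[dict]],
                  characteristic_ok: bool,
                  group_needs_rerun: Callable[[dict], bool]) -> Set[int]:
    """Hypothetical rendering of steps S301 to S304.

    summary            registered groups (each with a "members" list) or None
    characteristic_ok  outcome of the S302 check on the analytical characteristic
    group_needs_rerun  per-group test applied at S303
    """
    # S301 / S302: without usable summary information, fall back to S304
    if summary is None or not characteristic_ok:
        return set(range(num_rows))            # S304: re-execute the whole raw data
    # S303: only members of groups whose ranks may change are re-executed
    rerun: Set[int] = set()
    for group in summary:
        if group_needs_rerun(group):
            rerun.update(group["members"])
    return rerun
```

For example, with two groups where only the second one is judged unstable, only that group's rows are returned; with no registered summary, every row index is returned.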
With reference to
For the analytical process executed last, the summary generating unit 121 organizes the pieces of raw data having the same degradation rank as the final result into one group.
(Step S402: Division into Rectangles)
The summary generating unit 121 divides each group formed at step S401 into a plurality of rectangular regions, organizes the raw data inside the same rectangle into one group, and finds a representative point of that group. Details of step S402 are described further below in
For each rectangle group found at step S402, the summary generating unit 121 creates a list of the representative points and the raw data belonging to that group, and causes the list to be stored in the data storage unit 130 as the summary information 121A. The summary information 121A of
The summary generating unit 121 generates the summary information according to the proximity between the interim result of the evaluation target and either the lower limit or the upper limit of the stage that determines the final result of evaluation of the evaluation target. Specifically, this is done as follows.
The summary generating unit 121 selects a plane (straight line) perpendicular to a certain axis so that one group with the same degradation rank f is divided into a small group, which includes a point whose interim result (degradation level g) is close to the lower limit (degradation level g=1) or the upper limit (degradation level g=10) of the degradation rank f, and a large group containing the others. Here, regarding the lower limit of degradation rank 1 or the upper limit of degradation rank 3, in the case of the analytical process definition 201 illustrated in
The summary generating unit 121 divides the small group, which contains at most L (here, L=3) pieces of raw data, into a plurality of rectangles. Details of step S502 are described further below.
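A simplified Python sketch of steps S401 and S403 follows. To keep it short it skips the subdivision of step S402 and treats each rank group as a single smallest circumscribed rectangle; the function names and the dict layout of each entry are hypothetical. It assumes g increases with x1 and decreases with x2, so the corner (min x1, max x2) gives the smallest g and (max x1, min x2) the largest.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x1, x2)


def group_by_rank(ranks: List[int]) -> Dict[int, List[int]]:
    """Step S401 sketch: collect row indices sharing the same final result."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for i, r in enumerate(ranks):
        groups[r].append(i)
    return dict(groups)


def bounding_corners(rows: List[Point], idx: List[int]) -> Tuple[Point, Point]:
    """Corners of the smallest axis-aligned rectangle around rows[idx]
    giving the smallest and the largest g (g rises with x1, falls with x2)."""
    xs1 = [rows[i][0] for i in idx]
    xs2 = [rows[i][1] for i in idx]
    return (min(xs1), max(xs2)), (max(xs1), min(xs2))


def build_summary(rows: List[Point], ranks: List[int]) -> List[dict]:
    """Step S403 sketch: one entry per group with representatives and members."""
    summary = []
    for rank, idx in sorted(group_by_rank(ranks).items()):
        rep_min, rep_max = bounding_corners(rows, idx)
        summary.append({"rank": rank, "rep_min": rep_min,
                        "rep_max": rep_max, "members": idx})
    return summary
```

In the full procedure, step S402 would replace the single rectangle per rank with several smaller ones, so that fewer rows have to be re-executed when one rectangle becomes unstable.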
The summary generating unit 121 determines whether the number of pieces of raw data in a large group is equal to or smaller than L. When the determination result is YES, the process proceeds to step S504. When the determination result is NO, the process proceeds to step S505.
The summary generating unit 121 divides a large group with the number of pieces of raw data equal to or smaller than L into rectangles by a method similar to that in step S502. Step S402 ends.
The summary generating unit 121 sets a large group as a new group, and returns the process to step S501. With the above, one group belonging to one degradation rank as illustrated in
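The loop of steps S501 to S505 can be sketched in Python as below. The text does not fix how the cutting plane is chosen, so this sketch assumes one simple heuristic: for each axis and each end of the group, consider the slab of the L outermost rows, and cut off the slab containing the row whose degradation level is closest to a limit of its rank. The names `split_group` and `gap_rank2` are hypothetical, and the rectangles of step S502 are not formed here.

```python
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]  # (x1, x2)
L = 3                        # maximum size of a small group, as in the text


def split_group(rows: List[Point], members: List[int],
                gap: Callable[[Point], float]) -> List[List[int]]:
    """Sketch of steps S501-S505 for one group with a common degradation rank.

    gap(row) is how close the row's degradation level g lies to the nearer
    limit of its rank (smaller means closer). Assumed heuristic: try a cut
    perpendicular to each axis at each end of the group and keep the slab of
    L rows that contains the row closest to a limit.
    """
    pieces: List[List[int]] = []
    current = list(members)
    while len(current) > L:                                   # S503 "NO": keep peeling
        best: Optional[Tuple[float, List[int]]] = None
        for axis in (0, 1):
            ordered = sorted(current, key=lambda i: rows[i][axis])
            for slab in (ordered[:L], ordered[-L:]):          # low end, high end
                score = min(gap(rows[i]) for i in slab)
                if best is None or score < best[0]:
                    best = (score, slab)
        assert best is not None
        small = best[1]                                       # S501: small group
        pieces.append(small)                                  # S502 splits it into rectangles
        current = [i for i in current if i not in small]      # S505: large group continues
    if current:
        pieces.append(current)                                # S504: large group has <= L rows
    return pieces


p1 = 0.9
rows = [(1.0 + i, 1.0) for i in range(7)]          # all rows are in degradation rank 2


def gap_rank2(row: Point) -> float:
    g = row[0] + p1 / row[1]
    return min(g - 1.0, 10.0 - g)                   # distance to g=1 or g=10


print(split_group(rows, list(range(7)), gap_rank2))  # [[0, 1, 2], [4, 5, 6], [3]]
```

The rows nearest to g = 1 are peeled off first, then those nearest to g = 10, leaving the middle row, which is farthest from both limits, in the final large group.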
The summary generating unit 121 selects the smallest rectangle (hereinafter, smallest circumscribed rectangle) that includes the small group consisting of at most L pieces of raw data.
As described in
By using monotonicity of the interim result, the summary generating unit 121 acquires a minimum point and a maximum point of the degradation level g in the smallest circumscribed rectangle from end points of the smallest circumscribed rectangle. On the left side in
For the end point 11, which gives the minimum of the degradation level g, and the end point 13, which gives the maximum of the degradation level g, both found at step S602, the summary generating unit 121 calculates the final results (degradation ranks) with the new parameter p1 and determines whether they match. When the degradation ranks of the minimum point 11 and the maximum point 13 match, step S502 ends. When they do not match, the process proceeds to step S604. On the left side in
As the right side in
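Steps S601 to S604 can be sketched in Python as a recursion. Steps S601 to S603 follow the text under the Embodiment 1 assumptions (g = x1 + p1/x2 with p1 > 0). The split rule used for step S604 is an assumption of this sketch: the points are halved along the wider side of the rectangle, and a single point is kept as its own degenerate rectangle. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x1, x2)


def rank_of(x1: float, x2: float, p1: float) -> int:
    g = x1 + p1 / x2
    return 1 if g < 1 else (2 if g < 10 else 3)


def rectangles(points: List[Point], p1: float) -> List[Tuple[Point, Point, List[Point]]]:
    """Return (min_corner, max_corner, members) for each rectangle.

    S601: smallest circumscribed rectangle of the points.
    S602: by monotonicity (g rises with x1, falls with x2, p1 > 0) the minimum
          of g is at (min x1, max x2) and the maximum at (max x1, min x2).
    S603: if both corners give the same rank, keep the rectangle.
    S604 (assumed split rule): otherwise halve the points along the wider side.
    """
    x1s = [p[0] for p in points]
    x2s = [p[1] for p in points]
    lo = (min(x1s), max(x2s))
    hi = (max(x1s), min(x2s))
    if len(points) == 1 or rank_of(*lo, p1) == rank_of(*hi, p1):
        return [(lo, hi, points)]
    axis = 0 if (hi[0] - lo[0]) >= (lo[1] - hi[1]) else 1
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return rectangles(ordered[:mid], p1) + rectangles(ordered[mid:], p1)
```

Because each split produces two non-empty halves, the recursion always terminates; in the worst case every point ends up in its own rectangle, which is no worse than re-executing that point directly.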
In Embodiment 1, for most raw data, the final result often does not change even if a parameter is changed. This is because the final result (degradation rank) is determined stepwise, while the interim result (degradation level g) often changes only slightly when a parameter is changed. The interim result changes only slightly for the following reason: in analysis of the degradation rank of a facility, the degradation level is continuous with respect to the parameter, and the parameter is usually changed only within a small range. Continuity is the property that, when the change range of the parameter is small, the change range of the interim result is also small. In addition, since the interim result (degradation level g) is monotonic, the upper limit and the lower limit of the analytical results (degradation ranks) of all the raw data in each rectangle can be determined from the analytical results at the minimum point and the maximum point of the rectangle.
Since the number of rectangles is smaller than the number of pieces of raw data, the raw data requiring re-execution with the parameter after change can be extracted with a small amount of calculation. This reduces the analytical process time even if the changed parameter has never been used in the past.
In Embodiment 1, as in the operation at step S402, the entire raw data is divided into rectangles according to the proximity between the interim result (degradation level g) of the analytical process executed last and the lower limit (g=1) or the upper limit (g=10) of the stage (degradation rank). This reduces the number of rectangles that include a point whose interim result (degradation level g) is close to the lower limit or upper limit of the stage (degradation rank).
A point whose interim result (degradation level) is close to the lower limit or upper limit of the stage (degradation rank) is highly likely to change its final result when the parameter is changed. Thus, the operation at step S402 reduces the number of rectangles requiring re-execution when the parameter is changed.
In Embodiment 2, mainly the points that differ from or are added to Embodiment 1 are described. In the present embodiment, a variation of the data analysis apparatus 100 described in Embodiment 1 is described in detail. Functions and structures similar to those of the data analysis apparatus 100 of Embodiment 1 are given the same reference characters, and their description is omitted.
In Embodiment 2, it is assumed that, when the change in the numerical value data of the same data item of the plurality of pieces of raw data is equal to or smaller than a constant value, the change in the interim result calculated from that data is also equal to or smaller than a constant value. The summary generating unit 121 of Embodiment 2 divides the plurality of pieces of raw data into a plurality of regions and sets the center point of each region as representative data. An example of the region is a rectangle, as in Embodiment 1. As the center point of a region, the barycenter of the figure representing the region can be used. Specifically, this is as follows.
In Embodiment 2, as the analytical characteristic 110A, in place of monotonicity of the interim result g in Embodiment 1, a property is utilized in which, when the change of raw data is equal to or smaller than a constant value (hereinafter, C value), the change of the interim result g is also equal to or smaller than a constant value (hereinafter, D value).
This property is specifically represented in a mathematical expression as the following <Mathematical Expression 1>.
There exist positive real numbers C and D such that, for any point (a1, a2, . . . , aM) and any parameter set (q1, . . . , qN),
|g(x1, x2, . . . , xM, p1, . . . , pN)−g(a1, a2, . . . , aM, q1, . . . , qN)|≤D
holds for every point (x1, x2, . . . , xM) and every parameter set (p1, . . . , pN) satisfying
max{|xi−ai| | i=1, 2, . . . , M}≤C and max{|pi−qi| | i=1, 2, . . . , N}≤C. <Mathematical Expression 1>
The mathematical expressions
max{|xi−ai| | i=1, 2, . . . , M}≤C,
max{|pi−qi| | i=1, 2, . . . , N}≤C
mean that the change of the raw data and the change of the parameter are each equal to or smaller than C.
The mathematical expression
|g(x1, x2, . . . , xM, p1, . . . , pN)−g(a1, a2, . . . , aM, q1, . . . , qN)|≤D
means that the change of the interim result g is equal to or smaller than D.
In Embodiment 2, the above-described property of Mathematical Expression 1 regarding the interim result, together with the C value and the D value, is stored as the analytical characteristic 110A. In Embodiment 2, as in Embodiment 1, the entire raw data is divided into rectangles, but the width from the center of each rectangle to its edge is set to be equal to or smaller than C.
In <Mathematical Expression 1> described above, the rectangle is a set {(x1, x2, . . . , xM)|max{|xi-ai||i=1, 2, . . . , M}≤C}, and the center of the rectangle is a point (a1, a2, . . . , aM).
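A minimal Python sketch of such a division, assuming a uniform grid of cells with half-width C (side 2C) is used; the text only requires the half-width to be at most C, so the uniform grid and the name `grid_groups` are assumptions. Each key is the cell center (a1, . . . , aM), and every member x satisfies max|xi − ai| ≤ C, so it lies in the rectangle of Mathematical Expression 1.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, ...]


def grid_groups(rows: List[Point], C: float) -> Dict[Point, List[int]]:
    """Group rows into axis-aligned cells of half-width C, keyed by cell center.

    A row with coordinate v falls in cell k = floor(v / (2C)) along each axis,
    whose center is (k + 0.5) * 2C, so |v - center| <= C on every axis.
    """
    groups: Dict[Point, List[int]] = defaultdict(list)
    for idx, row in enumerate(rows):
        center = tuple((math.floor(v / (2 * C)) + 0.5) * 2 * C for v in row)
        groups[center].append(idx)
    return dict(groups)


print(grid_groups([(0.1, 0.2), (0.3, 0.9), (1.5, 0.4)], C=0.5))
```

Only the centers need to be stored as representative points in the summary information 121A; the member lists are needed only when a group must be re-executed.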
Furthermore, as the representative point of the group, the center of the rectangle is stored as one piece of information in the summary information 121A.
In Embodiment 2, to determine whether re-execution is required for each group, the interim result at the representative point (the center of the rectangle) is calculated with the parameter after change, and Mathematical Expression 1 is used to determine whether all the data in the rectangle yield the same final result.
With Embodiment 2, even if the interim result is not monotonic, a large amount of raw data not requiring re-execution can be extracted with a small amount of calculation.
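The Embodiment 2 check can be sketched in Python as follows. Since every row in a group lies within C of the center and the parameter after change is the same for all of them, Mathematical Expression 1 bounds every member's interim result to the interval [g_c − D, g_c + D], where g_c is the interim result at the center. The sketch assumes the stepwise final result f is non-decreasing (true for the example definition), so equal ranks at both ends of that interval imply equal ranks for all members. `group_stable` and `rank` are hypothetical names, and D is taken as given.

```python
from typing import Callable, Sequence


def rank(y: float) -> int:
    """Stepwise final result f(y) of the example analytical process definition."""
    return 1 if y < 1 else (2 if y < 10 else 3)


def group_stable(center: Sequence[float], params: Sequence[float],
                 g: Callable[..., float], D: float) -> bool:
    """True if every row within distance C of `center` keeps a single rank.

    By Mathematical Expression 1, each member's interim result lies in
    [g_c - D, g_c + D], where g_c is g at the center with the parameter after
    change. Because f is non-decreasing, equal ranks at both ends of that
    interval imply the same rank for every member of the group.
    """
    g_c = g(*center, *params)
    return rank(g_c - D) == rank(g_c + D)


def g_example(x1: float, x2: float, p1: float) -> float:
    return x1 + p1 / x2
```

For the center (0.5, 2.0) and p1 = 0.6, g_c = 0.8: with D = 0.1 the interval [0.7, 0.9] stays in rank 1, so the group needs no re-execution, whereas with D = 0.5 the interval [0.3, 1.3] straddles the limit g = 1 and the group is re-executed.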
The hardware structure of the data analysis apparatus 100 of Embodiments 1 and 2 is described with reference to
The data analysis apparatus 100 is a computer. The data analysis apparatus 100 includes a processor 310. The data analysis apparatus 100 includes, in addition to the processor 310, other hardware such as a main storage device 320, an auxiliary storage device 330, an input IF 340, an output IF 350, and a communication IF 360. The processor 310 is connected via a signal line 370 to the other hardware to control the other hardware. IF represents an interface.
The data analysis apparatus 100 includes the definition interpreting unit 110 and the analytical processing unit 120 as functional components. The analytical processing unit 120 includes the summary generating unit 121, the change managing unit 122, the group extracting unit 123, and the analysis executing unit 124. The functions of the definition interpreting unit 110 and the analytical processing unit 120 are implemented by the data analysis program 101.
The processor 310 is a device which executes the data analysis program 101. The data analysis program 101 is a program which achieves the functions of the definition interpreting unit 110 and the analytical processing unit 120. The processor 310 is an IC (Integrated Circuit) which performs arithmetic processing. Specific examples of the processor 310 are a CPU (Central Processing Unit), DSP (Digital Signal Processor), and GPU (Graphics Processing Unit). The processor 310 is included in circuitry.
The main storage device 320 is a storage device. Specific examples of the main storage device 320 are an SRAM (Static Random Access Memory) and DRAM (Dynamic Random Access Memory). The main storage device 320 retains the arithmetic operation result of the processor 310. The data storage unit 130 is implemented by the main storage device 320.
The auxiliary storage device 330 is a storage device which stores data in a nonvolatile manner. A specific example of the auxiliary storage device 330 is an HDD (Hard Disk Drive). Also, the auxiliary storage device 330 may be a portable recording medium such as an SD (registered trademark) (Secure Digital) memory card, NAND flash, flexible disk, optical disk, compact disk, Blu-ray (registered trademark) disc, or DVD (Digital Versatile Disk). The data storage unit 130 is implemented by the auxiliary storage device 330. The auxiliary storage device 330 has the data analysis program 101 stored therein.
The input IF 340 is a port to which data is inputted from each device. The output IF 350 is a port to which various devices are connected and from which the processor 310 outputs data to those devices. The communication IF 360 is a communication port for the processor 310 to communicate with another device. The analytical application 200 is connected to the communication IF 360.
The processor 310 loads the data analysis program 101 from the auxiliary storage device 330 into the main storage device 320, and reads and executes the data analysis program 101 from the main storage device 320. The main storage device 320 stores an OS (Operating System) in addition to the data analysis program 101. While executing the OS, the processor 310 executes the data analysis program 101. The data analysis apparatus 100 may include a plurality of processors replacing the processor 310. These processors share execution of the data analysis program 101. Each of the processors is, as with the processor 310, a device which executes the data analysis program 101. Data, information, signal values, and variable values to be used, processed, or outputted by the data analysis program 101 are stored in the main storage device 320, the auxiliary storage device 330, or a register or cache memory in the processor 310.
The data analysis program 101 is a program which causes a computer to execute each process, each procedure, or each step obtained by reading “unit” of the definition interpreting unit 110 and the analytical processing unit 120 as “process”, “procedure”, or “step”.
Also, the data analysis method is a method to be performed by the data analysis apparatus 100 as a computer executing the data analysis program 101. The data analysis program 101 may be provided as stored in a computer-readable recording medium or may be provided as a program product.
11, 12, 13: point; 100: data analysis apparatus; 110: definition interpreting unit; 110A: analytical characteristic; 120: analytical processing unit; 121: summary generating unit; 121A: summary information; 122: change managing unit; 123: group extracting unit; 124: analysis executing unit; 130: data storage unit; 131G: raw data group; 200: analytical application; 201: analytical process definition; 202: analysis instruction; 310: processor; 320: main storage device; 330: auxiliary storage device; 340: input IF; 350: output IF; 360: communication IF
This application is a Continuation of PCT International Application No. PCT/JP2021/025218, filed on Jul. 2, 2021, which is hereby expressly incorporated by reference into the present application.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/JP2021/025218 | Jul 2021 | US |
| Child | 18522475 | | US |