Embodiments of the present invention relate to an information-processing device, an information-processing method, and a program.
In recent years, aging has been raised as a major issue for social infrastructure systems. For example, in electric power systems, the deterioration of substation facilities has been increasing worldwide, and it is important to formulate a facility investment plan. Solutions to such facility investment planning problems have been developed by experts in each field. In doing so, it is desirable to meet three requirements: scalability to handle large-scale systems, diversity to accommodate the various facility types that make up the system, and variability to respond flexibly to changes in facility configuration. However, it has been difficult to satisfy these three conditions simultaneously.
The problem to be solved by the present invention is to provide an information-processing device, an information-processing method, and a program that can formulate a change proposal for social infrastructure.
The information-processing device of the embodiment has a generation unit and a formulation unit. The generation unit generates a facility change proposal candidate using a policy function, which is a probability model for facility changes in a system having a graph structure. The formulation unit evaluates the reliability of the system for each facility change proposal candidate generated by the generation unit.
An information-processing device, an information-processing method, and a program according to embodiments will be described below with reference to the drawings. In the following description, a facility change proposal will be described as an example of processing handled by the information-processing device. It should be noted that the embodiments are not limited to solving the facility change planning problem for social infrastructure systems. Also, in the following example, a power grid system will be described as an example of a social infrastructure system, but the social infrastructure system is not limited to this and may be a system such as water supply, gas, roads, or communications.
First, a configuration example of an information-processing device 1 will be described.
The formulation unit 10 includes an evaluation unit 101 and an output unit 102.
The generation unit 20 includes an environment unit 201, a policy function unit 202, and a sampling unit 203.
The evaluation unit 101 evaluates the reliability and prepares a revision change proposal. The evaluation unit 101 outputs the prepared revision change proposal to the environment unit 201. Moreover, when the revisions to the revision change proposal converge, the evaluation unit 101 outputs the revision change proposal to the output unit 102.
The output unit 102 outputs the revision change proposal output by the evaluation unit 101 to an external device (for example, a display device 3).
The environment unit 201 is, for example, a target system, a model of the target system, a simulator, or the like. The environment unit 201 acquires the revision change proposal output by the evaluation unit 101, inputs the acquired revision change proposal into, for example, a model of the target system, and generates a system state (φk) at time k. The environment unit 201 outputs the generated system state (φk) to the policy function unit 202. Further, the environment unit 201 acquires a change proposal ak output by the sampling unit 203, inputs the acquired change proposal ak to, for example, the model of the target system, and generates a system state (φk+1) at time k+1. The environment unit 201 outputs the generated system state (φk+1) at time k+1 to the formulation unit 10.
The policy function unit 202 stores a policy function that is a probability model. The policy function unit 202 inputs the system state output by the environment unit 201 to the policy function and obtains the probability distribution of action selection for facility change correction. The policy function unit 202 outputs the obtained probability distribution of action selection to the sampling unit 203.
The sampling unit 203 acquires the probability distribution of action selection output by the policy function unit 202. The sampling unit 203 samples the change proposal ak based on the probability distribution and outputs the sampled change proposal ak to the environment unit 201. The policy function is a function that associates each action option in the state φk with its probability of being selected, and the option is determined according to this probability. In sampling, for example, the interval of real numbers from 0 to 1 is divided into line segments whose lengths correspond to the probability values of the options, and each line segment is given a number (index). A random number is then generated by a uniform random number function over 0 to 1, and the option whose line segment contains the random number is selected.
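The segment-based sampling described above can be sketched as follows. This is a minimal illustration, not the implementation of the sampling unit 203; the function name `sample_option` and the optional argument `u` (used to make the draw reproducible) are assumptions introduced here.

```python
import random
from itertools import accumulate

def sample_option(probabilities, u=None, rng=random):
    """Pick an option index by locating u within the cumulative segments of [0, 1)."""
    if u is None:
        u = rng.random()  # uniform random number in [0, 1)
    # Right edge of each line segment is the running sum of the probabilities.
    boundaries = list(accumulate(probabilities))
    for index, right_edge in enumerate(boundaries):
        if u < right_edge:
            return index
    # Guard against floating-point rounding in the last boundary:
    # fall back to the last option that has a nonzero probability.
    return max(i for i, p in enumerate(probabilities) if p > 0)

# Segments: option 0 -> [0, 0.2), option 1 -> [0.2, 0.7), option 2 -> [0.7, 1.0)
choice = sample_option([0.2, 0.5, 0.3], u=0.1)
```

An option with probability 0 has a zero-length segment and therefore can never be selected, which is the property the later embodiments rely on when they constrain the policy function.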
Next, the policy function will be described with reference to
As shown in
In addition, in the policy function, as shown in
Here, an overview of the processing in this embodiment will be described.
In this embodiment, the system average interruption frequency index (SAIFI), which represents the occurrence rate of power outages, is evaluated as the reliability while a facility change proposal is formulated using a policy function. The SAIFI value is an international index of power system supply reliability and is obtained by the formula Σ(number of loads experiencing an outage)×(power outage occurrence rate)/(total number of loads). Therefore, a smaller SAIFI value indicates a more reliable system with fewer power outages. In this embodiment, the state calculation results of a physical simulator, such as the power flow, are reflected in the evaluation, so that a failure rate that depends on the power flow can also be considered. The power flow-dependent failure rate is calculated by, for example, a power flow simulator. The power flow simulator may be an external device or may be included in, for example, the environment unit 201.
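As a worked illustration of the SAIFI formula, the following sketch assumes that the formula is read as a sum over outage events, each contributing (number of interrupted loads) × (outage occurrence rate), divided by the total number of loads. The function name `saifi` and the input layout are hypothetical.

```python
def saifi(outage_events, total_loads):
    """Compute SAIFI from (interrupted_load_count, outage_rate) pairs.

    outage_events: iterable of (number of loads interrupted, occurrence rate)
    total_loads:   total number of loads served by the system
    """
    if total_loads <= 0:
        raise ValueError("total_loads must be positive")
    return sum(count * rate for count, rate in outage_events) / total_loads

# Two hypothetical outage events on a system serving 200 loads:
# (100 * 0.5 + 50 * 0.2) / 200
value = saifi([(100, 0.5), (50, 0.2)], total_loads=200)
```

A smaller result corresponds to fewer outages per load, consistent with the statement that a smaller SAIFI value indicates a more reliable system.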
In addition, in this embodiment, a facility change proposal candidate that improves SAIFI is selected as the facility change proposal. It should be noted that the facility change proposal is assumed to cover facility changes during a predetermined change period. In each of the following embodiments, SAIFI is used as an example of reliability, but the reliability index may be any index suited to the social infrastructure system. For example, if the social infrastructure system is a communication network, the reliability may be the disconnection rate of the network. If the social infrastructure system is a road network, the reliability may be the impassability rate of roads. Moreover, the reliability index is not limited to one type, and two or more types may be used. In addition, in reinforcement learning, as will be described later, learning may be performed with an emphasis on the cumulative facility investment cost of the system. Thus, according to the embodiment, it is possible to formulate a facility change proposal that balances cumulative cost and reliability.
In this embodiment, a highly reliable plan is formulated by evaluating SAIFI in the course of formulating the draft facility change proposal. As a precondition, it is assumed that SAIFI can be defined for the circuit diagram of the system.
In symbols g11 and g12 in
In this embodiment, the SAIFI of this circuit is calculated using the failure rates λ1 to λ4. Since circuits and metagraphs correspond, SAIFI is determined corresponding to this metagraph. Here, if the circuit configuration is changed at timings of time 1, time 2, . . . , time T, the metagraph sequence Φ=(φ1, φ2, . . . , φT) is determined correspondingly. The metagraph changes in chronological order depending on the configuration and state of the facility.
Next, SAIFI will be described.
SAIFI calculation is performed by obtaining the power supply outage probability per load for a given system configuration and each facility failure probability, as shown in
Here, an example of the SAIFI value for each country in the power system will be described.
For example, the index (0 to 3) for the frequency of annual power outages in country A is 1, and the average number of power outages (SAIFI value) is 8.2. For the index related to the frequency of annual power outages, a higher score indicates fewer and shorter power outages. In country B, the index for the frequency of annual power outages is 2, and the SAIFI value is 0.6. In country C, the index is 3, and the SAIFI value is 0.0. Thus, since the SAIFI value differs depending on the country or region, the standard value or threshold is also set according to the country or region.
Next, the procedure for formulating a facility change proposal will be described. This example describes a policy in which the SAIFI value of the system under the change proposal is always kept better, that is, smaller, than a preset threshold SAIFI_th. This threshold is, for example, the upper limit of the power outage occurrence rate set as the quality of the power supply service.
The formulation unit 10 and the generation unit 20 acquire a system state φ0 (initial state) to be evaluated. The generation unit 20 acquires policy functions and environmental conditions. The policy function is obtained by, for example, reinforcement learning. Also, the initial state φ0 may have the same configuration as that of the trained neural network. In addition, the environmental conditions are, for example, the specifications of the facilities that constitute the system, their characteristic models (cost model, etc.), and the external environment of the system relevant to facility change planning, such as demand patterns (predicted values are acceptable) and, in the case of electric power systems, power generation patterns. Subsequently, the formulation unit 10 obtains and stores SAIFI(φ0) corresponding to the initial state φ0.
The formulation unit 10 and the generation unit 20 repeat the processes of steps S12 to S17 T times to change and correct the facility change proposal.
The formulation unit 10 formulates a revision change proposal. The formulation unit 10 formulates the revision change proposal based on the initial state φ0 in the first process and based on the system state φ1 (=0+1) in the second process. In the k-th process, the generation unit 20 inputs the revision change proposal formulated by the formulation unit 10 to the environment unit 201 to obtain the system state φk. Subsequently, the generation unit 20 inputs the system state φk to the policy function unit 202 to obtain a probability distribution, and obtains the change proposal ak by sampling based on that probability distribution. The generation unit 20 then inputs the change proposal ak to the environment unit 201 to obtain the system state φk+1.
The formulation unit 10 obtains SAIFI(φk) corresponding to the system state φk. The formulation unit 10 obtains SAIFI (φ0) corresponding to the initial state φ0 in the first processing, and SAIFI (φ1) corresponding to the system state φ1 (=0+1) in the second processing.
The formulation unit 10 compares the preset threshold SAIFI_th with SAIFI(φk) obtained in step S13, and determines whether SAIFI(φk) has been improved from the threshold SAIFI_th. For example, the formulation unit 10 determines that the improvement is achieved when SAIFI(φk) is equal to or less than the threshold SAIFI_th or when the ratio of SAIFI(φk) to the threshold SAIFI_th is 1 or less. If the formulation unit 10 determines that the improvement has been made (step S14; YES), the process proceeds to step S16. If the formulation unit 10 determines that there is no improvement (step S14; NO), the process proceeds to step S15.
Since it is determined that the improvement has not been made, the formulation unit 10 formulates improvement measures for SAIFI(Φ) of the facility change proposal candidate. Specifically, the formulation unit 10 reflects the SAIFI improvement update Δφ in the system state φk. In this way, if no improvement has been made, the system state φk is replaced in the k-th process. Subsequently, the formulation unit 10 substitutes k+1 for k, and returns to the process of step S12. The SAIFI improvement update Δφ will be described in the first embodiment.
The formulation unit 10 determines whether or not the processes of steps S12 to S16 have been repeated T times. When determining that the processes of steps S12 to S16 have been repeated T times (step S16; YES), the formulation unit 10 proceeds to the process of step S18. When determining that the processes of steps S12 to S16 have not been repeated T times (step S16; NO), the formulation unit 10 proceeds to the process of step S17.
The formulation unit 10 substitutes k+1 for k, and returns to the process of step S12.
The formulation unit 10 outputs the sequence Φ=(φ1, φ2, . . . , φT) as a facility change proposal.
It should be noted that if there is no improvement even after performing the processing of steps S12 to S16 T times, the formulation unit 10 may output information indicating that the improvement was not made to the generation unit 20. In such a case, for example, the user may input other conditions to the formulation unit 10 and the generation unit 20, and the formulation unit 10 may formulate a correction change proposal again based on the given other conditions. Alternatively, in such a case, the generation unit 20 may formulate a plan not to change.
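The loop over steps S12 to S18 can be sketched as below. This is a simplified illustration under stated assumptions: the callables `policy`, `step`, `saifi`, and `improve` are hypothetical stand-ins for the policy function unit, the environment unit, the SAIFI evaluation, and the SAIFI improvement update Δφ, and the toy model (a state that counts redundant lines) is invented purely so that the sketch runs.

```python
import random

def formulate_plan(phi0, policy, step, saifi, improve, saifi_th, T, rng):
    """Return the state sequence (phi_1, ..., phi_T) as the change proposal."""
    phi, plan = phi0, []
    for _ in range(T):
        probs = policy(phi)                      # probability of each action
        actions = list(probs)
        a_k = rng.choices(actions, weights=[probs[a] for a in actions])[0]
        phi_next = step(phi, a_k)                # environment yields next state
        if saifi(phi_next) > saifi_th:           # no improvement (step S14; NO)
            phi_next = improve(phi_next)         # reflect the update (step S15)
        plan.append(phi_next)
        phi = phi_next
    return plan                                  # output the sequence (step S18)

# Toy stand-ins (all hypothetical): the state is a count of redundant lines.
def toy_policy(phi):
    return {+1: 0.2, 0: 0.3, -1: 0.5}

def toy_step(phi, action):
    return max(0, phi + action)

def toy_saifi(phi):
    return 1.0 / (1 + phi)

def toy_improve(phi):
    return phi + 1

plan = formulate_plan(1, toy_policy, toy_step, toy_saifi, toy_improve,
                      saifi_th=0.5, T=10, rng=random.Random(0))
```

In this toy model a single improvement update is always enough to bring the state back under the threshold; a real Δφ may not guarantee that, which is the case covered by the note above on outputting information that no improvement was made.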
Here, a specific processing example will be described for the processing up to the second round.
The formulation unit 10 first outputs the initial state as a correction change proposal. Further, the formulation unit 10 obtains and stores SAIFI (φ0) corresponding to the initial state φ0.
Next, the generation unit 20 inputs the system state φ0 to the policy function unit 202 and obtains the probability distribution of the next (first) action selection. Subsequently, the generation unit 20 obtains a change proposal a1 by sampling from the probability distribution of action selection. Subsequently, the generation unit 20 inputs the change proposal a1 to the environment unit 201 to obtain the next system state φ1. The generation unit 20 outputs the determined next system state φ1 to the formulation unit 10.
The formulation unit 10 obtains SAIFI(φ1) corresponding to the system state φ1. Next, the formulation unit 10 compares the threshold SAIFI_th and SAIFI(φ1) to determine whether or not there is an improvement. If it is determined that there is no improvement, the formulation unit 10 reflects the SAIFI improvement update Δφ in the system state φ1. Subsequently, the formulation unit 10 formulates a correction change proposal based on the reflected SAIFI improvement update Δφ.
Next, the generation unit 20 inputs the revision change proposal to the environment unit 201 and obtains the first (k=1) system state φ′1. The system state is written φ′1 instead of φ1 because the SAIFI improvement update Δφ has been reflected in it.
Subsequently, the generation unit 20 inputs the system state φ′1 to the policy function unit 202 and obtains the probability distribution of the next (second, k=2) action selection. Subsequently, the generation unit 20 obtains a change proposal a2 by sampling from the probability distribution of action selection. Subsequently, the generation unit 20 inputs the change proposal a2 to the environment unit 201 and obtains the second system state φ2 (=1+1). The generation unit 20 outputs the determined second system state φ2 to the formulation unit 10.
The formulation unit 10 obtains SAIFI (φ2) corresponding to the system state φ2. Next, the formulation unit 10 compares the threshold SAIFI_th and SAIFI(φ2) to determine whether or not there is an improvement. If it is determined that there is no improvement, the formulation unit 10 reflects the SAIFI improvement update Δφ in the system state φ2. Subsequently, the formulation unit 10 formulates a correction change proposal based on the reflected SAIFI improvement update Δφ.
Here, an example of SAIFI improvement update Δφ will be described.
Thus, in this embodiment, if the SAIFI value of the facility change proposal candidate selected based on the policy function becomes worse than the preset threshold SAIFI_th, the selected facility change proposal is corrected.
The SAIFI improvement update Δφ in step S15 in
Also, there can be more than one type of improvement proposal. When setting according to allowable conditions, or when SAIFI deteriorates due to the modified update proposal φk, it is considered that the deletion of facilities corresponding to the proposal or the deterioration of specifications (deterioration of reliability) causes the deterioration of the SAIFI value. Therefore, the information-processing device 1 may select an option of not adopting (not accepting) the modified update proposal φk, that is, replacing φk without change. In
As described above, in this embodiment, a policy function is used. In addition, in this embodiment, SAIFI is evaluated while a facility plan is prepared. Furthermore, in the present embodiment, the plan is formulated under the condition that facility changes for improving SAIFI are taken into consideration. Specifically, during the formulation (inference) of the plan change proposal, if there is no improvement, a condition (Δφ) that brings the SAIFI value within the SAIFI allowable range is added.
As a result, according to this embodiment, it is possible to formulate a facility change proposal that satisfies the cost of facility change to some extent and also satisfies the conditions of SAIFI.
In the embodiment, a constraint may be added to the policy function for a revised update proposal that degrades SAIFI as a SAIFI improvement update policy. In the present embodiment, an example will be described in which the conditions for formulating a facility change proposal are the conditions for the policy function. In this embodiment, for example, by setting the output probability of the policy function for the revised update proposal to 0, the revised update proposal will not occur in the future.
First, a configuration example of the information-processing device 1A will be described.
The formulation unit 10A includes an evaluation unit 101A and an output unit 102.
The generation unit 20A includes an environment unit 201, a policy function unit 202A, and a sampling unit 203.
The functional units that operate in the same manner as the information-processing device 1 are denoted by the same reference numerals, and descriptions thereof are omitted.
In addition to the operation of the evaluation unit 101, the evaluation unit 101A generates constraints on the policy function. The evaluation unit 101A outputs the generated constraint to the policy function unit 202A. The constraint is, for example, to set the output probability of the policy function for the revised update proposal to 0, so that the revised update proposal will not occur in the future.
The policy function unit 202A reflects the constraint output by the evaluation unit 101A, inputs the system state output by the environment unit 201 into the policy function, and obtains the probability distribution of action selection for facility change correction. The policy function unit 202A outputs the obtained probability distribution of action selection to the sampling unit 203.
Next, the constraints of the policy function used in this embodiment will be described.
In this embodiment, as in the first embodiment, processing is performed T times to select a facility change proposal. If the policy function is defined as the occurrence probability of the action candidates for the k-th processing given the state φk−1 of the system at the (k−1)-th time, the action ak is drawn from the probability distribution π(·) as in the following equation (1).
[Number 1]
$$a_k \sim \pi(\cdot \mid \phi_{k-1}) \tag{1}$$
Here, a restriction on the occurrence of the update action plan ak in the action space can be considered as follows. The update action plan ak of the k-th process is generated as a function of the state φk−1 at the (k−1)-th process. When this action causes SAIFI degradation, the action is constrained only for that specific state (φk−1) of the system, and no constraints are imposed on different states.
Next, the procedure for formulating a facility change proposal will be described.
The formulation unit 10A and the generation unit 20A perform steps S11 to S14 in the same manner as in the first embodiment. If the formulation unit 10A determines that an improvement has been made (step S14; YES), the process proceeds to step S16. If the formulation unit 10A determines that there is no improvement (step S14; NO), the process proceeds to step S21. The generation unit 20A generates each system state.
If it is determined that an improvement has not been made, the formulation unit 10A formulates an improvement plan for SAIFI(Φ) of the facility change proposal candidate by adding constraints to the policy function. Specifically, the formulation unit 10A adds a constraint on the policy function and reflects the SAIFI improvement update Δφ in the revised update proposal φk. That is, since the policy function is the probability of causing an action to occur, by adding a constraint to the value or setting the probability distribution for that action to 0, the candidate is prevented from occurring, that is, not selected. Subsequently, the formulation unit 10A substitutes k+1 for k, and returns to the process of step S12.
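A state-specific constraint of this kind can be sketched as follows. The function name `constrain_policy`, the dictionary representation of the distribution, and the renormalization of the remaining probabilities are assumptions made for illustration; the text itself only states that the probability for the constrained action is set to 0.

```python
def constrain_policy(probs, state, banned):
    """Zero out banned actions for `state` and renormalize the rest.

    probs:  dict mapping action -> selection probability in `state`
    banned: set of (state, action) pairs found to degrade SAIFI
    """
    masked = {a: (0.0 if (state, a) in banned else p) for a, p in probs.items()}
    total = sum(masked.values())
    if total == 0.0:
        raise ValueError("every action is banned in this state")
    return {a: p / total for a, p in masked.items()}

probs = {"add": 0.5, "remove": 0.25, "keep": 0.25}
banned = {("s1", "remove")}          # hypothetical: "remove" degraded SAIFI in state s1

in_s1 = constrain_policy(probs, "s1", banned)   # "remove" can no longer be sampled
in_s2 = constrain_policy(probs, "s2", banned)   # other states are unaffected
```

Keying the banned set on the (state, action) pair reflects the point that the constraint applies only to the specific state in which the degradation occurred.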
As described above, in the present embodiment, during the formulation (inference) of the plan change proposal, if there is no improvement, a constraint is added to the policy function to set a good condition (Δφ) for the SAIFI value within the SAIFI allowable range.
As a result, according to the present embodiment, for example, by setting the output probability of the policy function for the modified update proposal to 0, the modified update proposal is prevented from occurring in the future according to the system state. Thereby, it is possible to formulate a facility change proposal more efficiently than in the first embodiment.
In this embodiment, an example will be described in which the SAIFI of each facility change proposal candidate is first calculated, and the plan is formulated by limiting to the facility change proposal candidates that satisfy the conditions for the SAIFI. Limiting to facility change proposal candidates as in the method shown in this embodiment is an extension of the method of adding constraints to the policy function performed in the second embodiment. In other words, it is a method that can consider the balance between the two indexes of minimizing the cumulative investment cost and securing the reliability, for example, the basic policy for policy function generation. The action decision process at an arbitrary time k is basically composed of two processes. That is, the first process (i) is the process of calculating each SAIFI value for each action candidate at time k, and the second process (ii) is the process of imposing a constraint on the policy function based on the SAIFI value and sampling from the constrained policy function to determine the action plan.
First, a configuration example of the information-processing device 1B will be described.
The formulation unit 10B includes an evaluation unit 101B and an output unit 102. The evaluation unit 101B includes a SAIFI function unit 1011 and a list unit 1012.
The generation unit 20B includes an environment unit 201B, a policy function unit 202B, a sampling unit 203, and a candidate proposal list unit 204.
The functional units that operate in the same manner as the information-processing device 1 are denoted by the same reference numerals, and descriptions thereof are omitted.
The operation and processing of each functional unit of the information-processing device 1B will be described below. In the following description, it is assumed that the state of the system at arbitrary time k is φk.
The generation unit 20B extracts all candidates for the next state from the current state. The policy function unit 202B generates the next action candidates for the state φk. Because the policy function defines a selection probability for each action candidate, the policy function unit 202B extracts the action candidates {ak1, ak2, . . . , akmk} (mk is the number of candidates) with a probability greater than 0 and sends them to the environment unit 201B. The environment unit 201B obtains the state φk+1i for each action candidate aki and outputs the obtained state φk+1i for each action candidate aki to the evaluation unit 101B (i).
The evaluation unit 101B evaluates the reliability and creates a correction change proposal. Before the correction update proposal is generated from the policy function in the process at time k, the evaluation unit 101B evaluates in advance, in the SAIFI function unit 1011, the SAIFI value for each configuration plan of the correction update proposal candidates. When the evaluation unit 101B determines a modified update proposal, it selects from the set of proposals whose SAIFI values satisfy the criteria. The evaluation unit 101B creates the correction/change information of the policy function, stores it in the list unit 1012, and outputs it to the policy function unit 202B. In addition, the evaluation unit 101B outputs the revision change proposal to the output unit 102 when the revision changes of the revision change proposal converge.
The SAIFI function unit 1011 stores SAIFI functions. The SAIFI function unit 1011 obtains the next system state φk+1 output by the environment unit 201B, and inputs the obtained system state φk+1 to the SAIFI function to obtain SAIFI(φk+1). The SAIFI function unit 1011 also calculates the SAIFI sequence {(aki, φk+1i, SAIFI(φk+1i))} (i=1, 2, . . . , mk) according to the action candidate aki and the state φk+1i caused in the process at time k. Here, mk is the number of action candidates for the k-th step.
The list unit 1012 stores the SAIFI sequence {(aki, φk+1i, SAIFI(φk+1i))} (i=1, 2, . . . , mk). The list unit 1012 acquires SAIFI(φk+1) output from the SAIFI function unit 1011. The list unit 1012 outputs candidate list information {(aki, φk+1i, SAIFI(φk+1i))} (i=1, 2, . . . , mk) indicating the acquired candidate list to the policy function unit 202B. As a result, the list unit 1012 imposes a constraint on the generation of the policy function on the policy function unit 202B.
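The construction of the candidate list {(aki, φk+1i, SAIFI(φk+1i))} can be sketched as below. The function name `build_candidate_list` and the callables `step` and `saifi` are hypothetical stand-ins for the environment unit 201B and the SAIFI function unit 1011, and the toy state model is invented for the example.

```python
def build_candidate_list(phi_k, policy_probs, step, saifi):
    """Return [(action, next_state, SAIFI(next_state))] for actions with probability > 0."""
    entries = []
    for action, probability in policy_probs.items():
        if probability > 0:                      # only candidates that can occur
            next_state = step(phi_k, action)     # environment: resulting state
            entries.append((action, next_state, saifi(next_state)))
    return entries

# Toy stand-ins (hypothetical): the state counts redundant lines.
toy_step = lambda phi, action: phi + action
toy_saifi = lambda phi: 1.0 / (1 + phi)

candidates = build_candidate_list(1, {+1: 0.5, 0: 0.5, -1: 0.0}, toy_step, toy_saifi)
```

The length of the returned list corresponds to mk, which may differ from step to step because it depends on which actions have nonzero probability in the current state.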
In
Based on the modified policy function (ii), the environment unit 201B acquires the proposed change action (ii) at time k output by the sampling unit 203, and generates the system state (φk+1) at time k+1. The environment unit 201B outputs the generated system state (φk+1) at time k+1 to the evaluation unit 101B (ii). The SAIFI function unit 1011 obtains SAIFI(φk+1) and outputs it via the output unit 102. Alternatively, when the SAIFI sequence {(aki, φk+1i, SAIFI(φk+1i))} (i=1, 2, . . . , mk) is already stored in the list unit 1012, the environment unit 201B may directly input the information on the action ak selected by the sampling unit 203 to the formulation unit 10B. In this case, the evaluation unit 101B may refer to the corresponding state φk+1 and the corresponding SAIFI(φk+1) value from the list unit 1012 and output them.
The policy function unit 202B stores policy functions. The policy function unit 202B imposes restrictions on the generation of the policy function by restricting the options of the revision change proposal based on the candidate list information output by the list unit 1012. The policy function unit 202B inputs the system state output by the environment unit 201B to the policy function and obtains the probability distribution of action selection for facility change correction. The policy function unit 202B outputs the obtained probability distribution of action selection to the sampling unit 203.
The constraint on the policy function shown in Example 1 is the rule that when the state φk+1 caused by the action ak does not satisfy the SAIFI condition, that is, when SAIFI(φk+1) is greater than SAIFI_th, the action is not selected. This means that the remaining action candidates are selected according to the probability ratios of the original policy function. That selection follows the criterion on which the policy function was based, which is separate from the SAIFI evaluation and is, for example, cumulative cost minimization.
On the other hand, when the SAIFI value for each action candidate is obtained as in this embodiment, it is possible to easily select actions with good SAIFI values, that is, actions with small values. This corresponds to action selection that emphasizes reliability. Alternatively, if the probability is set by the ratio of the quotient (division) of the selection probability expressed by the policy function and the SAIFI value, an action with a good balance between reliability improvement and cost minimization will be selected. These evaluations may be added to the list unit of the evaluation unit 101B. The policy function unit 202B is constrained by the SAIFI evaluation in the evaluation unit 101B.
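The quotient-based weighting mentioned above can be sketched as follows. The function name `balance_weights`, the dictionary inputs, and the small floor `eps` are assumptions; the floor is added only because a SAIFI value of 0.0 would otherwise cause a division by zero.

```python
def balance_weights(policy_probs, saifi_values, eps=1e-9):
    """Weight each action by (policy probability) / (SAIFI value), then normalize.

    policy_probs: dict action -> probability from the original policy function
    saifi_values: dict action -> SAIFI of the state that the action leads to
    eps:          hypothetical floor that avoids division by zero
    """
    weights = {a: policy_probs[a] / max(saifi_values[a], eps) for a in policy_probs}
    total = sum(weights.values())
    if total == 0.0:
        raise ValueError("no action has a positive weight")
    return {a: w / total for a, w in weights.items()}

# Equal policy probabilities, but action "b" leads to a much smaller SAIFI.
balanced = balance_weights({"a": 0.5, "b": 0.5}, {"a": 1.0, "b": 0.25})
```

With equal policy probabilities the action with the smaller SAIFI value receives the larger share, while an action with a low policy probability is still penalized, which is one way to read the balance between reliability improvement and cost minimization.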
A procedure for formulating a facility change proposal in this embodiment will be described.
The formulation unit 10B acquires the system state φ0 (initial state) to be evaluated. The generation unit 20B acquires policy functions and environmental conditions.
The formulation unit 10B and the generation unit 20B repeat the processing of steps S32 to S35 T times to formulate a facility change proposal. The generation unit 20B generates each system state.
The formulation unit 10B evaluates in advance the SAIFI value (φki) for each configuration plan of the correction update proposal candidates {φki} before generating the correction update proposal from the policy function.
The formulation unit 10B restricts the candidates in advance to only those update proposals that satisfy the SAIFI condition, based on the SAIFI value (φki) for each configuration plan of the modified update proposal candidates {φki}. For example, the formulation unit 10B evaluates each SAIFI value (φki) calculated in step S32 by comparing it with the preset threshold SAIFI_th. As a result of the evaluation, constraint variables such as eki = true (smaller than the threshold) or false (greater than the threshold), i=1, . . . , mk, are defined and added, and the formulation unit 10B may treat only the actions whose variable is true as selection candidates. Then, the formulation unit 10B selects the facility change action ak in the k-th processing using the policy function unit 202B to which the condition has been added. As described above, the formulation unit 10B may set the selection probability so as to balance reliability improvement and cost minimization.
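The true/false constraint variables and the resulting restriction of the selection candidates can be sketched as follows. The function names are hypothetical, and treating a value equal to the threshold as satisfying the condition is an assumption carried over from the "equal to or less than" criterion used for step S14.

```python
def feasibility_flags(saifi_values, saifi_th):
    """One boolean per candidate: True when its SAIFI value meets the threshold."""
    return [value <= saifi_th for value in saifi_values]

def restrict_candidates(actions, probs, flags):
    """Keep only actions flagged True and renormalize their probabilities."""
    kept = [(a, p) for a, p, ok in zip(actions, probs, flags) if ok]
    total = sum(p for _, p in kept)
    if total == 0:
        raise ValueError("no feasible action with positive probability")
    return {a: p / total for a, p in kept}

flags = feasibility_flags([0.3, 0.8, 0.5], saifi_th=0.5)
restricted = restrict_candidates(["a", "b", "c"], [0.25, 0.5, 0.25], flags)
```

Sampling from `restricted` then selects the action ak only among candidates that satisfy the SAIFI condition.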
The formulation unit 10B determines whether k is T or more. If the formulation unit 10B determines that k is equal to or greater than T (step S34; YES), the process proceeds to step S36. If the formulation unit 10B determines that k is less than T (step S34; NO), the process proceeds to step S35.
The formulation unit 10B substitutes k+1 for k, and returns to the process of step S32.
The formulation unit 10B outputs the series Φ=(φ1, φ2, . . . , φT) as a plan change proposal.
In the above-described processing, the candidates for each k-th processing may vary depending on the state of the system at the previous time. Therefore, the number mk of facility change proposal candidates may also change depending on the state.
In the above example, facility change proposal candidates whose SAIFI meets the criterion are extracted as facility change proposals, but the extraction is not limited to this. The formulation unit 10B may, for example, sort the candidates in the order of SAIFI values in the above process and extract them in order of goodness (or exclude bad candidates), or may extract only the best-ranked facility change proposal candidates.
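The ranking alternative can be sketched as follows. The sketch assumes that a lower SAIFI value is better, which is consistent with the threshold test in which values smaller than the threshold are accepted; the function name and the parameter `keep` are hypothetical.

```python
def rank_candidates(candidates, saifi_values, keep):
    """Return the `keep` candidates with the best (lowest) SAIFI values."""
    order = sorted(range(len(candidates)), key=lambda i: saifi_values[i])
    return [candidates[i] for i in order[:keep]]

# Hypothetical candidate labels and SAIFI values.
best_two = rank_candidates(["phi_k1", "phi_k2", "phi_k3"], [1.4, 0.8, 0.9], keep=2)
```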
As described above, in this embodiment, before the policy function is generated, the SAIFI value for each configuration plan of the correction update proposal candidates is evaluated in advance to impose restrictions on the generation of the policy function. As a result, according to this embodiment, a policy function can be generated for correction update proposals that satisfy the conditions, so that a facility change proposal can be formulated efficiently.
In the first embodiment, every k-th process within the planning period is shown as a procedure for continuing facility change planning by setting a constraint on the policy function as an improvement measure.
In this embodiment, since the planning based on the policy function is a stochastic process, if a sufficient number of plan proposals are generated, there is a possibility that there will be a plan that satisfies the SAIFI condition among them, and the proposal should be considered as a draft plan. In that case, there is an advantage that a highly reliable plan can be formulated efficiently without procedures such as correction of the described change proposal and restriction of the policy function. Hereinafter, the process when there is no proposal that satisfies the SAIFI condition among the plurality of plan proposals will be mainly described.
In this embodiment, a sequence of candidates for a facility renewal plan is formulated over a planning period, and a plurality of SAIFI sequences corresponding to that sequence are generated. If, among the plurality of plan series, there is a plan that satisfies the SAIFI condition and the cost condition, it is adopted as the resulting plan. If the SAIFI condition is not satisfied, the entire planning series is recreated after constraining the policy function, either with respect to the change that causes the largest deterioration among the calculated ones, or by constraining the option occurrence probability for the modified update proposal based on a predetermined criterion stored by the formulation unit. The sequence of SAIFI is SAIFI(φ0), SAIFI(φ1), . . . , SAIFI(φT).
First, a configuration example of the information-processing device 1C will be described.
The formulation unit 10C includes an evaluation unit 101C and an output unit 102. The evaluation unit 101C includes a change proposal formulation unit 1013, an SAIFI function unit 1014, and a constraint generation unit 1015.
The generation unit 20C includes an environment unit 201, a policy function unit 202C, and a sampling unit 203.
The functional units that operate in the same manner as the information-processing device 1 are denoted by the same reference numerals, and descriptions thereof are omitted.
The evaluation unit 101C calculates a sequence of SAIFI corresponding to a sequence of a plurality of facility renewal plan proposals over the planning period, and outputs the plan if the plan satisfies the conditions including the SAIFI conditions. Otherwise, a choice occurrence probability constraint on the policy function is generated so as to constrain the generation of actions within it that do not satisfy the SAIFI condition. The evaluation unit 101C may use, for example, the constraint variable eki introduced in the functional description of step S33 of the second embodiment as a constraint. However, since the constraint generally depends on the state φk at that time, when the state is φk, the selection candidates are constrained by the condition of eki.
The change proposal formulation unit 1013 saves a plurality of sequences Φ=(φ0, φ1, . . . , φT) of the equipment update plan over the planning period.
The SAIFI function unit 1014 stores SAIFI functions. The SAIFI function unit 1014 obtains the SAIFI of the sequence Φ of the facility renewal plan created by the change proposal formulation unit 1013.
The constraint generation unit 1015 calculates constraints on the policy function based on the SAIFI sequence corresponding to the sequence Φ of the facility renewal plan obtained by the SAIFI function unit 1014. The constraint generation unit 1015 extracts a change proposal that does not satisfy the conditions in the SAIFI sequence, and generates a constraint on the option occurrence probability for the policy function so as to delete that change proposal from the selection candidates. The generated constraint information is output to the policy function unit 202C.
The policy function unit 202C stores policy functions. The policy function unit 202C imposes restrictions on the generation of the policy function by restricting the options of the revision change proposal using the restriction information output by the constraint generation unit 1015. The policy function unit 202C inputs the system state output by the environment unit 201 to the policy function and obtains the probability distribution of action selection for facility change correction. The policy function unit 202C outputs the obtained probability distribution of action selection to the sampling unit 203.
Next, the procedure for formulating a facility change proposal will be described.
The generation unit 20C acquires the system state φ0 (initial state) to be evaluated, the policy function, and the environmental conditions.
The formulation unit 10C and the generation unit 20C repeat the processes of steps S42 to S46 to formulate a facility change proposal. The generation unit 20C generates each system state.
The generation unit 20C generates all facility change proposal candidates φk (k=0, . . . , T) using the information acquired in step S41. Multiple series of facility renewal plan proposals are generated.
The formulation unit 10C evaluates the SAIFI series SAIFI (Φ) of the facility change proposal candidate. Accordingly, the formulation unit 10C evaluates whether or not there is a plan series that satisfies the conditions as a plan.
The formulation unit 10C determines whether or not to end the process based on a predetermined criterion stored by itself. Under the predetermined criterion, if the evaluation in step S43 finds a plan that satisfies the conditions, that plan is taken as the resulting plan and the termination condition is satisfied. Conversely, the termination condition is not satisfied if none of the sequences meets the conditions. If the formulation unit 10C determines to end the process (step S44; YES), the process proceeds to step S46. If the formulation unit 10C determines not to end the process (step S44; NO), the process proceeds to step S45.
The formulation unit 10C restricts the selection of actions that cause SAIFI(Φ) deterioration. The formulation unit 10C restricts action selection that causes SAIFI(Φ) deterioration by restricting the corresponding action with respect to the policy function, that is, making it a non-selection candidate. The formulation unit 10C returns to the process of step S42.
The formulation unit 10C outputs the series Φ=(φ1, φ2, . . . , φT) as a facility change proposal.
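The loop of steps S42 to S46 can be sketched as follows under strong simplifying assumptions: the system state is reduced to a single SAIFI number, each action shifts SAIFI by a fixed hypothetical amount, the policy is uniform over the allowed actions, and a restricted action is removed for every state rather than only for the state φk in which it caused the deterioration. All names and numbers are illustrative and are not part of the embodiment.

```python
import random

# Hypothetical toy model: each action shifts SAIFI by a fixed amount.
ACTION_EFFECT = {"keep": 0.0, "remove_line": 0.4, "add_line": -0.1}

def rollout(saifi0, horizon, allowed, rng):
    """Sample one plan series and its SAIFI series SAIFI(phi_0), ..., SAIFI(phi_T)."""
    actions, series = [], [saifi0]
    for _ in range(horizon):
        action = rng.choice(sorted(allowed))
        actions.append(action)
        series.append(series[-1] + ACTION_EFFECT[action])
    return actions, series

def formulate(saifi0, horizon, saifi_th, n_series=8, max_rounds=5, seed=0):
    rng = random.Random(seed)
    allowed = set(ACTION_EFFECT)
    for _ in range(max_rounds):
        # Step S42: generate several plan series.
        plans = [rollout(saifi0, horizon, allowed, rng) for _ in range(n_series)]
        # Steps S43 and S44: accept the first series whose SAIFI stays below the threshold.
        satisfying = [plan for plan in plans if max(plan[1]) < saifi_th]
        if satisfying:
            return satisfying[0], allowed
        # Step S45: restrict the action that caused the largest single deterioration.
        worst_action, worst_jump = None, 0.0
        for actions, series in plans:
            for action, before, after in zip(actions, series, series[1:]):
                if after - before > worst_jump:
                    worst_action, worst_jump = action, after - before
        if worst_action is None:
            break
        allowed.discard(worst_action)
    return None, allowed

plan, allowed_actions = formulate(saifi0=0.9, horizon=3, saifi_th=1.0)
```

Restrictions are added only when no sampled series satisfies the condition, so a run in which an acceptable series is sampled in the first round leaves the policy untouched.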
As another example of step S45, when the SAIFI value is degraded, for example, the changed part is strengthened for the facility change proposal candidate whose SAIFI value is greatly degraded. For example, in g102 of
As described above, in the present embodiment, during the planning (inference) of the facility change proposal, the changed part that affects SAIFI is identified from the series SAIFI(Φ), and the actions that have a large impact are restricted (that is, constraints are added to the policy function as a countermeasure).
As a result, according to this embodiment, it is possible to efficiently formulate a facility change proposal by adding restrictions only when there is no plan that satisfies the conditions among the plurality of plans that have been formulated.
Also, in this embodiment, as step S45, a method of correcting each change proposal unit has been described, but a method of correcting each change step as in the first embodiment can also be adopted for this part.
All or part of the functional units of the information-processing device 1 (or 1A, 1B, 1C) described above are realized, for example, by a hardware processor such as a CPU (Central Processing Unit) executing a program (software). Some or all of these components may be realized by hardware (including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a GPU (Graphics Processing Unit), or by cooperation of software and hardware. The program may be stored in advance in a storage device such as an HDD (Hard Disk Drive) or flash memory, or may be stored in a removable storage medium such as a DVD or CD-ROM and installed by attaching the storage medium to a drive device.
Next, an example of an output image displayed on the display device 3 will be described.
Next, an example of a technique for expressing an actual system in a graph structure will be described.
For details of metagraphs, graph neural networks, etc. used in embodiments and examples, see Japanese Patent Application, First Publication No. 2019-204294.
In the configuration as shown in
When expressing the actual system in a graph structure, the graph structure data of symbol g201 is converted into an assumed node metagraph like symbol g202 (symbol g203). A method for converting graph-structured data into an assumed node metagraph will be described later. In symbol g202, AN(Bx), AN(T1) and AN(Ly) indicate real nodes. A graph such as the symbol g202 is called a metagraph.
The metagraph in
A change in facility corresponds to a change in the convolution function corresponding to the facility (local processing). Adding facility corresponds to adding a convolution function. Discarding facility corresponds to deleting the convolution function.
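The correspondence between facility changes and convolution functions can be pictured as a registry keyed by facility. The following is a minimal sketch; the class name, the method names, and the use of plain Python callables as stand-ins for convolution functions are assumptions made for illustration.

```python
class ConvolutionRegistry:
    """Maps facility identifiers to their (hypothetical) convolution functions."""

    def __init__(self):
        self._functions = {}

    def add_facility(self, facility_id, conv_fn):
        # Adding a facility corresponds to adding a convolution function.
        if facility_id in self._functions:
            raise KeyError(f"facility already present: {facility_id}")
        self._functions[facility_id] = conv_fn

    def change_facility(self, facility_id, conv_fn):
        # Changing a facility corresponds to replacing its convolution function.
        if facility_id not in self._functions:
            raise KeyError(f"unknown facility: {facility_id}")
        self._functions[facility_id] = conv_fn

    def discard_facility(self, facility_id):
        # Discarding a facility corresponds to deleting its convolution function.
        del self._functions[facility_id]

    def facilities(self):
        return sorted(self._functions)

    def apply(self, facility_id, feature):
        return self._functions[facility_id](feature)

registry = ConvolutionRegistry()
registry.add_facility("T1", lambda h: 2.0 * h)
registry.add_facility("L1", lambda h: h + 1.0)
registry.change_facility("T1", lambda h: 3.0 * h)
registry.discard_facility("L1")
```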
Next, the way to generate a neural network from graph-structured data will be described.
In the generation of the neural network, not only an actual node RN but also an assumed node AN including an actual edge RE are set, and a neural network is generated that propagates the feature amount of the k−1th layer of the assumed node AN to the feature amount of the k-th layer of other assumed nodes AN having a connection relationship and the assumed nodes AN themselves. k is a number equal to or greater than 1, and a layer with k=0 means an input layer, for example. The generation of the neural network may be performed, for example, by an external device or by an information-processing device.
In generating the neural network, for example, the feature value of the first intermediate layer is determined based on the following equation (2). The equation (2) corresponds to a calculation method for the feature amount h1# of the first intermediate layer of the assumed node (RN1).
As an example, α1,12 is a coefficient that indicates the degree of propagation between the assumed node (RN1) and the assumed node (RE12). A feature amount h1## of the second intermediate layer of the assumed node (RN1) is represented by the following equation (3). For the third and subsequent intermediate layers, the feature amounts are determined according to the same rule.
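[Equation 2]
$$h_1^{\#} = \alpha_{1,1} \cdot W \cdot h_1 + \alpha_{1,12} \cdot W \cdot h_{12} + \alpha_{1,13} \cdot W \cdot h_{13} + \alpha_{1,14} \cdot W \cdot h_{14} \quad (2)$$
[Equation 3]
$$h_1^{\#\#} = \alpha_{1,1} \cdot W \cdot h_1^{\#} + \alpha_{1,12} \cdot W \cdot h_{12}^{\#} + \alpha_{1,13} \cdot W \cdot h_{13}^{\#} + \alpha_{1,14} \cdot W \cdot h_{14}^{\#} \quad (3)$$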
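Equations (2) and (3) apply the same propagation rule twice. The following sketch evaluates that rule numerically for the assumed node (RN1) and its three adjacent assumed nodes (RE12), (RE13), and (RE14). The feature vectors, the matrix W, the coefficient values, and the simplification that each edge node is connected only to itself and to RN1 are hypothetical and chosen purely for illustration.

```python
def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def propagate(alpha, W, feats):
    """Next-layer feature of node i: sum over j of alpha[i][j] * W * h_j."""
    out = {}
    for i, neighbours in alpha.items():
        acc = [0.0] * len(W)
        for j, a_ij in neighbours.items():
            Wh_j = matvec(W, feats[j])
            acc = [s + a_ij * v for s, v in zip(acc, Wh_j)]
        out[i] = acc
    return out

# Hypothetical input-layer features h_1, h_12, h_13, h_14.
feats = {"1": [1.0, 0.0], "12": [0.0, 1.0], "13": [1.0, 1.0], "14": [0.0, 0.0]}
W = [[2.0, 0.0], [0.0, 1.0]]
# Hypothetical propagation coefficients alpha_{i,j}.
alpha = {
    "1": {"1": 0.4, "12": 0.3, "13": 0.2, "14": 0.1},
    "12": {"12": 0.5, "1": 0.5},
    "13": {"13": 0.5, "1": 0.5},
    "14": {"14": 0.5, "1": 0.5},
}

layer1 = propagate(alpha, W, feats)    # layer1["1"] is h1# of equation (2)
layer2 = propagate(alpha, W, layer1)   # layer2["1"] is h1## of equation (3)
```

The second call needs the first-layer features of all adjacent nodes, which is why the sketch propagates every node and not only RN1.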
In generating a neural network, for example, coefficients αi,j are determined by rules based on graph attention networks.
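As one possible reading of this rule, the following sketch computes the coefficients in the way commonly used by graph attention networks: a score LeakyReLU(a · [W h_i ‖ W h_j]) is computed for each neighbour j and the scores are normalized with a softmax. The attention vector `a`, the slope 0.2, and the example features are assumptions; the embodiment does not specify these details.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def attention_coefficients(i, neighbours, feats, W, a):
    """alpha_{i,j} = softmax over j of LeakyReLU(a . [W h_i || W h_j])."""
    Wh_i = matvec(W, feats[i])
    scores = {}
    for j in neighbours:
        concat = Wh_i + matvec(W, feats[j])   # list concatenation
        scores[j] = leaky_relu(sum(x * y for x, y in zip(a, concat)))
    top = max(scores.values())                # subtract the maximum for stability
    exps = {j: math.exp(s - top) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}

# Hypothetical features, weight matrix, and attention vector.
feats = {"1": [1.0, 0.0], "12": [0.0, 1.0], "13": [1.0, 1.0], "14": [0.0, 0.0]}
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.5, -0.5, 1.0, 1.0]
alpha_1 = attention_coefficients("1", ["1", "12", "13", "14"], feats, W, a)
```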
In generating the neural network, the parameters (W, αi,j) of the neural network are determined so as to meet the purpose of the neural network while following the above rules. The purpose of the neural network is to output a future state when the assumed node AN is the current state, or to output an index for evaluating the state, or to classify the current state.
Next, an example procedure for formulating a facility change proposal series based on the facility's attention and convolution model will be described.
First, the real system is represented by a graph structure (S101). Next, edge types and function attributes are set from the graph structure (S102). Next, it is represented by a metagraph (S103). Next, network mapping is performed (S104).
Symbol g300 is an example of network mapping. Reference g301 is an edge convolution module. Reference g302 is a graph attention module. Reference g303 is a time series recognition module. Symbol g304 is a state value function V(s) estimation module. Reference g305 is an action probability p(a|s) calculation module.
Here, the facility change planning problem can be defined as a reinforcement learning problem. In other words, the facility change planning problem can be defined as a reinforcement learning problem by taking the graph structure and the parameters of each node and edge (facility) as states, the addition or deletion of a facility as actions, and the obtained revenues and costs as rewards.
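This formulation can be sketched as a small environment in which the state is a graph, an action adds or deletes one edge (facility), and the reward is revenue minus cost. The class name, the revenue model proportional to the number of edges, and all cost values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GridState:
    nodes: frozenset
    edges: frozenset   # each edge is a (u, v) tuple standing for a facility

def step(state, action, revenue_per_edge=3.0, install_cost=5.0, removal_cost=1.0):
    """Apply one facility change and return (new_state, reward)."""
    kind, edge = action
    if kind == "add":
        new_edges, cost = state.edges | {edge}, install_cost
    elif kind == "delete":
        new_edges, cost = state.edges - {edge}, removal_cost
    else:
        raise ValueError(f"unknown action kind: {kind}")
    new_state = GridState(state.nodes, frozenset(new_edges))
    reward = revenue_per_edge * len(new_edges) - cost
    return new_state, reward

s0 = GridState(frozenset({1, 2, 3, 4}), frozenset({(1, 2), (2, 3)}))
s1, r1 = step(s0, ("add", (3, 4)))
s2, r2 = step(s0, ("delete", (1, 2)))
```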
Examples of selection for change will be described.
Here, as an initial (t=0) state, consider a four-node graph structure like symbol g401.
From this state, n (n is an integer equal to or greater than 1) choices can be considered as change candidates for the next time t=1, such as symbols g411, g412, . . . , g41n in the middle row.
For each of these options, the option for the next time t=2 is derived. Symbols g421, g422, g423, . . . represent examples of options from the graph structure.
In this way, the selection series is expressed as a metagraph series that reflects changes, that is, as a series of node changes. In the embodiment, reinforcement learning is used as a means of extracting those sequences that match the policy from among such sequences.
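To give a sense of how quickly such selection series multiply, the following sketch enumerates all series for a four-node graph when each step adds or deletes exactly one edge. Restricting the changes to single-edge additions and deletions between existing nodes is an assumption made for illustration; the embodiment also allows other kinds of change.

```python
from itertools import combinations

def candidates(edges, nodes):
    """All graphs reachable by adding or deleting exactly one edge."""
    out = []
    for edge in combinations(sorted(nodes), 2):
        out.append(edges - {edge} if edge in edges else edges | {edge})
    return out

def selection_series(edges, nodes, depth):
    """All series of graphs of length depth + 1 starting from `edges`."""
    if depth == 0:
        return [[edges]]
    result = []
    for nxt in candidates(edges, nodes):
        for tail in selection_series(nxt, nodes, depth - 1):
            result.append([edges] + tail)
    return result

nodes = {1, 2, 3, 4}
initial = frozenset({(1, 2), (2, 3), (3, 4)})
options_t1 = candidates(initial, nodes)
all_series_t2 = selection_series(initial, nodes, depth=2)
```

With four nodes there are six possible edges, so the number of series grows by a factor of six per time step in this toy setting, which is one reason a learned policy is preferable to exhaustive enumeration.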
In this way, the constructed graph neural network always corresponds to the system configuration on the environment side. In the generation of the neural network, reinforcement learning is advanced using the new state S, the reward value obtained based thereon, the value function estimated on the neural network side, and the policy function as the evaluation results on the environment side.
Next, an example of obtaining a policy function through learning will be described. Here, an example using A3C (Asynchronous Advantage Actor-Critic) as a learning method will be described, but the learning method is not limited to this. In the embodiment, reinforcement learning is used as means for extracting a selection series that matches the reward. Also, the reinforcement learning may be, for example, deep reinforcement learning. The reinforcement learning is performed by a learning device 500 as shown in
The system environment 502 includes a physical model/simulator 5021, a reward calculation unit 5022, and an output unit 5023.
The processing unit 503 includes a generation unit 5031.
The external environment DB 501 stores external environment data and the like. The external environment data are, for example, specifications of facility nodes, demand data in power systems, and information on graph structures; they are parameters that are not affected by the state of the environment or by actions, but that affect action decisions.
The physical model/simulator 5021 includes, for example, a power flow simulator, a traffic simulator, a physical model, a function, an equation, an emulator, and a real machine. The physical model/simulator 5021 acquires data stored in the external environment DB 501 as necessary, and performs simulation using the acquired data and the physical model. The physical model/simulator 5021 outputs the simulation result (S, A, S′) to the reward calculation unit 5022. S is the previous state of the system, A is the extracted action, and S′ is the new state of the system.
The reward calculation unit 5022 calculates the reward value R using the simulation results (S, A, S′) obtained from the physical model/simulator 5021. A method of calculating the reward value R will be described later. Also, the reward value R is {(R1, a1), . . . , (RT, aT)}, for example. Here, T is the facility plan review period. Also, ap (p is an integer from 1 to T) is each node, for example, a1 is the first node and ap is the p-th node.
The output unit 5023 sets the new state S′ of the system as the state S of the system, and outputs the state S of the system and the reward value R to the processing unit 503.
The generation unit 5031 inputs the system state S output by the system environment 502 to the neural network stored in the processing unit 503 to acquire the policy function π(·|S, θ) and the state value function V(S, w). Here, w is a weighting coefficient matrix (also called a convolution term) corresponding to the attribute dimension of the node. The generation unit 5031 determines action (facility change) A in the next step using the following equation (4).
[Equation 4]
A˜π(·|S,θ) (4)
Note that a in equation (3) corresponds to A in equation (4), and φ in equation (3) corresponds to S in equation (4).
The generation unit 5031 outputs the determined next-step action (facility change) A to the system environment 502. That is, the policy function π(·|S, θ) receives the state S of the system under consideration as input and outputs an action. The generation unit 5031 also outputs the obtained state value function V(S, w) to the reinforcement learning unit 504. The policy function π(·|S, θ) for selecting an action is given as a probability distribution over action candidates for changing the metagraph structure.
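Sampling an action according to equation (4) can be sketched as follows, assuming that the neural network produces one unnormalized score (logit) per action candidate and that a softmax turns the scores into the distribution π(·|S, θ). The logits and the action labels are hypothetical.

```python
import math
import random

def softmax(logits):
    """Convert unnormalized scores into a probability distribution."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(logits, actions, rng):
    """Draw A ~ pi(.|S, theta) and also return the distribution itself."""
    probs = softmax(logits)
    action = rng.choices(actions, weights=probs, k=1)[0]
    return action, probs

# Hypothetical action candidates and network outputs for the current state S.
actions = ["add_T2", "remove_L3", "no_change"]
logits = [1.0, 0.0, 2.0]
A, pi = sample_action(logits, actions, random.Random(0))
```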
In this way, the generation unit 5031 inputs the state of the system to the neural network, and generates a system of one or more post-change models that cause possible structural changes to the neural network at each time step. A policy function and a state-value function required for reinforcement learning are obtained for each step, and structural changes of the system are evaluated based on the policy function.
The state value function V(S, w) output by the generation unit 5031 and the reward value R output by the system environment 502 are input to the reinforcement learning unit 504. The reinforcement learning unit 504 uses the input state value function V(S, w) and reward value R to repeat reinforcement machine learning by a machine learning method such as A3C for the number of times corresponding to the facility plan review period (T). The reinforcement learning unit 504 outputs the parameters <ΔW>π and <Δθ>π obtained as a result of the reinforcement machine learning to the generation unit 5031.
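In actor-critic methods such as A3C, the reward values and the state value estimates are typically combined into an advantage that weights the policy update. The following sketch computes discounted returns and advantages for one episode of length T; the discount factor, the bootstrap value, and the example numbers are assumptions, and the actual gradient update of the parameters is omitted.

```python
def discounted_returns(rewards, bootstrap_value, gamma):
    """R_t = r_t + gamma * R_{t+1}, starting from the bootstrap value after step T."""
    returns, running = [], bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns

def advantages(rewards, values, bootstrap_value=0.0, gamma=0.99):
    """Advantage A_t = R_t - V(S_t): how much better the outcome was than estimated."""
    rets = discounted_returns(rewards, bootstrap_value, gamma)
    return [ret - v for ret, v in zip(rets, values)]

# Hypothetical episode with T = 3 steps.
rewards = [1.0, 1.0, 1.0]
values = [1.0, 1.0, 1.0]   # V(S_t, w) estimated by the network
rets = discounted_returns(rewards, bootstrap_value=0.0, gamma=0.5)
adv = advantages(rewards, values, bootstrap_value=0.0, gamma=0.5)
```

A positive advantage increases the probability of the action that was taken, and a negative one decreases it; the squared advantage is commonly used as the loss for the value estimate.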
The generation unit 5031 updates the parameters of the convolution function based on the parameters output by the reinforcement learning unit 504. The generation unit 5031 reflects the updated parameters <ΔW>π and <Δθ>π in the neural network, and evaluates the neural network reflecting the parameters.
Next, the functions and operations of the generation unit 5031 will be further described.
The generation unit 5031 acquires a “status signal” from the system environment 502 and, as part of it, a change information signal reflecting facility changes. When a change information signal is acquired, the generation unit 5031 defines a metagraph structure corresponding to the new system configuration, and generates a corresponding neural network structure. At this time, the generation unit 5031 formulates a neural network structure that efficiently processes the evaluation value estimation calculations for the value functions and policy functions required for the change proposal. Further, the generation unit 5031 refers to the convolution function stored therein corresponding to the changed portion, and constructs a metagraph corresponding to the actual system configuration from the set of convolution functions. Then, the generation unit 5031 changes the metagraph structure in accordance with the facility change (that is, in accordance with the action, it updates the graph structure, sets “candidate nodes”, and so on). The generation unit 5031 associates attributes with nodes and edges, and defines and manages them.
The generation unit 5031 has a convolution function definition function corresponding to the facility type and a convolution function parameter update function. The generation unit 5031 manages the partial metagraph structure and the corresponding convolution module or attention module. The generation unit 5031 defines a convolution function for a model representing graph-structured data based on the graph-structured data representing the structure of the system. The partial metagraph structure is a library function of individual convolution functions corresponding to each facility type node or edge. The generation unit 5031 updates parameters of individual convolution functions in the learning process.
The generation unit 5031 acquires the convolution module or attention module corresponding to the formulated neural network structure and the partial metagraph structure to be managed. The generation unit 5031 has a function of converting a metagraph into a multi-layer neural network, a function of defining the neural network output functions for the functions required for reinforcement learning, and a function of updating the parameter set of the convolution function or the neural network. Functions necessary for reinforcement learning are, for example, a reward function, a policy function, and the like. Also, the output function definition is, for example, a fully connected multi-layer neural network or the like that receives the output of the convolution function. Note that full connection is a mode in which each input is connected to every unit of the following layer.
Next, an example of the reward function will be described.
The reward function is, for example, “(bias) - (facility installation, disposal, operation, and maintenance costs)”. The reward function may be defined as a positive reward value by modeling the cost for each facility as a function and subtracting it from the bias. The bias is a parameter that is appropriately set as a constant positive value so that the reward function value becomes a positive value.
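The reward function above can be written down directly. In the following minimal sketch, the parameter names and the numerical values are hypothetical, and the cost of each facility is assumed to be already available as a number.

```python
def reward(bias, installation=0.0, disposal=0.0, operation=0.0, maintenance=0.0):
    """Reward = bias minus the sum of the facility-related costs."""
    total_cost = installation + disposal + operation + maintenance
    return bias - total_cost

# Hypothetical costs for one time step; the bias is chosen large enough
# that the reward stays positive.
r = reward(bias=100.0, installation=30.0, operation=5.0, maintenance=2.5)
```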
Although several embodiments of the invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These embodiments can be implemented in various other forms, and various omissions, replacements, and modifications can be made without departing from the scope of the invention. These embodiments and their modifications are included in the scope and spirit of the invention, as well as the scope of the invention described in the claims and equivalents thereof.
1, 1A, 1B, 1C . . . Information-processing device, 10, 10A, 10B, 10C . . . Formulation unit, 20, 20A, 20B, 20C . . . Generation unit, 101, 101A, 101B, 101C . . . Evaluation unit, 102 . . . Output unit, 201, 201B . . . Environment unit, 202, 202A, 202B . . . Policy function unit, 203 . . . Sampling unit, 204 . . . Candidate proposal list unit, 1011, 1014 . . . SAIFI function unit, 1012 . . . List unit, 1013 . . . Change proposal formulation unit, 1015 . . . Constraint generation unit
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/JP2021/011108 | Mar 2021 | US |
Child | 18467460 | | US |