Embodiments of the present invention relate to an information processing device, an information processing method, and a program.
In recent years, the aging of social infrastructure systems has become a major issue. For example, in electric power systems, many transformer substation facilities are aging worldwide, and it is important to formulate capital investment plans. Experts in each field have been developing solutions to the problems associated with such capital investment plans. Planning for social infrastructure systems must, in some cases, satisfy requirements of large scale, diversity, and variability. However, the related art is not responsive or adaptable to changes in the configurations of social infrastructure systems.
Some embodiments of the present invention provide an information processing device, an information processing method, and a program for creating proposals for changes in the structure of social infrastructures.
In some embodiments, an information processing device may include, but is not limited to, a definer, an evaluator, and a reinforcement learner. The definer is configured to associate nodes and edges with attributes and to define a convolution function associated with a model representing data of a graph structure representing a system structure, on the basis of the data regarding the graph structure. The evaluator is configured to input a state of the system into the model. The evaluator is configured to obtain, for each time step, a policy function as a probability distribution of structural changes and a state value function for reinforcement learning, for one or more structurally changed models obtained by applying assumable structural changes to the model at each time step. The evaluator is configured to evaluate the structural changes in the system on the basis of the policy function. The reinforcement learner is configured to perform reinforcement learning using a reward value as a cost generated when a structural change is applied to the system, the state value function, and the model, so as to optimize the structural change in the system.
An information processing device, an information processing method, and a program according to an embodiment will be described below with reference to the drawings. In the following description, a facility change plan is used as an example of processing handled by the information processing device. However, this embodiment is not limited to a facility change plan task for a social infrastructure system.
First, an example of an electric power circuit system will be described.
It is assumed that a facility change mentioned herein includes selecting one of three options, i.e., "addition," "disposal," and "maintenance," for each of the transformer T_0 between the bus B4 and the bus B7, the transformer T_1 between the bus B4 and the bus B9, the transformer T_2 between the bus B5 and the bus B6, the transformer T_3 between the bus B7 and the bus B8, the transformer T_4 between the bus B7 and the bus B9, the transformer T_5 between the bus B4 and the bus B7, the transformer T_6 between the bus B4 and the bus B9, the transformer T_7 between the bus B5 and the bus B6, and the transformer T_8 between the bus B7 and the bus B9. The three options are present for each transformer; thus, when n transformers are present (n being an integer greater than or equal to 1), 3^n combinations are possible. When such a facility change is considered, it is necessary to take into account operation costs (maintenance costs), installation costs, risk costs due to system downtime, and the like of the transformer facilities.
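The combinatorial growth of the option space can be illustrated with a short sketch. The enumeration below is purely illustrative (the embodiment learns a policy rather than enumerating all 3^n combinations):

```python
from itertools import product

# Each transformer independently takes one of three options, so
# n transformers yield 3**n combinations.
OPTIONS = ("addition", "disposal", "maintenance")

def enumerate_facility_changes(transformer_names):
    """Return every combination of options over the given transformers."""
    combos = []
    for choice in product(OPTIONS, repeat=len(transformer_names)):
        combos.append(dict(zip(transformer_names, choice)))
    return combos

plans = enumerate_facility_changes(["T_0", "T_1", "T_2"])
print(len(plans))  # 3**3 = 27
```

For the nine transformers above, the space already contains 3^9 = 19,683 combinations, which is why an exhaustive search becomes impractical as n grows.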
In the embodiment, an actual system is first expressed using a graph structure for the purpose of the facility change.
Assuming that a bus is an actual node, a transformer is an actual edge of a type “T,” and an electric power line is an actual edge of a type “L” in the configuration illustrated in
In the embodiment, the data regarding the graph structure of reference symbol g1 is converted into an assumption node meta-graph such as that of reference symbol g2 (reference symbol g3). A method of converting the data regarding the graph structure into the assumption node meta-graph will be described later. In reference symbol g2, AN(Bx), AN(T), and AN(Ly) indicate assumption nodes. In the following description, a graph such as that of reference symbol g2 is referred to as a "meta-graph."
An example in which a facility T1* is added between nodes AN(B1) and AN(B2) in the configuration illustrated in
If the meta-graph illustrated in
In this way, a change in facility corresponds to a change in convolution function corresponding to the facility (local processing). Addition of a facility corresponds to addition of a convolution function. Disposal of a facility corresponds to deletion of a convolution function.
An example of a configuration of a neural network generator 100 will be described below.
For example, the data acquirer 101 acquires data regarding a graph structure from an external device and stores the data in the storage 102. The data acquirer 101 may acquire (read) data regarding a graph structure stored in the storage 102 in advance instead of acquiring the data regarding the graph structure from the external device or may acquire data regarding a graph structure input by a user using an input device.
The storage 102 is implemented through, for example, a random access memory (RAM), a hard disk drive (HDD), a flash memory, or the like. The data regarding the graph structure stored in the storage 102 is, for example, data in which a graph structure is expressed as each record of the actual node RN and the actual edge RE. Furthermore, the data regarding the graph structure may include a feature amount as an initial state of each actual node RN. The feature amount as the initial state of the actual node RN may be prepared as a data set different from the data regarding the graph structure.
The network processor 103 includes, for example, an actual node/actual edge neighborhood relationship extractor 1031, an assumption node meta-grapher 1032, and a meta-graph convolution unit 1033.
The actual node/actual edge neighborhood relationship extractor 1031 extracts the actual node RN and the actual edge RE in a neighborhood relationship (a connection relationship) with reference to the data regarding the graph structure. For example, the actual node/actual edge neighborhood relationship extractor 1031 may comprehensively extract the actual node RN or the actual edge RE in a neighborhood relationship (a connection relationship) for each of the actual node RN and the actual edge RE and store the extracted actual node RN or actual edge RE in the storage 102 in a form in which they are associated with each other.
The assumption node meta-grapher 1032 generates a neural network in which states of the assumption node AN are connected in a layer shape so that the actual node RN and the actual edge RE extracted through the actual node/actual edge neighborhood relationship extractor 1031 are connected. At this time, the assumption node meta-grapher 1032 determines a propagation matrix W and a coefficient αi,j to satisfy the purpose of the neural network described above while following a rule based on a graph attention network described above.
For example, the meta-graph convolution unit 1033 inputs a feature amount as an initial value of the actual node RN to the assumption node AN of the neural network and derives the state (the feature amount) of the assumption node AN of each layer. When this processing is repeatedly performed, the output unit 104 outputs the feature amount of the assumption node AN to the outside.
An assumption node feature amount storage 1034 stores the feature amount as the initial value of the actual node RN. The assumption node feature amount storage 1034 also stores the feature amounts derived through the meta-graph convolution unit 1033.
A method of generating a neural network from data regarding a graph structure will be described below.
As illustrated in the drawings, the neural network generator 100 sets not only the actual nodes RN but also the assumption nodes AN, including the actual edges RE, and generates a neural network in which the feature amount of the (k−1)-th layer of an assumption node AN propagates to the feature amount of the k-th layer of the assumption node AN itself and of the other assumption nodes AN in a connection relationship with it. Here, k is a natural number greater than or equal to 1, and the layer with k = 0 is, for example, the input layer.
The neural network generator 100 determines, for example, the feature amounts of the first intermediate layer on the basis of the following Expression (1). Expression (1) corresponds to a method of calculating the feature amount h1# of the first intermediate layer of the assumption node (RN1).
For example, α1,12 is a coefficient indicating the degree of propagation between the assumption node (RN1) and the assumption node (RE12). The feature amount h1## of the second intermediate layer of the assumption node (RN1) is represented by the following Expression (2). From the third intermediate layer onward, the feature amounts are sequentially determined in accordance with the same rule.
[Expression 1]
h1# = α1,1·W·h1 + α1,12·W·h12 + α1,13·W·h13 + α1,14·W·h14 (1)

[Expression 2]
h1## = α1,1·W·h1# + α1,12·W·h12# + α1,13·W·h13# + α1,14·W·h14# (2)
For example, the neural network generator 100 determines a coefficient αi,j in accordance with a rule based on a graph attention network.
The neural network generator 100 determines a parameter (W, αi,j) of a neural network to satisfy the purpose of the neural network while following the rule described above. The purpose of the neural network is to output a state in the future when an assumption node AN is set to a state in the present, to output an index used for evaluating a state, or to classify a state in the present.
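The propagation rule of Expressions (1) and (2) can be sketched in a few lines. The feature values, uniform attention coefficients of 0.25, and the identity propagation matrix W below are all illustrative assumptions, not learned values:

```python
def propagate(features, neighbors, W, alpha):
    """One meta-graph convolution layer following Expressions (1) and (2):
    h_i' = sum over j in {i} plus neighbors(i) of alpha[(i, j)] * (W @ h_j)."""
    def matvec(M, v):
        return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

    new_features = {}
    for i, nbrs in neighbors.items():
        h = [0.0] * len(W)
        for j in [i] + nbrs:                 # self-connection plus neighborhood
            Wh = matvec(W, features[j])
            h = [a + alpha[(i, j)] * b for a, b in zip(h, Wh)]
        new_features[i] = h
    return new_features

# Assumption node RN1 with neighboring edge nodes RE12, RE13, RE14;
# W is the identity and every coefficient alpha is 0.25 (illustrative).
W = [[1.0, 0.0], [0.0, 1.0]]
feats = {"RN1": [4.0, 0.0], "RE12": [0.0, 4.0],
         "RE13": [4.0, 4.0], "RE14": [0.0, 0.0]}
alpha = {("RN1", j): 0.25 for j in ["RN1", "RE12", "RE13", "RE14"]}
h1 = propagate(feats, {"RN1": ["RE12", "RE13", "RE14"]}, W, alpha)
print(h1["RN1"])  # [2.0, 2.0]
```

Applying the same `propagate` call to the result yields the second intermediate layer of Expression (2), and so on for deeper layers.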
An example of a configuration of an information processing device 1 will be described below.
The environment 2 is, for example, a simulator, a server device, a database, a personal computer, or the like. The environment 2 receives, as an input, a change proposal as an action from the information processing device 1. The environment 2 calculates a state in which the change is incorporated, calculates a reward, and returns the calculated results to the information processing device 1.
The display device 3 is, for example, a liquid crystal display device. The display device 3 displays an image output by the information processing device 1.
The information processing device 1 includes the functions of the neural network generator 100 described above and performs construction of a graph neural network and updating using machine learning. For example, the management function unit 11 may include the functions of the neural network generator 100. The graph neural network may be generated in advance. The information processing device 1 changes the neural network on the basis of a change proposal acquired from the environment 2, estimates a state value (Value), and performs reinforcement learning processing such as temporal difference (TD) calculation based on a reward fed back from the environment 2. The information processing device 1 updates coefficient parameters, such as those of the convolution functions, on the basis of the results of the reinforcement learning. The convolution network may be a multi-layer neural network constituted by connecting the convolution functions corresponding to each facility. Furthermore, each convolution function may include attention processing if necessary. The model is not limited to a neural network and may be, for example, a support vector machine or the like.
The meta-graph structure series management function unit 111 acquires a "state signal" from the environment 2, that is, a change information signal obtained by reflecting a facility change in a part of the system. When acquiring the change information signal, the meta-graph structure series management function unit 111 defines a meta-graph structure corresponding to the new system configuration and formulates a corresponding neural network structure. At this time, the meta-graph structure series management function unit 111 formulates a neural network structure in which the evaluation value estimation of the value function and the policy function required for a change proposal is performed with high efficiency. Furthermore, the meta-graph structure series management function unit 111 constitutes a meta-graph corresponding to the actual system configuration from a convolution function set, with reference to the convolution function corresponding to the change location obtained from the convolution function management function unit 112. Moreover, the meta-graph structure series management function unit 111 changes the meta-graph structure in response to the facility change (updating the graph structure, setting a "candidate node," or the like in response to an action). The meta-graph structure series management function unit 111 performs definition and management by associating nodes and edges with attributes. Furthermore, the meta-graph structure series management function unit 111 includes some of the functions of the neural network generator 100 described above. In addition, the meta-graph structure series management function unit 111 is an example of the "definer."
The convolution function management function unit 112 includes a function of defining a convolution function corresponding to each type of facility and a function of updating the parameters of the convolution functions. The convolution function management function unit 112 manages a convolution module or an attention module corresponding to a partial meta-graph structure. The convolution function management function unit 112 defines a convolution function associated with a model representing data regarding a graph structure representing a system structure, on the basis of the data regarding the graph structure. It also serves as a library of the individual convolution functions corresponding to each facility-type node or edge of a partial meta-graph structure. The convolution function management function unit 112 updates the parameters of each convolution function in the learning process. Furthermore, the convolution function management function unit 112 includes some of the functions of the neural network generator 100 described above. In addition, the convolution function management function unit 112 is an example of the "definer."
The neural network management function unit 113 acquires the convolution modules or attention modules corresponding to the neural network structure formulated by the meta-graph structure series management function unit 111 and the partial meta-graph structures managed by the convolution function management function unit 112. The neural network management function unit 113 includes a function of converting a meta-graph into a multi-layer neural network, a function of defining the output functions of the neural network required for reinforcement learning, and a function of updating the above-described convolution functions or the neural network parameter set. The functions required for reinforcement learning are, for example, reward functions, policy functions, and the like. Furthermore, the output function definition includes, for example, a fully connected multi-layer neural network in which the output of the convolution functions is used as an input. Fully connected is a form in which each input is connected to all other inputs. In addition, the neural network management function unit 113 includes some of the functions of the neural network generator 100 described above. Moreover, the neural network management function unit 113 is an example of the "evaluator."
The graph convolution neural network 12 stores, for example, an attention-type graph convolution network composed of various types of convolutions as a deep neural network.
The reinforcement learner 13 performs reinforcement learning using the graph convolution neural network stored in the graph convolution neural network 12 and the state and reward output by the environment 2. The reinforcement learner 13 changes the parameters on the basis of the results of the reinforcement learning and outputs the changed parameters to the convolution function management function unit 112. The reinforcement learning method will be described later.
The manipulator 14 includes a keyboard, a mouse, a touch panel sensor provided on the display device 3, and the like. The manipulator 14 detects the user's operation and outputs the detected operation result to the image processor 15.
The image processor 15 generates an image associated with an evaluation environment and an image associated with the evaluation result in accordance with the operation result and outputs the generated images to the presenter 16. The image associated with the evaluation environment and the image associated with the evaluation result will be described later.
The presenter 16 outputs the image output by the image processor 15 to the environment 2 and the display device 3.
The formulation of a facility change plan series will be described below on the basis of a facility attention and convolution model.
First, an actual system is represented by a graph structure (S1). Subsequently, a type of edge and a function attribute are set from the graph structure (S2). Subsequently, the representation is performed by a meta-graph (S3). Subsequently, network mapping is performed (S4).
Reference symbol g20 is an example of the network mapping. Reference symbol g21 indicates an edge convolution module. Reference symbol g22 indicates a graph attention module. Reference symbol g23 indicates a time series recognition module. Reference symbol g24 indicates a state value function V(s) estimation module. Reference symbol g25 indicates an action probability p(a|s) calculation module.
Here, the facility change plan task can be defined as a reinforcement learning problem. That is to say, the facility change plan task can be defined as a reinforcement learning problem using the graph structure and the parameters of each node and edge (facility) as states, the addition or deletion of a facility as an action, and the profits obtained and expenses incurred as rewards.
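This formulation (states, actions, rewards) can be sketched as a toy environment. The class name, cost values, and bias below are illustrative assumptions, not part of the embodiment:

```python
# Toy sketch of the reinforcement-learning formulation described above:
# state  = which facilities are present, action = add/dispose/maintain a
# facility, reward = bias minus the cost of the chosen action.
class FacilityPlanEnv:
    def __init__(self, facilities):
        # State: mapping facility -> present (True) or absent (False).
        self.state = {f: True for f in facilities}

    def step(self, facility, action):
        # Hypothetical per-action costs; real cost models are per facility.
        cost = {"addition": 5.0, "disposal": 1.0, "maintenance": 0.5}[action]
        if action == "addition":
            self.state[facility] = True
        elif action == "disposal":
            self.state[facility] = False
        reward = 10.0 - cost          # bias minus cost, as in the reward examples
        return dict(self.state), reward

env = FacilityPlanEnv(["T_0", "T_1"])
state, reward = env.step("T_0", "disposal")
print(state["T_0"], reward)  # False 9.0
```

A learned policy would repeatedly call `step` over the examination period and accumulate the rewards.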
An example of selection management of changes performed by the meta-graph structure series management function unit 111 will be described.
Here, as an initial (t=0) state, a graph structure with 4 nodes such as Reference symbol g31 is considered.
From this state, as change candidates for the next time t=1, n (n is an integer of greater than or equal to 1) selection options such as Reference symbols g41, g42, . . . , and g4n in the middle row are considered.
For each of these selection options, a selection option at the next time t=2 is derived. Reference symbols g51, g52, . . . represent examples of selection options from a graph structure of Reference symbol g43.
In this way, a selection series is represented as a series of meta-graphs obtained by reflecting the changes, that is, a series of node changes. In the embodiment, reinforcement learning is utilized as a means for extracting a meta-graph in which a policy is satisfied from such a series.
In the embodiment, in this way, a graph neural network constituted using the information processing device 1 is associated with a system configuration on the environment side all the time. Furthermore, the information processing device 1 performs reinforcement learning using a new state S, a reward value obtained on the basis of the new state S, a value function estimated on the neural network side, and a policy function as the evaluation results on the environment side.
An example of a learning method performed by an information processing device 1 will be described. Here, although an example in which an asynchronous advantage actor-critic (A3C) is utilized as the learning method will be described, the learning method is not limited thereto. In the embodiment, reinforcement learning is utilized as a means for extracting a meta-graph in which a reward is satisfied from the selection series. Furthermore, the reinforcement learning may be, for example, deep reinforcement learning.
Data stored in the external environment DB 21 corresponds to external environment data and the like. The environment data includes, for example, specifications of facility nodes, demand data in an electric power system or the like, and information associated with a graph structure, and corresponds to parameters which are not affected by environment states and actions but influence the determination of an action.
The physical model simulator 221 includes, for example, a power flow simulator, a traffic simulator, a physical model, a function, an equation, an emulator, an actual machine, and the like. The physical model simulator 221 acquires data stored in the external environment DB 21 if necessary and performs a simulation using the acquired data and the physical model. The physical model simulator 221 outputs the simulation results (S, A, and S′) to the reward calculator 222. S indicates the state of the system (the last state), A indicates the selected action, and S′ indicates the new state of the system.
The reward calculator 222 calculates a reward value R using the simulation results (S, A, and S′) acquired from the physical model simulator 221. A method for calculating the reward value R will be described later. Furthermore, the reward value R is, for example, {(R1,a1), . . . , (RT,aT)}. Here, T indicates the facility plan examination period, and ap (p is an integer from 1 to T) indicates each node. For example, a1 indicates a first node and ap indicates a p-th node.
The output unit 223 sets a new state S′ of the system as a state S of the system and outputs the state S of the system and the reward value R to the information processing device 1.
The neural network management function unit 113 of the management function unit 11 inputs the state S of the system output by the environment 2 to the neural network stored in the graph convolution neural network 12 and obtains a policy function π(·|S,θ) and a state value function V(S,w). Here, w indicates a weight coefficient matrix (also referred to as a "convolution term") corresponding to the attribute dimension of a node. The neural network management function unit 113 determines the action (the facility change) A for the next step using the following Expression (3).
[Expression 3]
A ~ π(·|S,θ) (3)
The neural network management function unit 113 outputs the determined action (the facility change) A for the next step to the environment 2. That is to say, the policy function π(·|S,θ) receives, as an input, the state S of the system which is the examination target and outputs an action. Furthermore, the neural network management function unit 113 outputs the obtained state value function V(S,w) to the reinforcement learner 13. The policy function π(·|S,θ) for selecting an action is provided as a probability distribution over the action candidates for meta-graph structure changes.
In this way, the neural network management function unit 113 inputs a state of the system to the neural network, obtains, for each time step, the policy function and the state value function required for reinforcement learning for one or more models whose structure has been changed by the structural changes assumable at that time step, and evaluates a structural change of the system on the basis of the policy function. The neural network management function unit 113 may evaluate a structural change plan or a facility change plan of the system.
The state value function V(S,w) output by the management function unit 11 and the reward value R output by the environment 2 are input to the reinforcement learner 13. The reinforcement learner 13 repeatedly performs reinforcement learning, using a machine learning method such as A3C, for the number of times that a series of actions corresponds to the facility plan examination period (T), using the input state value function V(S,w) and the reward value R. The reinforcement learner 13 outputs the parameters <ΔW>π and <Δθ>π obtained as a result of the reinforcement learning to the management function unit 11.
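The TD calculation at the core of this kind of actor-critic (A3C-style) learning can be sketched as follows. The scalar parameters, gradients, and learning rate are illustrative stand-ins for the actual tensor parameters <ΔW>π and <Δθ>π:

```python
# One-step actor-critic update sketch in the spirit of A3C (illustrative).
def td_advantage(reward, value_s, value_s_next, gamma=0.99):
    """Temporal-difference error R + gamma*V(S') - V(S), used as advantage."""
    return reward + gamma * value_s_next - value_s

def update_params(theta, w, grad_log_pi, grad_v, advantage, lr=0.01):
    """Actor ascends the advantage-weighted log-likelihood gradient;
    critic moves its value estimate in the direction of the TD error."""
    theta = theta + lr * advantage * grad_log_pi
    w = w + lr * advantage * grad_v
    return theta, w

adv = td_advantage(reward=9.0, value_s=5.0, value_s_next=4.0)
theta, w = update_params(theta=0.0, w=0.0, grad_log_pi=1.0, grad_v=1.0,
                         advantage=adv)
print(round(adv, 2))  # 7.96
```

In A3C proper, several workers compute such updates asynchronously and merge them into shared parameters; the sketch shows only the per-step arithmetic.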
The convolution function management function unit 112 updates the parameters of the convolution function on the basis of the parameters output by the reinforcement learner 13.
The neural network management function unit 113 reflects the updated parameters <ΔW>π and <Δθ>π in the neural network and evaluates the neural network having the parameters reflected therein.
In the selection of the next behavior, the management function unit 11 may or may not utilize the above-described candidate node (refer to
An example of the reward function will be described below.
A first example of the reward function is (bias) − (facility installation, disposal, operation, and maintenance costs). In the first example, each cost may be modeled as a function for each facility, and the reward is defined as a positive value obtained by subtracting the cost from the bias. The bias is a parameter that is appropriately set to a constant positive value so that the reward function value is positive.
A second example of the reward function is (bias) − (risk cost). In some cases, physical system conditions may not be satisfied depending on the facility configuration; for example, a connection condition is not established, the power flow is unbalanced, or an output condition is not satisfied. When such large risks occur, a large negative reward (risk) may be imposed.
A third example of the reward function may be a combination of the first and second examples of the reward function.
In this way, in this embodiment, it is possible to design various reward functions such as the first to third examples.
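As a sketch, the three reward-function examples can be combined in a single function; the bias, cost, and penalty values below are purely illustrative:

```python
def reward_value(bias, facility_costs, risk_penalty=0.0):
    """Third-example reward: bias minus summed facility costs minus risk cost.
    With risk_penalty=0 this reduces to the first example; with empty costs
    it reduces to the second. (All numbers are illustrative.)"""
    return bias - sum(facility_costs.values()) - risk_penalty

# First example: installation/operation/maintenance costs only.
r1 = reward_value(bias=100.0, facility_costs={"T_0": 12.0, "T_1": 8.0})
# Second example: a large risk cost when physical conditions fail.
r2 = reward_value(bias=100.0, facility_costs={}, risk_penalty=500.0)
print(r1, r2)  # 80.0 -400.0
```

Choosing the bias large enough keeps normal-operation rewards positive, while the risk penalty still drives the learned policy away from infeasible configurations.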
In this embodiment, an example in which the next behavior is selected using a candidate node will be described.
The meta-graph structure series management function unit 111 may utilize a candidate node processing function. In this embodiment, a method will be described in which a facility node whose addition is likely to occur is connected to the meta-graph as a candidate for the next action, and value estimation is performed on a plurality of action candidates in parallel. The configuration of the information processing device 1 is the same as in the first embodiment.
A feature of an attention-type neural network is that, even if a node is added, it is possible to efficiently analyze and evaluate the effects of the addition without re-learning, by adding a learned convolution function corresponding to the node to the neural network. This is because the constituent elements of a graph-structure neural network based on a graph attention network are expressed as convolution functions, and the whole is expressed as a graph connection of that function group. That is to say, when a candidate node is utilized, the neural network expressing the entire system and the convolution function constituting the added node can be classified and managed separately.
The management function unit 11 connects the candidate node to the meta-graph using a unidirectional connection, as illustrated by Reference symbol g111 of
The management function unit 11 makes a unidirectional connection from the nodes B1 and B2 to T1, as in Reference symbol g112, and performs the value calculations (the policy function and the state value function) associated with the T1 and T1* nodes in parallel to evaluate the value when the node T1* is added. Furthermore, Reference symbol g1121 is the reward difference for T1 and Reference symbol g1122 is the reward difference for the addition of T1*. The estimation of the reward values of the two-dimensional action of Reference symbol g112 can be performed in parallel.
Thus, in this embodiment, as combinations of the nodes (T1, T1*), four combinations, i.e., {(presence, presence), (presence, absence), (absence, presence), (absence, absence)}, can be evaluated at the same time. As a result, according to this embodiment, since the evaluation can be performed in parallel, calculation can be performed at high speed.
In
Here, in the case of S4 (absence, absence) among the four combinations, B1 and B2 are disconnected from the system, and the system cannot be established. In this case, the management function unit 11 causes a large risk cost (penalty) to be incurred. Furthermore, in this case, the management function unit 11 performs reinforcement learning in parallel for each of the states S1 to S4 on the basis of the value function values and the policy function from the neural network.
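The evaluation of the four (T1, T1*) combinations, including the risk penalty for (absence, absence), can be sketched as follows. The value and penalty numbers are illustrative, and the "parallel" evaluation is shown as a simple sweep for clarity:

```python
from itertools import product

def evaluate_combination(t1_present, t1_star_present):
    """Toy value for a (T1, T1*) combination; (absence, absence) disconnects
    B1 and B2, so a large risk penalty is charged (numbers are illustrative)."""
    if not t1_present and not t1_star_present:
        return -1000.0                     # system cannot be established
    return 10.0 * t1_present + 8.0 * t1_star_present

# Evaluate all four combinations S1..S4 (in the real system: in parallel).
values = {c: evaluate_combination(*c) for c in product([True, False], repeat=2)}
best = max(values, key=values.get)
print(best)  # (True, True)
```

In the embodiment, each combination would instead be scored by the value function and policy function of the neural network; the sweep above only illustrates how the four states can be compared simultaneously.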
In this embodiment, an example in which the process of sampling plan series proposals is performed in parallel will be described. The configuration of the information processing device 1 is the same as in the first embodiment.
The information processing device 1 samples plan proposals using the convolution function acquired for each facility. Furthermore, the information processing device 1 outputs the plan proposals, for example, in the order of their cumulative scores. The order of cumulative scores is, for example, the order of lower costs.
The external environment DB 21 stores, for example, demand data in an electric power system, data relating to facility specifications, an external environment data set different from learning data such as a graph structure of a system, and the like.
The policy function is constituted using a graph neural network constituted using learned convolution functions (learned parameters θπ).
An action (a facility node change) for the next step is determined using the following Expression (4), with the state S of the system as an input.
[Expression 4]
A ~ π(·|S,θπ) (4)
The management function unit 11 extracts an action using Expression (4) on the basis of the policy function (a probability distribution over actions) according to the state. The management function unit 11 inputs the extracted action A to the system environment and calculates the new state S′ and the new reward R associated therewith. The new state S′ is used as the input for determining the next step. Rewards are accumulated over the examination period. The management function unit 11 repeats this operation for the number of steps corresponding to the examination period and obtains each cumulative reward score (G).
A series of changes throughout an examination period corresponds to one facility change plan, and a cumulative reward score corresponding to that plan is obtained. The set of combinations of plan proposals obtained in this way and their scores is the plan proposal candidate set.
First, the management function unit 11 samples a plan (action series {at}t) from a policy function acquired through learning for each episode and obtains a score.
Subsequently, the management function unit 11 performs selection, for example, using an argmax function, and extracts the plan {A1, . . . , AT} corresponding to the largest G value among the trial (test) results. The management function unit 11 can also extract higher-ranked plans.
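The sampling-and-argmax selection described above can be sketched as follows. The stand-in policy and its reward values are hypothetical placeholders, not the learned policy function of the embodiment:

```python
import random

def sample_plan(policy, horizon):
    """Sample one action series {a_t} and its cumulative reward score G."""
    actions, g = [], 0.0
    for _ in range(horizon):
        a, r = policy()
        actions.append(a)
        g += r
    return actions, g

def best_plan(policy, horizon, n_episodes):
    """Sample n_episodes plans and select the one with the largest G (argmax)."""
    samples = [sample_plan(policy, horizon) for _ in range(n_episodes)]
    return max(samples, key=lambda s: s[1])

random.seed(0)
# Hypothetical stand-in for the learned policy: returns (action, reward).
policy = lambda: (random.choice(["add", "dispose", "maintain"]), random.random())
plan, score = best_plan(policy, horizon=3, n_episodes=5)
print(len(plan))  # 3
```

Sorting `samples` by G instead of taking the maximum yields the higher-ranked plans mentioned above; because each episode is sampled independently, the N episodes can be generated in parallel.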
According to this embodiment, processes of sampling each plan series proposal (N times in
In order to process the policy functions in parallel, standardization at the output layer is required. For the purpose of the standardization, for example, the following Expression (5) is used.

[Expression 5]
π(a|st,θ) = exp(h(st,a,θ)) / Σb exp(h(st,b,θ)) (5)

In Expression (5), the preference function h(st,a,θ) is the product of the coefficient θ and the vector x for a target output node.
Here, a case in which a multidimensional behavior (action) is handled will be described.
If the action space is a two-dimensional space, a=(a1,a2) is set; a is considered as an element of the direct product of the two spaces and can be expressed as the following Expression (6), where a1 is a first node and a2 is a second node.
[Expression 6]
h(st,a,θ)=h(st,a1,θ)+h(st,a2,θ) (6)
That is to say, the preference function can be calculated for each individual space and the results added. In this way, the individual preference functions can be calculated in parallel as long as the state St of the underlying system is the same.
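Expression (6) and the resulting parallel factorization can be illustrated with the following sketch. It assumes, as a labeled assumption, that the policy is a softmax over the preference values; `softmax_policy` and `joint_preferences` are hypothetical helper names.

```python
import math
from itertools import product

def softmax_policy(preferences):
    """Normalize preference values h(s, a) into a probability distribution."""
    z = sum(math.exp(h) for h in preferences.values())
    return {a: math.exp(h) / z for a, h in preferences.items()}

def joint_preferences(h1, h2):
    """Expression (6): h(s, (a1, a2)) = h(s, a1) + h(s, a2).
    h1 and h2 can be computed independently (in parallel) for the same
    state and then combined by addition."""
    return {(a1, a2): h1[a1] + h2[a2] for a1, a2 in product(h1, h2)}
```

Because the preferences are additive, the softmax over the joint space factorizes into the product of the softmax distributions over the individual spaces, which is what permits the parallel computation.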
A facility node change policy model g201 corresponds to the learned policy function and shows the action selection probability distribution for each step, as learned in the above process.
A task setting function g202 corresponds to the task definition and setting functions such as the initial system configuration, the initialization of each node parameter, external environment data, test data, and a cost model.
A task formulation function g203 includes the task defined through the task setting function, the examination period (an episode) over which the learned policy function is used as the update policy model, the formulation of the reinforcement learning, the policy objective (minimizing or leveling of a cumulative cost), the action space, the environment state space, the formulation (definition) of the evaluation score function, and the like.
A change series sample extraction/cumulative score evaluation function g204 generates the required number of action series from the learned policy function in the defined environment and agent setting and utilizes these action series as samples.
An optimum cumulative score plan/display function g205 selects the sample with the optimum score from the sample set or presents the samples in the order of their scores.
A function setting UI g206 is a user interface through which each function unit is configured.
A specific calculation example of a facility change plan proposal will be described below.
Here, an example in which the method of the embodiment is applied to the following task will be described. As the evaluation electric power circuit system model, IEEE Case 14 (Electrical Engineering, U. of Washington) shown in
The task is to search for the plan proposal having the lowest cumulative cost in a facility update series of 30 steps. In the initial state, as illustrated in
The costs to be considered are an installation cost for each transformer facility node and a cost that depends on the passage of time and the load power value. A large penalty value is imposed as a cost if a facility change makes it difficult to satisfy the conditions for establishing the environment. The conditions for establishing the environment are, for example, the power flow balance and the like.
The points of the task are as follows.
I. System model: IEEE Case 14
II. Task: a facility change plan of new installation and deletion of transformers in IEEE Case 14 is established so that the minimum cost is obtained over the planning period (30 updating opportunities).
III-1. Initial state: a transformer (V_x) with the same specifications is installed between buses.
III-2. Operation cost: the operation cost of each transformer facility is the (weighted) sum of the following three types of costs: an installation cost, a maintenance cost, and a risk cost.
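The weighted-sum operation cost of III-2, together with the penalty described above, can be sketched as follows. The coefficients and functional forms (the installation cost of 100, the linear maintenance cost, the quadratic risk cost, and the penalty magnitude) are illustrative assumptions, not the embodiment's actual cost model.

```python
def facility_cost(age, load, weights=(1.0, 1.0, 1.0),
                  newly_installed=False, constraints_ok=True,
                  penalty=1e6):
    """Toy per-step cost of one transformer facility node: the weighted sum
    of an installation cost, an age-dependent maintenance cost, and a
    load-dependent risk cost, plus a large penalty when the conditions for
    establishing the environment (e.g. power flow balance) are violated."""
    w_install, w_maint, w_risk = weights
    install = 100.0 if newly_installed else 0.0  # one-off installation cost
    maintenance = 1.0 * age                      # grows with the passage of time
    risk = 0.01 * load ** 2                      # grows with the load power value
    cost = w_install * install + w_maint * maintenance + w_risk * risk
    if not constraints_ok:
        cost += penalty                          # environment conditions violated
    return cost
```

Summing this cost over all facility nodes and all 30 steps of a plan yields the cumulative cost that the task seeks to minimize.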
Although the progress of the learning process can be grasped from the learning curve, the actual facility change plan proposals need to be generated by the policy function acquired in this learning process. For this reason, 1000 plan proposals and the cumulative reward value of each plan proposal are calculated, and a selection criterion, such as the plan proposal in which the minimum cumulative reward value is realized or the plan proposals having the top three values among the cumulative reward values, can be set as the selection policy for the series.
The information processing device 1 generates plan change proposals for the examination period on the basis of the policy function and, when a plan proposal is created on the basis of the policy, manages each plan proposal and its cumulative reward value in association with each other (for example, Plank: {At∼π(·|St)}t→Gk).
An image of reference symbol g401 is an example of an image in which the evaluation target system is represented using a meta-graph. An image of reference symbol g402 is an image of the circuit diagram of the corresponding actual system. An image of reference symbol g403 is an example of an image in which the evaluation target system is represented using a neural network structure. An image of reference symbol g404 is an example of an image in which the top three plans having the lowest cumulative costs are represented. An image of reference symbol g405 is an example of an image in which a specific facility change plan having the lowest cumulative cost is represented (for example,
In this way, in the embodiment, a plan that satisfies the conditions and provides a satisfactory score (a plan with a low cost) is extracted from the sample plan set. A plurality of high-ranking plans may be selected and displayed, as illustrated in
In this way, the information processing device 1 causes the display device 3 (
The user may adopt an optimum plan proposal according to the environment and the situation by checking the displayed image, graph, or the like of the plan proposal and cost.
Extraction filters such as leveling and a parameter change will be described below. The information processing device 1 may utilize extraction filters such as leveling and a parameter change in the optimum plan extraction.
In a first extraction example, plan proposals that satisfy a set leveling level are extracted from the set M. In a second extraction example, plan proposals are created by changing a coefficient of the cost function; in this example, for example, coefficient dependence is evaluated. In a third extraction example, plan proposals are created by changing the initial state of each facility; in this example, for example, initial state dependence (such as the aging history at the beginning of the examination period) is evaluated.
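The first extraction example (leveling) can be illustrated with the following sketch. The leveling criterion used here, the maximum deviation of a plan's per-step cost from its mean, is an illustrative assumption; `leveling_filter` is a hypothetical helper name.

```python
def leveling_filter(plans, level):
    """Keep only the plans whose per-step costs never exceed the plan's
    mean cost by more than `level` -- a toy leveling criterion."""
    kept = []
    for plan, step_costs in plans:
        mean = sum(step_costs) / len(step_costs)
        if max(step_costs) - mean <= level:
            kept.append(plan)
    return kept
```

Analogous filters for the second and third extraction examples would re-score the sampled plans after changing a cost coefficient or the initial facility states, respectively.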
According to at least one embodiment described above, since the convolution function management function unit, the meta-graph structure series management function unit, the neural network management function unit, and the reinforcement learner are provided, it is possible to create a social infrastructure change proposal.
Also, according to at least one embodiment described above, it is possible to perform higher speed processing by evaluating a combination of the connected node and candidate node through parallel processing using the neural network obtained by connecting the candidate node to the system.
Furthermore, according to at least one embodiment described above, since the plan proposals with satisfactory scores are presented on the display device 3, it is easier for the user to examine a plan proposal.
The function units of the neural network generator 100 and the information processing device 1 are realized when a hardware processor such as a central processing unit (CPU) executes a program (software). Some or all of these constituent elements may be implemented by hardware (including circuitry) such as a large scale integration (LSI) circuit, an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a graphics processing unit (GPU), or may be implemented through cooperation of software and hardware. The program may be stored in advance in a storage device such as a hard disk drive (HDD) or a flash memory, stored in an attachable/detachable storage medium such as a DVD or a CD-ROM, or installed when the storage medium is mounted in a drive device.
Although some embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the present invention. These embodiments can be implemented in various other forms and various omissions, replacements, and changes are possible without departing from the gist of the present invention. These embodiments and modifications thereof are included in the scope and the gist of the present invention and the invention described in the claims and the equivalent scope thereof.
Number | Date | Country | Kind |
---|---|---|---|
2019-196584 | Oct 2019 | JP | national |