The present application claims priority from Japanese Patent Application JP 2008-168052 filed on Jun. 27, 2008, the content of which is hereby incorporated by reference into this application.
The present invention relates to a distributed system for exercising highly reliable control based on cooperative operations of multiple networked devices.
In order to improve driving comfort and vehicle safety, vehicle control systems are being developed that use electronic control, rather than mechanical coupling, to reflect a driver's operations on the accelerator, steering wheel, brake, and the like in the vehicle mechanisms that generate driving, steering, and braking forces. Similar electronic control is now applied to other machines such as construction equipment. In these systems, multiple electronic control units (ECUs) are distributed over the devices and operate cooperatively by exchanging data via a network. When a fault occurs in an ECU on a network, fail-safe operation requires that each of the other ECUs on the same network accurately locate the fault and provide backup control appropriate to the content of the fault. Japanese Published Unexamined Patent Application No. 2000-47894 discloses a technology allowing each of the nodes (e.g., ECUs as processors) included in the system to monitor the other networked nodes.
The technology described in Japanese Published Unexamined Patent Application No. 2000-47894 requires a special node (a shared disk) so that the respective nodes can share monitoring information about the operating states of database applications and the like. A fault in the shared disk prevents continuous monitoring of the nodes in the system, and installing the shared disk may increase system costs.
The following method may solve this problem. Each of the nodes independently monitors specific items of a given node so as to detect a fault. The nodes exchange the fault-monitoring results through the network. A fault is finally located from the fault-monitoring results collected in the nodes, for example by a majority rule. These processes are synchronized with the communication cycle. The processes of monitoring a fault, exchanging the fault-monitoring results, and locating the fault are performed in a pipelined fashion, making it possible to locate a fault at every communication cycle.
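As an informal illustration, the pipelining described above might be sketched as follows. The callbacks monitor, exchange, and identify are hypothetical stand-ins for the per-cycle processing; the sketch only shows that, once the pipeline is full, one fault-location result is produced at every communication cycle.

```python
# Minimal sketch of the pipelined schedule, assuming hypothetical
# monitor/exchange/identify callbacks; not the claimed implementation.
from collections import deque

def run_pipeline(n_cycles, monitor, exchange, identify):
    mon_out = deque()   # local monitoring results awaiting exchange
    exd_out = deque()   # collected results awaiting identification
    located = []
    for _ in range(n_cycles):
        if exd_out:                                      # stage 3: locate fault (ID)
            located.append(identify(exd_out.popleft()))
        if mon_out:                                      # stage 2: exchange results (EXD)
            exd_out.append(exchange(mon_out.popleft()))
        mon_out.append(monitor())                        # stage 1: monitor faults (MON)
    return located
```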
However, locating a fault at every communication cycle may be more frequent than a system requires. An object of the present invention is to provide a distributed system that can configure the cycles for fault monitoring and communication independently, thereby reducing the processing load on the central processing unit (CPU) and the communication band consumed by fault monitoring, and increasing the degree of freedom in configuring fault-monitoring cycles.
To achieve the above-mentioned object, an aspect of the present invention provides a distributed system that includes plural nodes connected to each other via a network. Each of the nodes includes a fault-monitoring section for monitoring a fault in the other nodes; a transmission and reception section for transmitting and receiving, via the network, data for detecting the fault in the other nodes; and a fault-identification section for identifying which node has the fault based on the data. The fault-monitoring section uses plural communication cycles as a monitoring period, and the communication cycles are synchronized between the nodes.
In the distributed system of the present invention, the transmission and reception section may include a monitoring result from the fault-monitoring section in the transmission and reception data and distribute the transmission and reception of the data over the next monitoring period, that is, the monitoring period following the one in which the monitoring result is obtained.
In the distributed system of the present invention, the fault-identification section may distribute the fault identification over the next monitoring period, that is, the monitoring period following the one in which the fault-monitoring section obtains the monitoring result included in the data.
In the distributed system of the present invention, the fault-monitoring section may change, while the distributed system is in operation, the monitoring period for each node to be monitored.
The present invention can provide a distributed system that reduces processing loads on a central processing unit (CPU) and communication bands for fault monitoring and increases the degree of freedom for configuring fault-monitoring cycles.
Embodiments of the present invention will be described in detail with reference to the drawings.
The distributed system includes multiple nodes 10, such as 10-1, 10-2, . . . , and 10-n, connected via a network 100. Each node is a processor capable of information communication via the network and includes various electronic control devices such as a CPU, actuators with the necessary drivers, and sensors. The network 100 is capable of multiplex communication and of broadcast transmission, which simultaneously transmits the same content from a given node to all the other nodes connected to the network. The distributed system may use a communication protocol such as FlexRay (registered trademark of Daimler AG) or TTCAN (time-triggered CAN).
Each node is identified by i, where i is the node number ranging from 1 to n. Each node includes a CPU 11-i, a main memory 12-i, an interface (I/F) 13-i, and a storage device 14-i. These components are connected to one another through an internal communication line or the like. The interface 13-i is connected to the network 100.
The storage device 14-i stores programs, including a fault-monitoring section 141-i, a transmission/reception section 142-i, a fault-identification section 143-i, and a counter section 144-i, as well as a fault-identification result 145-i. The fault-identification result 145-i includes a monitoring result table, a fault-identification result table, and an error counter, which are described later.
The CPU 11-i reads these programs into the main memory 12-i and executes the programs for processes. The programs or data described in this specification may be stored in the storage device in advance, may be supplied from storage media such as CD-ROM, or may be downloaded from other devices via the network. Special hardware may be used to realize functions implemented by the programs.
In the following explanation, the programs are described as the agents performing the processing, but the actual agent is the CPU, which executes the processing according to the programs.
The fault-monitoring section 141-i performs fault monitoring (MON) on the other nodes. The transmission/reception section 142-i transmits and receives, via the network 100, data for detecting faults in the other nodes. The fault-identification section 143-i performs fault identification (ID) to identify which node has a fault based on that data. The counter section 144-i counts the number of errors in a node identified as having a fault, classified by node, error location (error item), and fault-identification condition, as described later.
At Step 21, the fault-monitoring section 141-i monitors the other nodes for faults, performing a fault-monitoring process (MON) in which node i itself determines whether a fault has occurred in a transmitting node based on the contents of the received data or the reception status. It may be preferable to use multiple fault-monitoring items. For example, an item “reception malfunction” indicates a malfunction when the data reception has an error, such as a detected reception failure or corruption of the received data found by an error-detecting code. An item “sequence number malfunction” is used as follows. The transmitting node supplies the transmission/reception data with a sequence number that an application increments at every communication cycle. The receiving node checks the increment of the sequence number and detects a malfunction when the sequence number is not incremented; the sequence number thus reveals an application malfunction in the transmitting node. An item “self-diagnosis malfunction” is used as follows. Each of the nodes performs self-diagnosis as to whether the node itself has a malfunction and transmits the result of the diagnosis (self-diagnosis result) to the other nodes. The receiving node detects a malfunction in the transmitting node by using the self-diagnosis result. In addition, a unified fault-monitoring item may be used to notify “malfunction detected” when any of the individual fault-monitoring items indicates a malfunction.
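A minimal sketch of these three monitoring items might look as follows; the frame fields (crc_ok, seq, self_diag_ok) and the 8-bit sequence counter are illustrative assumptions, not part of the specification.

```python
# Hedged sketch of the fault-monitoring items; field names are assumed.
def monitor_frame(frame, prev_seq):
    result = {"reception": False, "sequence": False, "self_diag": False}
    if frame is None or not frame.crc_ok:
        # "reception malfunction": nothing received, or the error-detecting
        # code indicates corrupted data
        result["reception"] = True
    else:
        # "sequence number malfunction": the application-level counter was
        # not incremented at this communication cycle (8-bit wrap assumed)
        result["sequence"] = frame.seq != (prev_seq + 1) % 256
        # "self-diagnosis malfunction": the sender reports its own fault
        result["self_diag"] = not frame.self_diag_ok
    # unified item notifying "malfunction detected"
    result["any"] = any(result.values())
    return result
```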
The fault-monitoring process is performed over a period of p communication cycles, where p = 1, 2, 3, . . . ; the p communication cycles are used as the unit of the period. The fault-monitoring periods are synchronized among the nodes. To establish this synchronization, a node may declare the initiation of the fault-monitoring process over the network. Alternatively, the number of communication cycles may be used to determine the monitoring period: for example, if the first fault monitoring is defined to begin at communication cycle 0, the beginning of each fault-monitoring period is found wherever the communication cycle number is divisible by p with no remainder. Using multiple communication cycles as the period of the fault-monitoring process decreases the frequency of the subsequent processes and reduces the communication band per communication cycle and the processing load on the CPU of each node.
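The cycle-count rule above amounts to a simple divisibility test; a sketch, assuming the communication cycles are numbered from 0:

```python
# A new fault-monitoring period of p communication cycles starts whenever
# the cycle number divides evenly by p (monitoring defined to start at 0).
def is_period_start(cycle_number: int, p: int) -> bool:
    return cycle_number % p == 0
```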
At Step 22, the transmission/reception section 142-i performs a fault-monitoring result exchange process (EXD) that exchanges the fault-monitoring result acquired at Step 21 among the nodes. Each node holds the fault-monitoring results from all the nodes including the result of the node itself. The collected fault-monitoring results are stored in the monitoring result table of the fault-identification result 145-i.
The fault-monitoring result exchange process may be performed within one communication cycle or divided into multiple communication cycles. Performing the exchange over multiple communication cycles reduces the communication band required per communication cycle and the load on each node's CPU for processing received data.
At Step 23, the fault-identification section 143-i performs a fault-identification process (ID) that determines whether each node and each fault-monitoring item has a malfunction or not from the fault-monitoring results collected in the nodes at Step 22. A fault-identification result is stored in the fault-identification result table of the fault-identification result 145-i.
One of the fault-identification methods is a majority rule, which decides whether a malfunction has occurred based on the number of nodes reporting it. If the number of nodes having detected a fault in a given node or fault-monitoring item is greater than or equal to a threshold value of a fault-identification condition 1, the node subject to the fault detection is assumed to be abnormal. If the number of such nodes is smaller than a threshold value of a fault-identification condition 2, the node having detected the fault is assumed to be abnormal. Normally, each threshold value is equivalent to half the number of the collected fault-monitoring results.
A node is assumed to be normal if it detects no fault under the fault-identification condition 1, or if it is the node subject to the fault detection under the fault-identification condition 2. In the following description, a fault satisfying the fault-identification condition 1 is referred to as a majority malfunction, and a fault satisfying the fault-identification condition 2 as a minority malfunction.
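The two conditions can be read as a simple vote count. The following sketch assumes the thresholds are supplied as parameters (normally about half the number of collected results, as noted above):

```python
# Hedged sketch of the majority-rule fault identification.
def identify_by_majority(fault_votes: int, threshold1: int, threshold2: int) -> str:
    if fault_votes >= threshold1:
        # condition 1: the monitored node is assumed abnormal
        return "majority malfunction"
    if 0 < fault_votes < threshold2:
        # condition 2: the detecting node(s) are assumed abnormal
        return "minority malfunction"
    return "normal"
```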
There is another fault-identification method, which assumes a node subject to fault detection or a fault-monitoring item to be abnormal if at least one node detects a fault.
The fault-identification process may be performed within one communication cycle or divided into multiple communication cycles. Performing the fault-identification process over multiple communication cycles can reduce the CPU processing load per communication cycle in each node.
At Step 24, each node performs a fault-identification result utilization process. If a malfunction is determined at Step 23, the counter section 144-i increments an error counter value that indicates the number of errors in the node or the monitoring item subject to the fault identification. If no malfunction is determined, the counter section 144-i decrements the counter value; alternatively, it may reset the counter value or do nothing. Whether to decrement, reset, or do nothing is configured in advance. The error counter may be provided for each of the fault-identification conditions, in which case the error counter is decremented or reset if none of the fault-identification conditions is satisfied.
If the number of errors reaches or exceeds a specified threshold value, the counter section 144-i notifies a control application of the fault occurrence. One means of notification is to turn on a node fault flag corresponding to the node or the monitoring item subject to the fault identification; the application, referring to the node fault flag, can recognize the fault occurrence. The fault occurrence may also be notified immediately by interrupting the control application or invoking a callback function after the node fault flag is turned on. When the error counter is provided for each of the fault-identification conditions, the node fault flag is likewise provided for each of the fault-identification conditions.
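The counter handling above could be sketched as follows; the policy name and the flag-polling interface are illustrative assumptions.

```python
# Sketch of per-(node, item, condition) error counting with a fault flag.
class ErrorCounter:
    def __init__(self, threshold: int, on_normal: str = "decrement"):
        self.count = 0
        self.threshold = threshold      # count at which a fault is notified
        self.on_normal = on_normal      # "decrement", "reset", or "nothing"
        self.fault_flag = False         # node fault flag read by the application

    def update(self, malfunction_identified: bool) -> None:
        if malfunction_identified:
            self.count += 1
        elif self.on_normal == "decrement":
            self.count = max(0, self.count - 1)
        elif self.on_normal == "reset":
            self.count = 0
        # "nothing": leave the count unchanged
        if self.count >= self.threshold:
            self.fault_flag = True      # a callback or interrupt could fire here
```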
When the fault-identification process is divided into multiple communication cycles, the fault-identification result utilization process may be performed either when all the fault-identification processes are completed or when part of them is completed, so that the partial results can be used sequentially. The former should be employed if all nodes need to maintain the same recognition of the fault occurrence or the same state transition in response to it.
The above-mentioned processes can locate a fault occurrence with high reliability and provide the nodes with the same recognition of the fault. Distributing the processes over multiple communication cycles can reduce the CPU processing load or the communication band required per communication cycle.
The processes described above proceed as shown in the accompanying timing charts. In the first chart, the fault monitoring (MON) for a fault-identification round 1 is performed at the communication cycles i and i+1.
The nodes perform the fault-identification round 1 concurrently with a fault-identification round 2 and later rounds. At the communication cycles i+2 and i+3, the fault-monitoring result exchange (EXD) is performed for the fault-identification round 1. At the same time, the fault monitoring (MON) is performed for the fault-identification round 2 based on the content of the received data or the reception status resulting from the fault-monitoring result exchange (EXD). Similarly, the fault monitoring (MON) for a fault-identification round 3 is performed simultaneously with the fault-monitoring result exchange (EXD) for the fault-identification round 2, and the fault identification (ID) is performed in the intervening time. These processes are repeated thereafter. Results of the fault identification (ID) may be used as soon as those concerning the nodes 1 and 2 are available, or only after the results concerning the nodes 3 and 4 are also acquired.
In the next timing chart, the fault monitoring (MON) for the fault-identification round 1 is likewise performed at the communication cycles i and i+1.
The nodes perform the fault-identification round 1 concurrently with the fault-identification round 2 and later rounds. At the communication cycles i+2 and i+3, the fault-monitoring result exchange (EXD) is performed for the fault-identification round 1. At the same time, the fault monitoring (MON) is performed for the fault-identification round 2 based on the content of the received data or the reception status resulting from the fault-monitoring result exchange (EXD). The relation between the fault-identification rounds 2 and 3 is the same, and the above-mentioned processes are repeated thereafter.
The drawings show further variations of this division; for example, the fault-monitoring result exchange (EXD) and the fault identification (ID) may be divided over more communication cycles, as in an illustrated case of six nodes.
Distribution of the fault-monitoring result exchange (EXD) and the fault identification (ID) over communication cycles is hereafter referred to as time-base process distribution. The time-base process distribution is preferable when the CPU processing load and the quantity of communication should be equal for each communication cycle, because the control application is then relatively less affected by contention for resources such as CPU throughput and communication band.
As for the time-base process distribution, the processes are distributed with respect to the nodes subject to fault monitoring, the nodes subject to fault identification, and the transmitting nodes, as shown in the figure.
Transmission data includes bits for two nodes, each bit indicating the presence or absence of a malfunction in a node to be monitored; the area corresponding to a given node stores the result of diagnosis about that node. The presence or absence of a malfunction concerning the nodes 1 and 2 is stored at an even-numbered cycle, and that concerning the nodes 3 and 4 at an odd-numbered cycle.
The transmission data also includes the error counter value, held by the transmitting node, for one node. At the communication cycles i and i+1, the node 1 transmits an error counter value for the node 2; the node 2 transmits an error counter value for the node 3; the node 3 transmits an error counter value for the node 4; and the node 4 transmits an error counter value for the node 1. At the communication cycles i+2 and i+3, the node 1 transmits an error counter value for the node 4; the node 2 transmits an error counter value for the node 1; the node 3 transmits an error counter value for the node 2; and the node 4 transmits an error counter value for the node 3. The assignment thus rotates among the nodes. Error counters are kept independently for a majority malfunction and a minority malfunction: the number of majority malfunctions (EC) is transmitted at an even-numbered cycle, and the number of minority malfunctions (FC) at an odd-numbered cycle.
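One way to picture this frame layout is the following sketch; the field names, the dictionary representation, and the four-node configuration are assumptions for illustration only.

```python
# Hedged sketch of one node's transmission frame: two malfunction bits for
# the monitored pair plus one rotated error counter value.
# counters is assumed shaped as {"EC": {node: count}, "FC": {node: count}}.
def build_frame(cycle: int, my_fault_bits: dict, counters: dict,
                target_node: int) -> dict:
    pair = (1, 2) if cycle % 2 == 0 else (3, 4)   # monitored pair this cycle
    kind = "EC" if cycle % 2 == 0 else "FC"       # majority vs. minority count
    return {
        "fault_bits": {n: my_fault_bits.get(n, False) for n in pair},
        "error_counter": {"node": target_node, "kind": kind,
                          "value": counters[kind][target_node]},
    }
```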
When receiving an error counter value, a node uses the received value to synchronize the error counters between the nodes in the fault-identification result utilization process, before reflecting the result of the fault identification (ID) in its error counter. This synchronization is needed because the error counter values may differ from node to node even when the fault-identification process is performed in accordance with the inter-node monitoring; possible causes of such a difference include a reset based on a node's diagnosis of itself or a temporary communication failure. The error counters may be synchronized as follows: if a node receives a counter value that differs from its own and the difference between two successively received counter values is within a given value (e.g., ±1), the node adjusts its counter value to the later received counter value.
The transmission data described here shows only part of the contents; the transmission data may also include a sequence number and control data in addition to the above-mentioned data.
At a communication cycle i, where i is an even number, the nodes 1 to 4 sequentially use slots 1 to 4 to transmit fault-monitoring results (EXD, 501-0 to 504-0) concerning the nodes 1 and 2 for a fault-identification round k−1, maintaining the results received from the other nodes and generated by the node itself (521-0 to 524-0, represented in binary). Since the results include no data indicating “abnormality” and are received normally by the nodes, no malfunction is found in the fault identification (ID) concerning the nodes 1 and 2 for the fault-identification round k−1, and none of the nodes turns on the node fault flag (551-0 to 554-0, represented in binary). None of the nodes detects a fault during the fault monitoring (MON) for a fault-identification round k (511-0 to 514-0, represented in binary). The error counter value for each node indicates 2 for the majority malfunction of the node 3 and 0 otherwise; no change is made from a communication cycle i−1 (541-0 to 544-0).
However, the node 3 suffers a CPU fault at the end of the communication cycle i. It is assumed that this fault prevents the node 3 from incrementing the sequence number to be transmitted at the next communication cycle i+1. The sequence numbers are not shown in the data in the figure.
At the communication cycle i+1, the nodes transmit fault-monitoring results (501-1 to 504-1) concerning the nodes 3 and 4 for the fault-identification round k−1 and maintain the results (521-1 to 524-1). Similarly to the communication cycle i, no malfunction is found in the fault identification (ID) concerning the nodes 3 and 4 for the fault-identification round k−1, and the error counters (541-1 to 544-1) and the node fault flags (551-1 to 554-1) are the same as those for the communication cycle i. However, the nodes 1, 2, and 4 detect a fault in the node 3 (511-1, 512-1, and 514-1) from the sequence number malfunction of the node 3 during the fault monitoring (MON) concerning the nodes 3 and 4 for the fault-identification round k. The node 3 cannot detect the malfunction in itself (513-1).
The fault-monitoring result exchange (EXD) and the fault identification (ID) for the fault-identification round k and the fault monitoring (MON) for the fault-identification round k+1 are performed on the nodes 1 and 2 at the communication cycle i+2 and on the nodes 3 and 4 at the communication cycle i+3. No malfunction is detected at the communication cycle i+2, similarly to the communication cycle i. At the communication cycle i+3, the fault-monitoring result exchange (EXD) for the fault-identification round k exchanges the fault detection results concerning the node 3 from the communication cycle i+1 (501-3 to 504-3 and 521-3 to 524-3), and the fault identification (ID) in each node identifies a majority malfunction of the node 3 (531-3 to 534-3). As a result, the error counter value of each node for the majority malfunction of the node 3 is incremented to 3 (541-3 to 544-3). In this system, the threshold value for notifying the application of a fault is 3; therefore, the node fault flag of each node for the majority malfunction of the node 3 is turned on (551-3 to 554-3).
As mentioned above, the CPU fault of the node 3 is identified by each node, and the corresponding node fault flag notifies the application of the fault, completing the fault-identification process based on the inter-node monitoring shown in the figure.
The fault monitoring (MON) at Step 21 and the fault-monitoring result exchange process (EXD1) are the same as described above.
At Step 61, the fault-identification section 143-i performs the fault-identification process (ID1) on one of the nodes involved in the mutual monitoring other than the node itself; the node itself is in charge of the fault identification for that one node. The assignments rotate at every communication cycle so that the nodes do not conflict with one another. In this manner, the load of the fault-identification process is distributed among the nodes and thereby reduced.
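The rotating assignment might be computed as in the following sketch, which assumes 1-based node numbers and a simple offset that advances every communication cycle; the exact rotation order used by an actual system may differ.

```python
# Illustrative rotation of fault-identification (ID1) responsibility.
def node_in_charge_of(me: int, cycle: int, n_nodes: int = 4) -> int:
    offset = 1 + cycle % (n_nodes - 1)       # never 0, so never the node itself
    return (me - 1 + offset) % n_nodes + 1   # for a fixed cycle the mapping is a
                                             # bijection: no two nodes conflict
```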
At Step 62, the transmission/reception section 142-i performs a fault-identification result exchange process (EXD2) that exchanges among the nodes the fault-identification result for the one node acquired at Step 61. Consequently, each node maintains fault-identification results for all the nodes, including the result processed by the node itself. At Step 63, a fault-identification process (ID2) uses the collected fault-identification results to determine a final fault-identification result.
Step 24 is the same as the fault-identification result utilization process described above.
The fault-identification process (ID1) may use the fault-identification condition 1 for the determination about one node, and the fault-identification process (ID2) may use the fault-identification condition 2 for the determination about all nodes. Alternatively, the fault-identification process (ID2) may use the fault-identification condition 2 for the determination about one node, and this result may be exchanged among the nodes by a fault-identification result exchange process (EXD3).
The fault-identification process (ID1) may be performed on two or more nodes, not limited to one node.
In the corresponding timing chart, the fault monitoring (MON) for the fault-identification round 1 is performed at the communication cycles i and i+1.
The nodes perform the fault-identification round 1 concurrently with the fault-identification round 2 and later rounds. At the communication cycles i+2 and i+3, the fault-monitoring result exchange (EXD1) is performed for the fault-identification round 1; at the same time, the fault monitoring (MON) is performed for the fault-identification round 2 based on the content of the received data or the reception status. At the communication cycle i+4, the fault-identification result exchange (EXD2) is performed for the fault-identification round 1; at the same time, the fault-monitoring result exchange (EXD1) is performed on the nodes 1 and 2 for the fault-identification round 2, and the fault monitoring (MON) is performed for the fault-identification round 3 based on the content of the received data or the reception status. The relation between the fault-identification round 2 and the subsequent rounds is the same, and the above-mentioned processes are repeated thereafter.
In the next timing chart, the fault identification (ID1) is divided: part of the process is performed at the communication cycle i+2, and the remaining process is performed at the communication cycle i+3. The nodes perform the fault-identification result exchange (EXD2) and the fault identification (ID2) concerning a majority malfunction at the communication cycle i+4 and concerning a minority malfunction at the communication cycle i+5, as shown in the figure.
The nodes perform the fault-identification round 1 concurrently with the fault-identification round 2 and later rounds. At the communication cycles i+2 and i+3, the fault-monitoring result exchange (EXD1) is performed for the fault-identification round 1, while the fault monitoring (MON) is performed for the fault-identification round 2. At the communication cycles i+4 and i+5, the fault-identification result exchange (EXD2) is performed for the fault-identification round 1, while the fault-monitoring result exchange (EXD1) is performed for the fault-identification round 2 and the fault monitoring (MON) is performed for the fault-identification round 3. The relation between the fault-identification round 2 and the subsequent rounds is the same, and the above-mentioned processes are repeated thereafter.
A result of the fault identification (ID1) is reflected in the error counter value. The fault-identification result exchange (EXD2) comprises increasing or decreasing the error counter value in accordance with the result of the fault identification (ID1), transmitting the error counter value, and transmitting the counter value for error counter synchronization. When receiving an error counter value, a node synchronizes its error counter, for example, as follows. (1) If the difference between the received counter value and the node's own counter value is within a specified value (e.g., ±1), the node adjusts its counter value to the received counter value. (2) If the condition (1) is not satisfied and the difference between two successively received counter values is within a specified value (e.g., ±1), the node adjusts its counter value to the later received counter value.
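A sketch of these two rules, assuming a history list holding the counter values received so far for the same error counter (newest last) and a tolerance of ±1:

```python
# Hedged sketch of the error counter synchronization rules (1) and (2).
def synchronize_counter(own: int, received: list, tol: int = 1) -> int:
    if not received:
        return own
    latest = received[-1]
    if abs(latest - own) <= tol:
        return latest                     # rule (1): adopt the received value
    if len(received) >= 2 and abs(received[-1] - received[-2]) <= tol:
        return latest                     # rule (2): two consistent receptions
    return own                            # otherwise keep the node's own value
```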
The transmission data may include an area exclusively used for results of the fault identification (ID1) without reflecting the result of the fault identification (ID1) on the error counter value.
At the communication cycles i and i+1, where i is an even number, the nodes 1 to 4 sequentially use slots 1 to 4 to transmit fault-monitoring results (EXD1, 901-0 to 904-0 and 901-1 to 904-1) for the fault-identification round k−1, maintaining the results received from the other nodes and generated by the node itself (921-0 to 924-0 and 921-1 to 924-1). At the communication cycle i, the nodes 1 and 2 transmit fault-monitoring results concerning the nodes 1 and 2, and the nodes 3 and 4 transmit fault-monitoring results concerning the nodes 3 and 4; at the communication cycle i+1, the nodes transmit the remaining data. Since the results include no data indicating “abnormality” and are received normally by the nodes, no malfunction is found in the fault identification (ID1) for the fault-identification round k−1. The fault identification (ID1) is divided between the communication cycles i and i+1 and generates a result at the communication cycle i+1 (931-1 to 934-1, which indicate the node numbers in charge). None of the nodes turns on the node fault flag (951-0 to 954-0 and 951-1 to 954-1). The fault-identification result exchange (EXD2) and the fault identification (ID2) for a fault-identification round k−2 are also performed. The error counter value for each node indicates 2 for the majority malfunction of the node 3 and 0 otherwise; no change is made from a communication cycle i−1 (941-0 to 944-0 and 941-1 to 944-1).
The fault monitoring (MON) for the fault-identification round k is performed in parallel with the fault-monitoring result exchange (EXD1) for the fault-identification round k−1. During the fault monitoring (MON), each node detects no fault at the communication cycle i (911-0 to 914-0). The node 3 suffers a CPU fault at the end of the communication cycle i, causing a sequence number malfunction; at the communication cycle i+1, the nodes 1, 2, and 4 detect the fault of the node 3 (911-1 to 914-1).
At the communication cycles i+2 and i+3, the fault-monitoring result exchange (EXD1, 901-2 to 904-2 and 901-3 to 904-3) for the fault-identification round k is performed similarly to the fault-identification round k−1. Each node acquires the fault-monitoring results including the fault detection from the node 3 at the communication cycle i+1 (921-2 to 924-2 and 921-3 to 924-3). The fault identification (ID1) for the fault-identification round k is also performed similarly to the fault-identification round k−1. At the communication cycle i+3, the node 1 in charge of the node 3 identifies the majority malfunction of the node 3 (931-3 to 934-3). All the nodes detect no fault during the concurrently performed fault monitoring (MON) for the fault-identification round k+1 (911-2 to 914-2 and 911-3 to 914-3). While the fault-identification result exchange (EXD2) and the fault identification (ID2) for the fault-identification round k−1 are also concurrently performed, no change is made to the error counters (941-2 to 944-2 and 941-3 to 944-3) and the node fault flags (951-2 to 954-2 and 951-3 to 954-3).
At the communication cycles i+4 and i+5, the fault-identification result exchange (EXD2) and the fault identification (ID2) for the fault-identification round k are performed in parallel with the fault monitoring (MON) for a fault-identification round k+2 and the fault-monitoring result exchange (EXD1) for the fault-identification round k+1. The majority malfunction of the node 3 detected by the node 1 is transmitted to the other nodes (901-4). The nodes recognize the majority malfunction of the node 3 and increment the corresponding error counter value to 3 at the communication cycle i+5 (941-5 to 944-5). Consequently, the nodes turn on the node fault flag corresponding to the majority malfunction of the node 3 (951-5 to 954-5).
As mentioned above, the CPU fault of the node 3 is identified by each node, and the corresponding node fault flag notifies the application of the fault, completing the fault-identification process based on the inter-node monitoring shown in the figure.
In the above-mentioned examples, the periods (in communication cycles) are constant for the fault-monitoring process (MON) and for the divided fault-monitoring result exchanges (EXD and EXD1) and fault identifications (ID, ID1, and ID2). These periods can also be changed while the system is in operation; in other words, the fault identification based on the mutual monitoring may be performed with a variable cycle.
One method of changing the cycle of the fault identification is, when a fault occurs in a node, to shorten the cycles of the processes associated with the fault identification for that node. The method is based on the principle that a node subject to a fault needs to undergo fault identification in a short cycle. The cycle may be changed when the error counter value becomes greater than or equal to a specified value; because the error counter has a synchronization mechanism, the nodes can synchronize the timing of the cycle change.
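The trigger could be sketched as follows; the halving policy is an assumption (in the example described below, the cycle changes from 2 communication cycles to 1), and the threshold is a configuration value. Since the error counters are synchronized, all nodes cross the threshold and switch cycles together.

```python
# Hypothetical cycle-shortening trigger driven by the synchronized counter.
def next_fault_id_cycle(current_p: int, error_count: int,
                        change_threshold: int) -> int:
    if error_count >= change_threshold and current_p > 1:
        return current_p // 2    # e.g., shorten from 2 cycles to 1
    return current_p
```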
Even if the fault-monitoring result exchange (EXD) is performed over three or more cycles, the fault-monitoring result exchange (EXD) and the fault identification (ID) are moved up and performed earlier, as shown in the figure.
In the following example, the fault-identification cycle is changed while the system is in operation. The communication cycles i to i+3 are almost the same as those in the earlier timing chart.
The error counter value for the majority malfunction of the node 3 is set to 1 at the communication cycle i+3. The nodes then change the fault-identification cycle for the node 3 from 2 to 1. A fault of the node 3 detected at the communication cycles i+2 and i+3 (1211-2 to 1214-2 and 1211-3 to 1214-3) is used in the fault-monitoring result exchange (EXD) at the communication cycle i+4 (an OR operation regards the faults at the communication cycles i+2 and i+3 as one fault). A fault of the node 3 detected at the communication cycle i+4 (1211-4 to 1214-4) is used in the fault-monitoring result exchange (EXD) at the communication cycle i+5. Assuming that the fault-identification round corresponding to the communication cycles i and i+1 is the fault-identification round 1, the round 2 corresponds to the communication cycles i+2 and i+3, and the round 3 corresponds to the communication cycle i+4. The corresponding fault identifications (ID) are performed at the communication cycles i+3 (1231-3 to 1234-3), i+4 (1231-4 to 1234-4), and i+5 (1231-5 to 1234-5), respectively. The error counter values of the nodes for the majority malfunction of the node 3 are incremented (1241-3 to 1244-3, 1241-4 to 1244-4, and 1241-5 to 1244-5), and the counter value reaches 3 at the communication cycle i+5. The node fault flag corresponding to the majority malfunction of the node 3 is then turned on (1245-1 to 1245-5).
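The OR operation mentioned above can be read as folding the detection bits of the merged cycles into a single observation; a minimal sketch, with the dictionary representation as an assumption:

```python
# Sketch: when two monitoring cycles are folded into one round, their
# per-node detection bits are OR-ed and treated as a single fault.
def merge_detections(cycle_a_faults: dict, cycle_b_faults: dict) -> dict:
    nodes = set(cycle_a_faults) | set(cycle_b_faults)
    return {n: cycle_a_faults.get(n, False) or cycle_b_faults.get(n, False)
            for n in nodes}
```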
As mentioned above, the CPU fault of the node 3 is identified by each node, and the corresponding node fault flag notifies the application of the fault. The fault-identification process based on the inter-node monitoring can thus change the fault-identification cycle while the system is in operation.
Control systems using the distributed system are applied to a wide range of industrial fields, such as vehicles, construction equipment, and factory automation (FA). The present invention can ensure high system reliability and improve availability based on backup control for the distributed control systems.
According to the present invention, the distributed systems can be controlled at low cost without additional special apparatus.