Embodiments of the inventive aspects of this disclosure will be best understood with reference to the following detailed description, when read in conjunction with the accompanying drawings, in which:
Disclosed herein are methods and systems for distributed failover in a vehicle network. More specifically, the present disclosure includes processor load shedding to reallocate processing power to applications controlling critical vehicle functions, and provides for failover in a vehicle network according to the criticality of the affected vehicle function.
In embodiments of the presently disclosed vehicle control method and system, the components of the system, including sensors, actuators, and controllers, are implemented as nodes in a network or fabric capable of communicating with any of the other nodes. Therefore, any computing node with sufficient processing power is capable of controlling any device, as long as it has initiated the appropriate control application in its processor. These devices effect the system's control functions, including throttle control, transmission, steering, braking, suspension control, electronic door locks, power window control, and the like. The device nodes are connected to a computing node through the network or fabric.
In case of computing node failure, responsibility for the devices that were controlled by the failed node may be dynamically reassigned to or assumed by another node.
The vehicle network 140 is a packet data network. In one embodiment, the network is formed by a fully redundant switch fabric having dual-ported nodes. Any node connected to the fabric, such as a sensor node 102, 104, 106, 108, 110, 112, 114, actuator node 116, 118, 120, 122, 124, 126, 128, or computing node 130, 132, may communicate with any other node by sending data packets through the fabric along any of multiple paths. These nodes may transmit data packets using logical addressing. The vehicle network 140, in turn, may be adapted to route the data packets to the correct physical address. This network may be implemented in any fashion as will occur to one of skill in the art, such as a switch fabric, a CAN bus, and so on.
For consistency, the reference numerals of
As shown in
Referring to
Referring to
Referring to
Arbitration field 404 may contain a priority tag 414, a packet type identifier 416, a broadcast identifier 418, a hop counter 420, hop identifiers 422, 428-436, an identifier extension bit 424, a substitute remote request identifier 426, a source node identifier 438, and a remote transmission request identifier 440. The priority tag 414 may be used to ensure that high priority messages are given a clear path to their destination. Such high priority messages could include messages to initiate or terminate failover procedures. The packet type identifier 416 may identify the packet's purpose, such as discovery, information for processing in a control application, device commands, failover information, etc. The broadcast identifier 418 indicates whether the packet is a single-destination packet; this bit is always unset for source routing. The hop counter 420 is used in source routing to determine whether the packet has arrived at its destination node. Hop identifiers 422, 428-436 identify the ports to be traversed by the data packet. The source node identifier 438 identifies the source of the packet. The identifier extension bit 424, substitute remote request identifier 426, and remote transmission request identifier 440 are used with CAN messaging.
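For illustration only, the source-routed arbitration field described above might be organized in software as a simple structure. The following is a minimal sketch in C; the field widths, type names, and the arrival test are assumptions chosen for readability and are not taken from the disclosure.

```c
#include <stdint.h>

/* Hypothetical layout for the source-routed arbitration field 404; field names
 * and widths are illustrative assumptions, not taken from the disclosure. */
typedef struct {
    uint8_t priority_tag;       /* 414: clears a path for high priority messages */
    uint8_t packet_type;        /* 416: discovery, control data, device command, failover, ... */
    uint8_t broadcast;          /* 418: always unset (0) for source routing */
    uint8_t hop_counter;        /* 420: used to determine arrival at the destination node */
    uint8_t hop_ids[6];         /* 422, 428-436: ports to be traversed, in order */
    uint8_t ide;                /* 424: identifier extension bit (CAN messaging) */
    uint8_t srr;                /* 426: substitute remote request identifier (CAN messaging) */
    uint8_t source_node_id;     /* 438: node that originated the packet */
    uint8_t rtr;                /* 440: remote transmission request identifier (CAN messaging) */
} arbitration_field_t;

/* One plausible convention: the packet has arrived once the number of hops
 * taken reaches the hop count carried in the packet. */
static int at_destination(const arbitration_field_t *a, uint8_t hops_taken)
{
    return hops_taken >= a->hop_counter;
}
```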
Referring to
Arbitration field 404 of data packet 401 contains most of the same identifiers as data packet 400. Arbitration field 404 of data packet 401, however, may contain a destination node identifier 442 and a reserved field 444 instead of hop identifiers 422, 428-436. The hop counter 420 is used in destination routing to determine whether the packet has expired.
In some embodiments, the destination node identifier 442 contains logical address information. In such embodiments, the logical address is converted to a physical address by the network. This physical address is used to deliver the data packet to the indicated node. In other embodiments, a physical address is used in the destination node identifier, and each source node is notified of address changes required by computing node reassignment resulting from failover.
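One way to realize the logical-to-physical translation described above is a small lookup table maintained by the network and updated when failover reassigns a logical computing-node address to a different physical node. The sketch below is illustrative only; the table layout, identifiers, and function names are assumptions.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical mapping from a logical destination node ID (carried in the
 * destination node identifier 442) to the physical node currently serving it. */
typedef struct {
    uint8_t logical_id;
    uint8_t physical_id;
} route_entry_t;

static route_entry_t route_table[] = {
    { 0x10, 0x30 },   /* illustrative: logical "powertrain controller" -> one computing node */
    { 0x11, 0x32 },   /* illustrative: logical "brake controller"      -> another computing node */
};

/* Resolve a logical address to a physical address; returns 0xFF if unknown. */
static uint8_t resolve_physical(uint8_t logical_id)
{
    for (size_t i = 0; i < sizeof route_table / sizeof route_table[0]; i++)
        if (route_table[i].logical_id == logical_id)
            return route_table[i].physical_id;
    return 0xFF;
}

/* On failover, the network re-points the logical address at the backup node,
 * so source nodes keep addressing the same logical destination unchanged. */
static void reassign_logical(uint8_t logical_id, uint8_t new_physical_id)
{
    for (size_t i = 0; i < sizeof route_table / sizeof route_table[0]; i++)
        if (route_table[i].logical_id == logical_id)
            route_table[i].physical_id = new_physical_id;
}
```

With logical addressing, only the network's table changes on failover; with physical addressing, each source node must instead be notified of the new address, as noted above.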
Thus, as described in reference to
As discussed above, control functions operated by the vehicle control system 100 may have varying levels of criticality. For the most critical vehicle functions, it is important that interruptions in operation are as short as possible. For non-critical vehicle functions, short interruptions may be acceptable. Vehicle functions of intermediate criticality require a shorter response time than non-critical functions, but do not require the fastest possible response. In some embodiments, therefore, failover methods for control functions are determined according to the criticality of the function, so that the function is restored as quickly as required without expending more processing power than necessary.
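For illustration only, the selection of a failover scheme according to criticality may be sketched as a simple mapping. The enumeration values and function name below are assumptions, not taken from the disclosure.

```c
/* Illustrative mapping of criticality to failover scheme; names are assumptions. */
typedef enum { CRIT_LOW, CRIT_INTERMEDIATE, CRIT_HIGH } criticality_t;
typedef enum { PASSIVE_BACKUP, ACTIVE_BACKUP, PARALLEL_ACTIVE_BACKUP } failover_scheme_t;

static failover_scheme_t scheme_for(criticality_t c)
{
    switch (c) {
    case CRIT_HIGH:         return PARALLEL_ACTIVE_BACKUP; /* e.g. steering, braking */
    case CRIT_INTERMEDIATE: return ACTIVE_BACKUP;          /* e.g. powertrain */
    case CRIT_LOW:
    default:                return PASSIVE_BACKUP;         /* short interruptions acceptable */
    }
}
```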
In some embodiments, a passive backup may be employed for control functions that have a low criticality.
Upon detecting the failure of the first computing node (block 518), the network initiates a control application in a second computing node (block 504), typically by sending a data packet to the second computing node. The control application (or a reduced version of the application) may be installed on the second computing node at manufacture, may be sent to the second computing node just before the application is initiated, may be partially installed at manufacture with the remainder delivered just before initiation, and so on. Detecting the failure may be carried out by the use of a network manager (not shown). In one embodiment, all applications on the nodes send periodic heartbeat messages to the network manager. In another embodiment, all the nodes are adapted to send copies of all outgoing data to a network manager, and the network manager is adapted to initiate the control application in a second computing node upon failure to receive expected messages. The network manager may also poll each application and initiate the control application upon failure to receive an expected response. In some networks, each node may poll its neighboring nodes or otherwise determine their operative status. The nodes may also receive updated neighbor tables and initiate a failover according to configuration changes.
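A network manager monitoring heartbeats, as in the first embodiment above, might track the last heartbeat time for each application and initiate the backup when an expected message is overdue. The sketch below is illustrative; the timeout value, types, and the initiation hook are assumptions.

```c
#include <stdint.h>

#define HEARTBEAT_TIMEOUT_MS 50u   /* assumed deadline; real values depend on the function */

typedef struct {
    uint8_t  node_id;
    uint8_t  app_id;
    uint32_t last_heartbeat_ms;    /* updated whenever a heartbeat packet arrives */
} monitored_app_t;

/* Hypothetical hook that tells the backup node to initiate the control application. */
extern void initiate_backup_application(uint8_t backup_node_id, uint8_t app_id);

/* Called periodically by the network manager. */
static void check_heartbeats(monitored_app_t *apps, int n_apps,
                             uint32_t now_ms, uint8_t backup_node_id)
{
    for (int i = 0; i < n_apps; i++) {
        if (now_ms - apps[i].last_heartbeat_ms > HEARTBEAT_TIMEOUT_MS) {
            /* Expected heartbeat not received: treat the node as failed and fail over. */
            initiate_backup_application(backup_node_id, apps[i].app_id);
        }
    }
}
```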
In other embodiments, after sending a message to the first computing node, the message source, such as a sensor node, may initiate the control application in the second computing node upon failure to receive an expected message response from the first computing node. Alternatively, a message destination, such as an actuator node, may initiate the control application upon failure to receive expected messages from the first computing node. The nodes may initiate the application directly or notify a network manager adapted to initiate the application.
Once the control application is initiated, the second computing node instructs the sensor nodes previously transmitting data to the first computing node to instead send the data to the second computing node (block 508). This instruction may be carried out by sending data packets from the second computing node. In other embodiments, the network manager or a node detecting the failure, rather than the second computing node, may instruct the sensor nodes to send data to the second computing node. This redirection can occur by many different techniques. In one embodiment, the sensor node simply changes the destination node ID of its outgoing data packets. If the destination node ID is a logical value, the network routing tables may be reconfigured to direct packets addressed to that logical node ID to the second computing node rather than the first computing node. In another embodiment, the second computing node adopts the destination node ID of the first computing node as a second node ID, with related changes in network routing. Other techniques will be recognized by those skilled in the art.
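For illustration, the simplest of the redirection techniques above, in which the sensor node changes the destination node ID of its outgoing packets, could look like the following sketch. The packet structure, identifiers, and function names are assumptions.

```c
#include <stdint.h>

/* Minimal view of an outgoing packet on the sensor node; layout is an assumption. */
typedef struct {
    uint8_t destination_node_id;  /* destination node identifier 442 in destination routing */
    uint8_t source_node_id;
    uint8_t payload[8];
} sensor_packet_t;

/* Current destination held by the sensor node, initially the first computing node. */
static uint8_t current_destination = 0x30;   /* illustrative ID */

/* Handler for the instruction (block 508) telling the sensor to redirect its data. */
static void on_redirect_instruction(uint8_t second_computing_node_id)
{
    current_destination = second_computing_node_id;
}

/* All subsequent packets are addressed to whichever computing node is current. */
static void prepare_packet(sensor_packet_t *p, uint8_t own_id)
{
    p->destination_node_id = current_destination;
    p->source_node_id = own_id;
}
```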
Operating in place of the first computing node, the application in the second computing node receives data from one or more sensor nodes (block 512), processes this data from the sensor nodes (block 516), and sends data from the second computing node to an actuator node (block 520).
Upon detecting that the first computing node is operational (block 522), the second computing node instructs the sensor nodes initially sending data to the first computing node to resume transmitting data to the first computing node (block 524), or performs other rerouting as described above. The second computing node then relinquishes control to the first computing node (block 526), for example by transmitting a data packet instructing the first computing node to resume control at a specific time stamp. In other embodiments, the backup may retain the application until the next key off, or other condition, before releasing control back to the operational first computing node.
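Failback at an agreed time stamp, as described above, may be sketched as follows; the message layout and function names are assumptions. Handing over at a shared time stamp helps ensure that exactly one node actuates at any given moment.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical resume-control message sent from the backup to the restored node. */
typedef struct {
    uint8_t  app_id;
    uint32_t resume_at_ms;   /* time stamp at which the first node takes over again */
} resume_control_msg_t;

/* On the backup: stop actuating once the agreed time stamp is reached. */
static bool backup_should_actuate(const resume_control_msg_t *m, uint32_t now_ms)
{
    return now_ms < m->resume_at_ms;
}

/* On the restored first node: begin actuating at the agreed time stamp. */
static bool primary_should_actuate(const resume_control_msg_t *m, uint32_t now_ms)
{
    return now_ms >= m->resume_at_ms;
}
```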
This failover or backup capability, provided by transferring control operations to an existing controller, improves system efficiency by providing failover without requiring a fully redundant environment.
An active backup may be implemented for control functions with an intermediate criticality level, such as, for example, powertrain function.
In normal operation, the control applications in the first and second computing nodes each receive data from one or more sensor nodes (blocks 706, 708) and process this data from the sensor nodes (blocks 710, 712). This dual delivery can be done by the sensor node transmitting two identical packets, except that one is addressed to the first computing node and the other is addressed to the second computing node. Alternatively, one of the switches in the fabric may replicate or mirror the data packets from the sensor node. Again, other techniques will be apparent to those skilled in the art. Thus, the second control application maintains the same state information as the first and may immediately replace the first application if the first application fails. Only the control application in the first computing node, however, sends data to an actuator node (block 714).
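The dual-delivery technique above, in which the sensor sends two otherwise identical packets addressed to the two computing nodes, might look like this sketch. The types, transmit hook, and identifiers are assumptions; a fabric-side mirror would achieve the same effect without changing the sensor node.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t destination_node_id;
    uint8_t source_node_id;
    uint8_t payload[8];
} sensor_packet_t;

/* Hypothetical transmit hook provided by the sensor node's network interface. */
extern void fabric_send(const sensor_packet_t *p);

/* Send the same sample to both the primary and the backup computing node
 * so the backup keeps identical state information (active backup). */
static void send_dual(const uint8_t payload[8], uint8_t own_id,
                      uint8_t primary_id, uint8_t backup_id)
{
    sensor_packet_t p;
    memcpy(p.payload, payload, sizeof p.payload);
    p.source_node_id = own_id;

    p.destination_node_id = primary_id;
    fabric_send(&p);

    p.destination_node_id = backup_id;   /* identical packet, different destination */
    fabric_send(&p);
}
```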
Upon detecting the failure of the first computing node (block 716), the application running in the second computing node assumes the function of the first computing node. The second application may detect the failure by polling, by failure to receive an expected message, or by other methods as will occur to those of skill in the art. In other embodiments, detecting the failure may be carried out by other nodes or by a network manager as described above. Operating in place of the first computing node, the application in the second computing node sends data from the second computing node to an actuator node (block 718). Upon detecting that the first computing node is operational (block 720), the second computing node relinquishes control to the first computing node (block 722).
For the most critical control functions, such as steering and braking, for example, the system may employ a parallel active backup.
The applications in each of the first and second computing nodes receive data from one or more sensor nodes (blocks 906, 908), process this data from the sensor nodes (blocks 910, 912), and send data to an actuator node (blocks 914, 916). The actuator node is adapted to determine which application is sending control data. Upon detecting the failure of the first computing node, the actuator uses data from the second computing node (block 918), as further illustrated in
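For the parallel active backup, the actuator node must decide which computing node's commands to apply. One illustrative approach, sketched below under assumed names and an assumed freshness deadline, is to prefer the first computing node's commands while they keep arriving and fall back to the second computing node's stream when they stop.

```c
#include <stdint.h>
#include <stdbool.h>

#define COMMAND_TIMEOUT_MS 20u   /* assumed; the actual deadline depends on the function */

typedef struct {
    uint8_t  source_node_id;
    uint32_t received_ms;
    int32_t  setpoint;
} actuator_command_t;

static uint32_t last_primary_ms;   /* time of the last command from the first computing node */
static int32_t  primary_setpoint;
static int32_t  backup_setpoint;

/* Called for every incoming command packet; remembers both streams. */
static void on_command(const actuator_command_t *c, uint8_t primary_id)
{
    if (c->source_node_id == primary_id) {
        last_primary_ms = c->received_ms;
        primary_setpoint = c->setpoint;
    } else {
        backup_setpoint = c->setpoint;
    }
}

/* Use the first node's data while it is fresh; otherwise use the second node's (block 918). */
static int32_t select_setpoint(uint32_t now_ms)
{
    bool primary_alive = (now_ms - last_primary_ms) <= COMMAND_TIMEOUT_MS;
    return primary_alive ? primary_setpoint : backup_setpoint;
}
```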
The system as described above may be designed with redundant processing power which, when all of the system's components are operating properly, goes unused. As failures occur, the system draws from this unused processing power for backup applications. This redundant processing power may be assigned according to the priority of the applications. In some embodiments, if the total amount of redundant processing power is not sufficient to keep all control functions operational, the system frees processing power allocated to control functions of lesser criticality and uses this processing power for more critical backup applications.
The vehicle control system may detect insufficient processing capacity by determining that more system resources are required to initiate a backup control application than are available. The system may also find the system resources required by a particular control application in a hash table. This data may be pre-determined for the particular application or calculated periodically by network management functions and updated. The system's available processing capacity may be determined according to various network management techniques that are well known to those of skill in the art.
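The capacity check above amounts to comparing a backup application's pre-determined requirement, looked up by application identifier, against the node's currently available capacity. In the sketch below a flat array stands in for the hash table, and the identifiers, units, and values are illustrative assumptions.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Pre-determined (or periodically recalculated) processing requirement per application.
 * A flat table stands in for the hash table here; units are assumed (e.g. percent CPU). */
typedef struct {
    uint8_t app_id;
    uint8_t required_capacity;
} resource_entry_t;

static const resource_entry_t resource_table[] = {
    { 0x01, 15 },   /* illustrative: backup powertrain application needs 15% */
    { 0x02, 25 },   /* illustrative: backup braking application needs 25% */
};

static uint8_t lookup_required(uint8_t app_id)
{
    for (size_t i = 0; i < sizeof resource_table / sizeof resource_table[0]; i++)
        if (resource_table[i].app_id == app_id)
            return resource_table[i].required_capacity;
    return 0;
}

/* True when load shedding is needed before the backup application can be initiated. */
static bool capacity_insufficient(uint8_t app_id, uint8_t available_capacity)
{
    return lookup_required(app_id) > available_capacity;
}
```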
An application has a higher priority than another application if the vehicle function it controls is determined to be more important to the vehicle's operation than the vehicle function controlled by the other application, as denoted by a priority value indicating a higher priority than the priority value of the other application. A higher or lower priority value may indicate a higher priority, depending on the scheme for assigning priority values. Determining priority typically includes performing look-ups of priority values and comparing the results of the look-ups. The hierarchy of applications, as reflected in priority values, may be predetermined and static, or it may be dynamically ascertained according to valuation algorithms immediately prior to load shed or periodically during operation. Dynamically assigning priority values may be carried out in dependence upon vehicle conditions, such as, for example, the speed of the vehicle, the rotational speed of each wheel, wheel orientation, engine status, and so on. In this way, the priority values of the applications may be changed to reflect an application hierarchy conforming to the circumstances of vehicle operation.
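Dynamic priority assignment based on vehicle conditions could be sketched as a valuation function re-run periodically or immediately prior to load shed. The inputs, weights, identifiers, and the convention that a larger value means higher priority are all assumptions made for illustration.

```c
#include <stdint.h>

/* Illustrative snapshot of vehicle conditions used to value an application. */
typedef struct {
    uint16_t vehicle_speed_kph;
    uint16_t wheel_speed_rpm[4];
    int16_t  steering_angle_deg;
    uint8_t  engine_running;     /* 1 if the engine is running */
} vehicle_conditions_t;

/* Recompute a priority value for an application (larger = higher priority here).
 * The weighting is a made-up example of a valuation algorithm. */
static uint16_t value_application(uint8_t app_id, const vehicle_conditions_t *v)
{
    uint16_t value = 100;                       /* assumed static base priority */
    if (app_id == 0x02 /* braking, illustrative */ && v->vehicle_speed_kph > 30)
        value += 200;                           /* braking matters more at speed */
    if (app_id == 0x05 /* power windows, illustrative */ && v->vehicle_speed_kph > 0)
        value = 10;                             /* comfort functions rank lower while moving */
    return value;
}
```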
Processor load shedding may be carried out by terminating lower priority applications until there is sufficient processing capacity to run the backup (block 1108), or by restricting a multiplicity of lower priority applications to lower processing demands without terminating those applications (block 1110). Determining when to load shed, and which applications will be restricted or terminated, may be carried out in the nodes running the affected applications or in a network manager.
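A load-shedding loop consistent with blocks 1108 and 1110 might repeatedly select the lowest priority running application and either terminate or restrict it until the required capacity has been freed. The data layout and the scheduler hooks below are assumptions; either branch can be driven by the node itself or by a network manager.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t  app_id;
    uint16_t priority;          /* larger value = higher priority in this sketch */
    uint8_t  capacity_in_use;   /* assumed units, e.g. percent of processor */
    bool     candidate;         /* still a candidate for shedding */
} running_app_t;

/* Hypothetical hooks into the node's (or network manager's) scheduler. */
extern void terminate_application(uint8_t app_id);        /* block 1108 */
extern uint8_t restrict_application(uint8_t app_id);      /* block 1110; returns capacity freed */

/* Free at least `needed` capacity, lowest priority applications first. */
static void shed_load(running_app_t *apps, int n, uint8_t needed, bool terminate)
{
    uint8_t freed = 0;
    while (freed < needed) {
        int victim = -1;
        for (int i = 0; i < n; i++)
            if (apps[i].candidate && (victim < 0 || apps[i].priority < apps[victim].priority))
                victim = i;
        if (victim < 0)
            break;                              /* nothing left to shed or restrict */
        if (terminate) {
            terminate_application(apps[victim].app_id);
            freed += apps[victim].capacity_in_use;
        } else {
            freed += restrict_application(apps[victim].app_id);
        }
        apps[victim].candidate = false;         /* do not pick the same application twice */
    }
}
```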
Referring again to
This processor load shedding improves efficiency by reducing the amount of unused computing resources provided for failover or backup use. When combined with the failover or backup operations described above, such as the passive backup operations of
While the embodiments discussed herein have been illustrated in terms of sensors producing data and actuators receiving data, those of skill in the art will recognize that an actuator node may also produce and transmit data, such as data regarding the actuator node's status, and a sensor node may also receive data.
It should be understood that the inventive concepts disclosed herein are capable of many modifications. To the extent such modifications fall within the scope of the appended claims and their equivalents, they are intended to be covered by this patent.