This application is a National Stage of International patent application PCT/EP2020/081922, filed on Nov. 12, 2020, which claims priority to foreign French patent application No. FR 1913853, filed on Dec. 6, 2019, the disclosures of which are incorporated by reference in their entirety.
The invention relates to an onboard hardware and/or software architecture made up of software services (or components) that interact with a device that concentrates the data (for example a service-oriented architecture equipped with a data concentrator, also called a “broker” or “data functional bus”).
Certain functions that are critical for operating safety, such as triggering airbags or automatic emergency braking of a vehicle, are performed by complex onboard systems implemented in hardware and software.
It is particularly difficult and costly to develop and prepare (or even certify) these systems for adequate operating safety, and all the more so when the underlying hardware or software architecture is complex.
To improve the safety and availability of critical functions implemented in software, it is customary to deploy multiple redundant implementations in a mode called “lockstep”, which consists of using at least two physical execution units (for example two computers, or two central processing units (CPUs) on one and the same system on chip (SoC)) executing exactly the same code at the same instant.
A hardware device integrated in the system on chip immediately detects when the registers, or the memory accesses, of the two cores are different: this signals that a fault has occurred in at least one of the two cores, and it is therefore necessary to switch over to a failsoft mode (for example restart the function, or deactivate it, signal the fault, engage a secondary execution mode, etc.).
However, conventional lockstep is possible only between two computers that are physically close, so that their registers or memories can be compared. The invention allows the use of networked remote computers.
Conventional lockstep requires identical execution units executing exactly the same set of software tasks (so that the executions are identical cycle for cycle, except when there is a fault).
It is an aim of the invention to overcome the aforementioned problems and in particular to provide for operating safety at reduced cost and with reduced complexity.
One proposal, according to one aspect of the invention, is a computer system installed on board a carrier, communicating in a network with a data concentrator and with a monitor, and implementing at least one service that is critical for the operating safety of the carrier, the critical service being redundant in at least two instances on different respective computers connected to said network,
Such a system allows provision for operating safety at reduced cost and with reduced complexity.
In one embodiment, a relay server is configured to compute the signature by way of a hash chain using a cryptographic hash function $H$, by recurrence for each instance $k$ in each period (or time step) $n$, using the following relationship:
$$h_{n+1}^k = H(h_n^k, i_n^k, o_n^k, s_{n+1}^k)$$
where $i_n^k$ and $o_n^k$ are the input and output data of instance $k$ in period $n$, and $s_{n+1}^k$ is its internal state at the end of that period.
Thus, the value $h_n^k$ is a signature characteristic of the history of the input and output data of the critical service since it was started. Its size is constant and small (typically 256 bits) compared with the total volume of that input and output data.
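The recurrence can be illustrated with a minimal Python sketch. It assumes SHA-256 as $H$, an all-zero initial value $h_0^k$ and a length-prefixed byte encoding of the fields; none of these choices, nor the function names `next_signature` and `signature_of_history`, come from the source. It shows that the signature stays 256 bits however long the history is.

```python
import hashlib
from typing import Iterable, Tuple

H0 = bytes(32)  # hypothetical common initial signature h_0^k


def _field(data: bytes) -> bytes:
    # Length-prefix each field so that different splits of the same bytes
    # into (h, i, o, s) never produce the same hashed message (assumed encoding).
    return len(data).to_bytes(8, "big") + data


def next_signature(h_n: bytes, i_n: bytes, o_n: bytes, s_next: bytes) -> bytes:
    """One step of the chain: h_{n+1} = H(h_n, i_n, o_n, s_{n+1}), with H = SHA-256."""
    m = hashlib.sha256()
    for part in (h_n, i_n, o_n, s_next):
        m.update(_field(part))
    return m.digest()


def signature_of_history(history: Iterable[Tuple[bytes, bytes, bytes]]) -> bytes:
    """Fold the (i_n, o_n, s_{n+1}) records of periods 0..n-1 into h_n."""
    h = H0
    for i_n, o_n, s_next in history:
        h = next_signature(h, i_n, o_n, s_next)
    return h


# However long the history, the signature stays 256 bits.
short = signature_of_history([(b"in0", b"out0", b"s1")])
long_ = signature_of_history([(b"in", b"out", b"s")] * 10_000)
print(len(short) * 8, len(long_) * 8)  # 256 256
```

Because each step hashes the previous signature, two instances end up with the same $h_n^k$ only if (barring a hash collision) their whole histories matched.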
According to one embodiment, the monitor is configured to detect a temporal fault when the signature of one of the instances has not been received before the latest end date of the current period.
Thus, the monitor contributes to detecting the temporal faults of the instances of the critical service.
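The deadline check can be sketched as follows, assuming the monitor records, for each instance $k$, the time at which its signature for the current period arrived (`None` if nothing has arrived). The dictionary representation and the name `temporal_faults` are hypothetical.

```python
from typing import Dict, Optional, Set


def temporal_faults(arrivals: Dict[int, Optional[float]], latest_end_date: float) -> Set[int]:
    """Return the instances k whose signature was not received before D_n.

    `arrivals` maps instance index k to the reception time of its signature
    for the current period, or None if it has not been received at all.
    """
    return {
        k
        for k, t in arrivals.items()
        if t is None or t >= latest_end_date  # not received strictly before D_n
    }


# Instance 2 never sent its signature; instance 3 sent it too late.
print(sorted(temporal_faults({1: 9.2, 2: None, 3: 10.4}, latest_end_date=10.0)))  # [2, 3]
```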
In one embodiment, the monitor is configured to compare the signatures received from the relay servers in order to detect an operational fault when one signature of the instances is different from the other signatures.
Thus, the monitor detects whether at least one of the instances of the critical service presents an operational fault.
According to one embodiment, the monitor detects a fault when the signatures of the instances are equal but the internal state and/or output data of one instance differ from those of the others.
In one embodiment, when the number of instances of the service is at least three and fewer than half of these instances are faulty, the monitor is configured to take a majority vote among the signatures received in time. The majority signature identifies the operational instances, and the signatures that differ from it identify the faulty instances. The monitor is configured to signal the operational and faulty instances to the rest of the system so that transmission from the faulty instances is interrupted at the data concentrator.
Thus, the device provides detection and isolation of the operational and temporal faults of the replicas, as well as fault tolerance, which improves the overall availability of the service.
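The vote can be sketched in Python, assuming signatures are compared as raw bytes and that instances whose signature did not arrive in time are counted as faulty. Requiring a strict majority of all $m$ instances reflects the condition that fewer than half of them are faulty; the name `majority_vote` and its return convention are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple


def majority_vote(
    received: Dict[int, bytes], instances: Set[int]
) -> Tuple[Optional[bytes], Set[int], Set[int]]:
    """Return (majority signature, operational instances, faulty instances).

    `received` holds the signatures h_{n+1}^k that arrived before D_n;
    instances absent from it are treated as faulty (temporal fault).
    If no signature is shared by a strict majority of all instances,
    the vote is inconclusive and (None, set(), set()) is returned.
    """
    counts = Counter(h for k, h in received.items() if k in instances)
    if not counts:
        return None, set(), set()
    majority_sig, votes = counts.most_common(1)[0]
    if 2 * votes <= len(instances):
        return None, set(), set()
    operational = {
        k for k, h in received.items() if k in instances and h == majority_sig
    }
    faulty = set(instances) - operational
    return majority_sig, operational, faulty


# Three instances: instance 3 disagrees with the other two and is isolated.
sig, ok, bad = majority_vote({1: b"\x01", 2: b"\x01", 3: b"\x02"}, {1, 2, 3})
```

With only two instances a mismatch can be detected but no strict majority exists to locate the faulty one, which is why the vote requires at least three.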
For example, when an instance is detected as faulty in a period $n_d$, the computer hosting it is configured to retrieve a copy of a correct internal memory state of another, operational instance corresponding to the period $n_d$, to restart the faulty instance from that correct memory state, and to feed it the input data from the period $n_d$ up to the current period by applying the transfer function, possibly behind schedule relative to the corresponding latest end dates. The monitor is configured to report the faulty instance as operational again once it has caught up with the operational instances, that is to say once it has applied the transfer function to the inputs up to the period $n$ before the $n$th latest end date, so that the signatures are equal again.
Thus, the sequence of signatures allows identification of a faulty instance, identification of an operational instance on the basis of which the faulty instance will be restarted and replayed, and then identification of when the replay is finished and the instance is operational again.
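A minimal sketch of this restart-and-replay procedure follows. It assumes that each instance keeps per-period checkpoints of its state, and that the signature $h_{n_d}$ is copied from the healthy instance together with the internal memory state; the source does not say how the signature chain is resumed, and copying it is an assumption needed for the signatures to become equal again. The transfer function `f`, the `Instance` class and `recover` are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def H(h: bytes, i: bytes, o: bytes, s: bytes) -> bytes:
    """h_{n+1} = H(h_n, i_n, o_n, s_{n+1}) with SHA-256 over length-prefixed fields."""
    m = hashlib.sha256()
    for part in (h, i, o, s):
        m.update(len(part).to_bytes(8, "big") + part)
    return m.digest()


def f(s: bytes, i: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical deterministic transfer function (s_{n+1}, o_n) = f(s_n, i_n)."""
    s_next = hashlib.sha256(s + i).digest()[:8]
    return s_next, s_next[:2]


@dataclass
class Instance:
    s: bytes = b"\x00"  # internal state s_n
    h: bytes = bytes(32)  # signature h_n
    n: int = 0  # index of the next job
    checkpoints: Dict[int, Tuple[bytes, bytes]] = field(default_factory=dict)  # n -> (s_n, h_n)

    def step(self, i: bytes) -> None:
        """Run job n on input i_n and advance the hash chain."""
        self.checkpoints[self.n] = (self.s, self.h)
        s_next, o = f(self.s, i)
        self.h = H(self.h, i, o, s_next)
        self.s, self.n = s_next, self.n + 1


def recover(faulty: Instance, healthy: Instance, n_d: int, inputs: List[bytes]) -> None:
    """Restart `faulty` from the healthy copy of (s_{n_d}, h_{n_d}) and replay
    the inputs i_{n_d}..i_{n-1} until it has caught up with `healthy`."""
    faulty.s, faulty.h = healthy.checkpoints[n_d]
    faulty.n = n_d
    for i in inputs[n_d:healthy.n]:
        faulty.step(i)


inputs = [bytes([k]) for k in range(10)]
a, b = Instance(), Instance()
for n, i in enumerate(inputs):
    a.step(i)
    b.step(b"corrupted" if n == 4 else i)  # b fails during period n_d = 4
assert a.h != b.h  # signatures diverge: b is reported faulty
recover(b, a, n_d=4, inputs=inputs)
assert (b.s, b.h, b.n) == (a.s, a.h, a.n)  # signatures equal again
```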
According to one embodiment, a relay server is a software server implemented on the corresponding computer.
Thus, the implementation is simplified.
In one embodiment, a relay server is a hardware server implemented at the data concentrator.
Thus, the relay server (proxy) is not exposed to the same risk of software-executive faults as the instance that it watches over.
According to one embodiment, the system comprises a network, independent of the data concentrator, for transmitting the signatures by way of the relay servers, with a lower bandwidth and higher reliability than the data concentrator.
Thus, there is less risk of the comparison of the signatures being compromised by an integrity flaw or temporal fault linked to the transmission between the relay server and the monitor.
Another proposal, according to another aspect of the invention, is a method for managing at least one service that is critical for the operating safety of a computer system installed on board a carrier, the critical service being redundant in at least two instances on different respective computers connected to said network, each implementation of an instance of the critical service using:
The invention will be better understood on studying a few embodiments that are described using nonlimiting examples and are illustrated by the appended drawings, in which
Throughout the figures, elements that have identical references are similar.
A computer system 1 installed on board a carrier communicates in a network with a data concentrator 2 and with a monitor M, and implements at least one service that is critical for the operating safety of the carrier (safety critical). The critical service is redundant, i.e. executed as at least two instances $\delta_1, \ldots, \delta_m$ on different respective computers $C_1, \ldots, C_m$ connected to said network; here, two replicas run on two respective computers.
Each computer $C_k$ ($k = 1, \ldots, m$) implements an instance $\delta_k$ of the critical service and is configured to implement the critical service by using:
The monitor M detects a fault by analyzing the signatures $h_{n+1}^k$ of the instances $\delta_1, \ldots, \delta_m$.
The hash function $H$ is a cryptographic hash function, that is to say a function that associates with a message of arbitrary size a fixed-size fingerprint $h$ that is fast to compute and resistant to preimage attacks (given a fingerprint $h$, it is impossible in practice to construct a message $m$ such that $H(m) = h$), to second-preimage attacks (given $m_1$, it is impossible in practice to construct a message $m_2 \neq m_1$ such that $H(m_2) = H(m_1)$) and to collisions (it is impossible in practice to construct two different messages $m_1$ and $m_2$ such that $H(m_1) = H(m_2)$).
When a critical service is redundant, multiple instances $\delta_k$ ($k = 1, 2, \ldots, m$) implement the same transfer function $f$ but are susceptible to faults. The variables modeling the operation of instance $k$ are denoted $X^k$, and those describing a theoretical fault-free instance are denoted $X$.
Each instance $\delta_k$ satisfies the same time constraints $R_n$ (activation date) and $D_n$ (latest end date), is started in the same initial state $s_0^1 = s_0^2 = \cdots = s_0$ and receives the same inputs $i_n$ before the date $R_n$ (as a result of a multicast message or of multiple sends). Therefore, in nominal mode, all instances compute exactly the same internal state values $s_n^1 = s_n^2 = \cdots = s_n$ and produce the same outputs $o_n^1 = o_n^2 = \cdots = o_n$ before the latest date $D_n$ of the current period.
The instances are not necessarily executed simultaneously: they may run on computers with different clock frequencies, or be pre-empted by other tasks. The only necessary assumption is that the $n$th execution, or $n$th job, actually takes place between the activation date $R_n$ and the latest end date $D_n$.
Suppose that instance $\delta_1$ contains an error that is activated during the $n$th job: an internal fault $s_{n+1}^1 \neq s_{n+1}$ or an external fault $o_n^1 \neq o_n$. The invention allows these faults to be detected as early as possible, in order to signal them and, if necessary, to trigger a failsoft mode of operation.
The present invention computes a signature $h_n^k$ that is characteristic of the execution of each instance $\delta_k$ from when it is started up to the current latest date $D_n$, and then transmits these signatures to a monitor M, which compares them in order to detect an error.
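The nominal-mode argument can be checked with a small simulation. This sketch assumes a hypothetical deterministic transfer function $(s_{n+1}, o_n) = f(s_n, i_n)$ (here a running sum), which is what the argument implicitly requires; `f` and `run_instance` are illustrative names, not taken from the source. Since the result depends only on $s_0$ and the inputs, it does not matter when, or on which computer, each job runs within $[R_n, D_n]$.

```python
from typing import List, Tuple


def f(s: int, i: int) -> Tuple[int, int]:
    """Hypothetical deterministic transfer function (s_{n+1}, o_n) = f(s_n, i_n):
    the state is a running sum and the output flags when it exceeds 100."""
    s_next = s + i
    return s_next, int(s_next > 100)


def run_instance(s0: int, inputs: List[int]) -> Tuple[List[int], List[int]]:
    """Execute jobs 0..N-1 of one instance; return (s_0..s_N, o_0..o_{N-1})."""
    states, outputs = [s0], []
    for i in inputs:
        s_next, o = f(states[-1], i)
        states.append(s_next)
        outputs.append(o)
    return states, outputs


inputs = [30, 40, 50, 10]  # i_0..i_3, multicast to every instance before R_n
runs = [run_instance(0, inputs) for _ in range(3)]  # m = 3 instances, same s_0
assert all(run == runs[0] for run in runs)
print(runs[0])  # ([0, 30, 70, 120, 130], [0, 0, 1, 1])
```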
The signature $h_n^k$ is computed by way of a hash chain $h_{n+1}^k = H(h_n^k, i_n^k, o_n^k, s_{n+1}^k)$, in which $H$ is a hash function on $n_b$ bits. This computation may be performed by the instance $\delta_k$. The signature $h_{n+1}^k$ is transmitted to the monitor M via the data concentrator 2 before the current latest date $D_n$. After the current latest date $D_n$, the monitor M compares all the signatures $h_{n+1}^k$. In nominal mode, all signatures are equal.
Supposing that the system has remained in nominal mode up to the latest date $D_{n-1}$, the monitor M detects a fault in the following cases:
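The effect of the fault described above on the hash chain can be simulated. This sketch assumes SHA-256 as $H$ over a simple text encoding of integer fields, a hypothetical running-sum transfer function `f`, and a transient external fault that flips $o_n$ once at $n = 2$ while the internal state stays correct. Because each signature includes the previous one, $h_{n+1}$ and every later signature differ from the healthy trace.

```python
import hashlib
from typing import List, Optional, Tuple


def H(h: bytes, i: int, o: int, s: int) -> bytes:
    """Hash-chain step h_{n+1} = H(h_n, i_n, o_n, s_{n+1}) with SHA-256
    over a simple text encoding of the integer fields (assumed encoding)."""
    return hashlib.sha256(h + f"{i}|{o}|{s}".encode()).digest()


def f(s: int, i: int) -> Tuple[int, int]:
    """Hypothetical transfer function (s_{n+1}, o_n) = f(s_n, i_n)."""
    s_next = s + i
    return s_next, int(s_next > 100)


def signature_trace(inputs: List[int], external_fault_at: Optional[int] = None) -> List[bytes]:
    """Return [h_1, h_2, ...] for one instance. If external_fault_at == n,
    output o_n is flipped once (a transient external fault o_n^1 != o_n)."""
    s, h, trace = 0, bytes(32), []
    for n, i in enumerate(inputs):
        s_next, o = f(s, i)
        if n == external_fault_at:
            o ^= 1
        h = H(h, i, o, s_next)
        trace.append(h)
        s = s_next
    return trace


inputs = [30, 40, 50, 10, 5]
healthy = signature_trace(inputs)
faulty = signature_trace(inputs, external_fault_at=2)
print([a == b for a, b in zip(healthy, faulty)])  # [True, True, False, False, False]
```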
A false negative may occur when:
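Selecting $n_b \geq \alpha \log_2(10)$ reduces this risk to an acceptable probability of at most $10^{-\alpha}$, since two $n_b$-bit signatures of a good hash function coincide by chance with a probability of about $2^{-n_b} \leq 10^{-\alpha}$. For example, for a tolerated fault probability of $10^{-12}$ per hour of operation, with signatures compared every period $\Pi = 10$ ms, that is $10^{-12} / (3.6 \times 10^{5}) \approx 2.8 \times 10^{-18}$ per 10 ms ($\alpha \approx 17.6$, hence $n_b \geq 59$), $n_b = 64$ bits is sufficient.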
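The arithmetic of this example can be reproduced with a short function. It assumes, as is usual for a well-behaved hash function, that a faulty instance's $n_b$-bit signature matches the correct one by chance with probability about $2^{-n_b}$; the name `required_signature_bits` is hypothetical.

```python
import math


def required_signature_bits(p_per_hour: float, period_s: float) -> int:
    """Smallest n_b with 2**-n_b <= tolerated probability per comparison period.

    The hourly budget is converted to a per-period probability 10**-alpha;
    the condition 2**-n_b <= 10**-alpha is equivalent to n_b >= alpha * log2(10).
    """
    periods_per_hour = 3600.0 / period_s
    alpha = -math.log10(p_per_hour / periods_per_hour)
    return math.ceil(alpha * math.log2(10))


bits = required_signature_bits(p_per_hour=1e-12, period_s=0.010)
print(bits, bits <= 64)  # 59 True
```

Rounding up to a 64-bit signature, for instance by truncating a longer digest, leaves some margin above the 59-bit minimum.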
When the monitor M detects a deviation in the received signatures, it may signal this to an operating state management (or health management) device, which is responsible for deactivating the replicas, switching over to a failsoft mode referred to as FT (fault tolerant) mode, or restarting all replicas in a reference state.
Additionally, if more than two replicas of the service are instantiated, the monitor M may determine, by way of a majority vote, the faulty instance(s) and selectively deactivate or restart them.
The invention allows a very effective implementation of the redundancy principle, with fewer constraints than conventional lockstep and with limited network and computational overhead, since the signature is a very short message. This load can be reduced further if the signature computation is performed by a hardware accelerator.
In this regard, the signature generation device described in patent FR2989488B1 provides an effective way of computing the execution signature. In the case of stateless functions ($s_n = \varnothing$), the signature may be computed by the data concentrator itself.
Foreign application priority data:
Number | Date | Country | Kind |
---|---|---|---|
1913853 | Dec 2019 | FR | national |
PCT information:
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2020/081922 | Nov. 12, 2020 | WO | |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2021/110380 | Jun. 10, 2021 | WO | A |
References cited, U.S. patent documents:
Number | Name | Date | Kind |
---|---|---|---|
9836354 | Potlapally | Dec 2017 | B1 |
20130212441 | Vilela | Aug 2013 | A1 |
20170116089 | Park et al. | Apr 2017 | A1 |
20190235448 | Banginwar et al. | Aug 2019 | A1 |
20200174897 | McNamara | Jun 2020 | A1 |
References cited, foreign patent documents:
Number | Date | Country |
---|---|---|
2 989 488 | Oct 2013 | FR |
0146806 | Jun 2001 | WO |
Prior publication data:
Number | Date | Country |
---|---|---|
20230012925 A1 | Jan 2023 | US |