Described below is a method and a device for analyzing events which occur in a system, in particular an electronic system having system components which internally communicate with one another via a common database.
Systems, in particular electronic systems, may have a multiplicity of different system components. These may be hardware components, software components, or hardware components on which software is implemented. In safety-critical systems in particular, a faulty system component, or in many applications the entire faulty system, is generally disconnected immediately when a fault occurs. If the affected system has a redundant design and the fault can be assigned and restricted to one system component, that component is disconnected and then either restarted, in order to eliminate the fault, test the component and bring it to a defined state, or replaced with a functionally equivalent redundant system component of the electronic system. In both cases, a large portion of the data needed to analyze and narrow down the causes of the fault, such as the events or system states which resulted in the disconnection, is lost after the disconnection.
In many known electronic systems, important events and system states are logged during operation so that the logged data can subsequently provide information about possible causes of a fault. Examples are so-called black boxes in aircraft or rail vehicles, event logs on Microsoft Windows systems and system logs on UNIX systems. For reasons of space, such systems store only a selection of temporal data in a data window, for example the most recent N data records. Furthermore, known systems store only those data which are suitable for documenting faults that the system developer considered before the system was put into use. Maintenance engineers, for example, therefore cannot analyze events which result or resulted in the failure of system functions if the system developer did not consider the possibility of the corresponding fault during system development or if the relevant data fall outside the data window. Only the data records which have been recorded and are still available can be used for analyzing a fault, and only if the data memory itself is not affected by the fault. It is therefore not possible in known systems to check temporary system states of a system or system component which has been immediately disconnected in the event of a fault.
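The data-window limitation of known logging can be illustrated with a short Python sketch. The window size and the record layout are assumptions chosen for illustration only; they do not describe any particular black box or system log.

```python
from collections import deque

# Hypothetical window size N; a real logger would derive it from available memory.
N = 3
log_window = deque(maxlen=N)

for cycle in range(5):
    # Each record is an illustrative (cycle, state) pair.
    log_window.append((cycle, f"state-{cycle}"))

# Only the most recent N records survive; the records of cycles 0 and 1 are gone.
retained_cycles = [cycle for cycle, _ in log_window]
```

If the cause of a fault lies in cycle 0 or 1, it can no longer be reconstructed from this log, which is the situation the described method is intended to avoid.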
Therefore, described below are a method and a device for analyzing events, which method allows the fault which has occurred to be analyzed with respect to its cause even after the affected system components have been disconnected.
Described below is a method for analyzing events which occur in a system having system components which internally communicate with one another via a common database and are connected to a system environment of the system via a first interface of the system. In performing the method, a system component of the system is isolated from the system environment if an integrity component of the system detects the occurrence of a particular event in the system component. Then, the integrity component transfers control of the isolated system component to an analysis component of the system, which establishes a communication connection to an external analysis unit via a second interface of the system. Finally, the event which has occurred in the isolated system component is analyzed by the external analysis unit using the component data relating to the isolated system component which are stored in the common database of the system.
The system states and events recorded at the time at which a fault occurs are therefore retained using the method. As a result, the entire faulty system or at least the affected faulty system component continues to be available for analyses.
The method can be used during system development to test the system or to search for causes of faults as part of fault debugging. Furthermore, the method can be carried out while the system is being used in the field, that is to say during operative use of the system.
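The sequence of isolating a component, transferring control and analyzing the event can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: all class names, the dictionary-based database and the event predicate are hypothetical, and the communication connection via the second interface is reduced to a plain object reference.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    # Common database: component name -> component data (hypothetical layout).
    database: dict = field(default_factory=dict)
    # Components currently connected to the system environment (first interface).
    connected: set = field(default_factory=set)
    # Components that are still running, whether isolated or not.
    active: set = field(default_factory=set)


class ExternalAnalysisUnit:
    """Stands in for the external unit reached via the second interface."""

    def analyze(self, name, component_data):
        # Illustrative analysis: report which values were stored for the component.
        return f"{name}: analyzed {sorted(component_data)}"


class AnalysisComponent:
    def __init__(self, system, unit):
        self.system = system
        self.unit = unit  # the connection is modeled as a plain reference

    def take_control(self, name):
        # The external unit analyzes the event using the stored component data.
        return self.unit.analyze(name, self.system.database[name])


class IntegrityComponent:
    def __init__(self, system, analysis, is_event):
        self.system = system
        self.analysis = analysis
        self.is_event = is_event  # hypothetical predicate over component data

    def check(self, name):
        if not self.is_event(self.system.database[name]):
            return None
        # Isolate from the system environment, but keep the component active.
        self.system.connected.discard(name)
        # Transfer control of the isolated component to the analysis component.
        return self.analysis.take_control(name)


system = System(database={"brake": {"pressure": 250}},
                connected={"brake"}, active={"brake"})
analysis = AnalysisComponent(system, ExternalAnalysisUnit())
integrity = IntegrityComponent(system, analysis, lambda d: d["pressure"] > 200)
report = integrity.check("brake")
```

In this sketch the isolated component is removed from `connected` but remains in `active`, mirroring the idea that the component stays available for analysis after isolation.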
In one possible embodiment of the method, the analysis component of the system provides the external analysis unit with the component data relating to the isolated system component which are stored in the common database of the system via the communication connection which has been established for the purpose of analyzing the event which has occurred in the isolated system component.
In another possible embodiment of the method, the external analysis unit deactivates the isolated system component after the event which has occurred in the isolated system component has been analyzed.
In another possible embodiment of the method, the analysis component then writes definable component data relating to the affected system component to the common database of the system.
In another possible embodiment of the method, the external analysis unit causes the entire system or the affected system component to be restarted after the definable component data have been written to the common database of the system.
In another possible embodiment of the method, each system component of the system stores a data copy of the component data relating to all system components of the system, which component data are stored in the common database.
In another possible embodiment of the method, the integrity component continuously monitors the occurrence of an event in a system component of the system on the basis of the component data stored in the common database of the system.
In another possible embodiment of the method, the integrity component, if a particular event occurs in a system component of the system, isolates this system component from the system environment.
In another possible embodiment of the method, the integrity component keeps the isolated system component active, if possible, at least until analysis of the event which has occurred in the system component has been completed by the external analysis unit.
In another possible embodiment of the method, a system component of the system carries out write access only to its own component data relating to the respective system component inside the common database of the system.
In another possible embodiment of the method, a test component implemented in the system carries out both write access and read access to the component data relating to all system components of the system, which component data are stored in the common database of the system.
In another possible embodiment of the method, the analysis component of the system uses the test component of the system to carry out write and read access to component data relating to system components of the system, which component data are stored in the common database of the system.
In another possible embodiment of the method, the test component present in the system has a communication connection to an external test unit via the second interface of the system.
In another possible embodiment of the method, the test component, as a system component of the system, deliberately causes events in one or more system components of the system, which events are detected by the integrity component of the system.
In another possible embodiment of the method, the system components of the system control and/or monitor external components of the system environment of the system.
In another possible embodiment of the method, the external components of the system environment of the system have actuators and/or sensors which are connected to the first interface(s) of the system via a network and are controlled and/or monitored by system components of the system.
In another possible embodiment of the method, at least some of the system components of the system, including the integrity component, the analysis component and the test component, are software components which are implemented on one or more processor cores of the system.
In another possible embodiment of the method, the integrity component detects the occurrence of an event in a system component if deviations of the stored component data from predefined desired values occur, if limit or threshold values are exceeded or if inconsistencies occur.
In another possible embodiment of the method, the first interface of the system is formed by a network interface to a network of the system environment of the system.
In another possible embodiment of the method, the second interface of the system is formed by an interface, in particular a wireless interface, to the local or remote analysis unit and/or test unit.
The system, in particular an electronic system, has system components which internally communicate with one another via a common database and are connected to a system environment of the system via at least one first interface of the system. In particular, the system has an integrity component which isolates a system component of the system from the system environment of the system as soon as the integrity component of the system detects the occurrence of a particular event in the respective system component of the system, and an analysis component to which the integrity component transfers control of the isolated system component, whereupon the analysis component establishes a communication connection to an external analysis unit via a second interface of the system, which analysis unit analyzes the event which has occurred in the isolated system component using component data stored in the common database of the system.
In another possible embodiment of the system, the system components of the system are present in redundant form in the respective system.
In another possible embodiment of the system, the system is a distributed system.
In another possible embodiment of the system, the system is a real-time system.
In another possible embodiment of the system, the system environment of the system has a network which connects actuators and/or sensors to the first interface of the system.
In another possible embodiment of the system, the first interface of the system is a network interface to a network of the system environment.
In another possible embodiment of the system, the second interface of the system to the analysis unit and/or test unit is a wireless interface, in particular a mobile radio interface.
In another possible embodiment of the system, the system has a plurality of processors each having a plurality of processor cores, software components which are monitored by an integrity component being implemented on the processor cores.
Also described below is a vehicle, in particular a road vehicle, a rail vehicle or an aircraft, having at least one system, in particular an electronic system, having system components which internally communicate with one another via a common database and are connected to a system environment of the system via at least one first interface of the system. The system has an integrity component which isolates a system component of the system from the system environment of the system as soon as the integrity component of the system detects the occurrence of a particular event in the respective system component of the system, and an analysis component to which the integrity component transfers the control of the isolated system component, whereupon the analysis component establishes a communication connection to an external analysis unit via a second interface of the system, which analysis unit analyzes the event which has occurred in the isolated system component using component data stored in the common database of the system.
Also described is an automation installation having at least one system which controls actuators of the automation installation and evaluates sensor data provided by sensors of the automation installation.
These and other aspects and advantages will become more apparent and more readily appreciated from the following description of the exemplary embodiments of the method and system, taken in conjunction with the accompanying drawings of which:
Reference will now be made in detail to the preferred embodiments, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout.
With reference to
As illustrated in the flowchart of
In S2, the integrity component will then transfer the control of the isolated system component to an analysis component of the system. This analysis component establishes a communication connection to an external analysis unit via a second interface of the system. The second interface of the system to the external analysis unit may be implemented by a wireless interface in one possible embodiment. This wireless interface is a mobile radio interface, in particular.
In S3 of the method, as illustrated in
The system components of the system include both hardware and software components. The system may have, for example, a plurality of processors each having one or more processor cores, software components which are monitored by an integrity component being implemented on the processor cores. In one possible embodiment, the integrity component detects the occurrence of an event in a system component after detecting deviations of the stored component data relating to the respective system component from predefined desired values. Furthermore, the integrity component can detect the occurrence of an event if limit or threshold values are exceeded or if data inconsistencies occur. If such an event occurs, the integrity component can isolate the affected system component in S1 and can then transfer the control of the isolated system component to an analysis component of the system in S2. This analysis component then establishes a communication connection, for example via a wireless second interface, to the external analysis unit which analyzes the events which have occurred in the system component, for example the occurrence of a deviation of the stored component data from predefined desired values or the exceeding of limit or threshold values, in S3 using the component data relating to the isolated system component which are stored in the common database of the system.
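The three detection criteria named above (deviation from desired values, exceeded limit values, data inconsistencies) might be checked along the following lines. The function name, the data keys, the tolerance and the consistency rule are assumptions made for this sketch only.

```python
def detect_event(data, desired, tolerance, limits, consistency_checks):
    """Return the reasons why the stored component data count as an event."""
    reasons = []
    # Criterion 1: deviation of stored data from predefined desired values.
    for key, target in desired.items():
        if abs(data[key] - target) > tolerance:
            reasons.append(f"deviation:{key}")
    # Criterion 2: limit or threshold values are exceeded.
    for key, (low, high) in limits.items():
        if not low <= data[key] <= high:
            reasons.append(f"limit:{key}")
    # Criterion 3: inconsistencies between stored values.
    for label, check in consistency_checks.items():
        if not check(data):
            reasons.append(f"inconsistent:{label}")
    return reasons


# Hypothetical component data read from the common database.
data = {"temperature": 95.0, "pressure": 12.5, "valve_open": True, "flow": 0.0}
reasons = detect_event(
    data,
    desired={"temperature": 90.0},
    tolerance=2.0,
    limits={"pressure": (0.0, 10.0)},
    # An open valve without any flow is treated as inconsistent in this example.
    consistency_checks={"valve_flow": lambda d: not d["valve_open"] or d["flow"] > 0},
)
```

A non-empty result would be the trigger for the integrity component to isolate the affected system component.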
The common database of the system may indicate the state of all system components at a particular time, for example at the time of a clock edge of a clock signal. The internal state of the system and of its system components includes, in particular, variables and signals which were interchanged in the last clock cycle between the system components. Furthermore, the database may also include module states of the system components, including the integrity component and the analysis component. In one possible embodiment, the common database is present as a data copy on all system components. In one possible embodiment, each system component of the system stores a data copy of the component data relating to all system components of the system, which component data are stored in the common database. A system component of the system may carry out write access only to its own component data relating to the respective system component within the common database.
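The access rule and the data copy described above can be illustrated with a small sketch. The class, its method names and the component names are hypothetical; that every component may read all component data is an assumption derived from the statement that each component holds a copy of all component data.

```python
import copy


class CommonDatabase:
    def __init__(self, component_names):
        self._data = {name: {} for name in component_names}

    def write(self, writer, owner, key, value):
        # A system component may write only its own component data.
        if writer != owner:
            raise PermissionError(f"{writer} may not write data of {owner}")
        self._data[owner][key] = value

    def read(self, owner):
        # Assumption: any component may read the data of any other component.
        return dict(self._data[owner])

    def snapshot(self):
        # Data copy of all component data, e.g. taken at a clock edge.
        return copy.deepcopy(self._data)


db = CommonDatabase(["sensor_io", "controller"])
db.write("sensor_io", "sensor_io", "speed", 42)
local_copy = db.snapshot()

try:
    db.write("controller", "sensor_io", "speed", 0)
    rejected = False
except PermissionError:
    rejected = True
```

Because components exchange state only through such a database, the state recorded at the time of a fault remains readable even after the faulty component has been isolated.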
In another possible embodiment of the method, a test component is present or implemented in the system in addition to the integrity component and the analysis component. This test component may carry out both write access and read access to the component data relating to all system components of the system, which component data are stored in the common database of the respective system. In one possible embodiment of the method, the analysis component of the system uses the available test component to carry out write and read access to component data relating to system components of the system, which component data are stored in the common database of the system. In one possible embodiment, the test component present in the system may have a communication connection to an external test unit via the second interface of the system, for example a wireless interface. In one possible embodiment of the method, the test component, as a system component of the system, deliberately causes events in one or more system components of the system, which events are detected by the integrity component of the system. Some of the system components of the system, including the integrity component, the analysis component and the test component, if present, are formed by software components implemented on one or more processor cores of the system. In this case, some of the system components monitor external components of the system environment and may also control the external components. The system environment may have, for example, a network which connects actuators and/or sensors to one or more first interfaces of the system. The different system components of the system may be present in redundant form in one possible embodiment. In addition, the system may be a distributed system. In one possible embodiment, the system is also a real-time system which acquires and evaluates data in real time. The method illustrated in
Furthermore, the method illustrated in
In the exemplary embodiment schematically illustrated in
In one possible embodiment, the system components communicate via a central common database. The system components store component states and events or signals in this central common database. If there is a test component, this can have read and write access to the central common database. As soon as the integrity component detects system faults or a fault in a system component, it isolates the affected system component from the system environment. The integrity component then transfers system control to the analysis component. The analysis component then informs the analysis unit of the system state. The analysis unit also uses the analysis component to retrieve component states and events from the central data area or the central database. The analysis unit then decides on the further process, for example whether the faulty system component or even the entire system is switched off or whether a defined state is loaded into the central data area or the central database and the system is restarted.
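The decision taken by the analysis unit can be sketched as follows. The source leaves the decision policy open, so the `recoverable` flag, the status strings and the function names are purely hypothetical, and the central database is again modeled as a dictionary.

```python
from enum import Enum


class Decision(Enum):
    SWITCH_OFF_COMPONENT = "switch off component"
    SWITCH_OFF_SYSTEM = "switch off system"
    RESTART_WITH_DEFINED_STATE = "restart with defined state"


def decide(faulty, all_components, recoverable):
    """Hypothetical policy of the analysis unit."""
    if recoverable:
        return Decision.RESTART_WITH_DEFINED_STATE
    if set(faulty) == set(all_components):
        return Decision.SWITCH_OFF_SYSTEM
    return Decision.SWITCH_OFF_COMPONENT


def apply_decision(decision, database, status, faulty, defined_state):
    """Apply the decision to the database and to a per-component status table."""
    if decision is Decision.RESTART_WITH_DEFINED_STATE:
        for name in faulty:
            # Load the defined state into the central database, then restart.
            database[name] = dict(defined_state[name])
            status[name] = "running"
    elif decision is Decision.SWITCH_OFF_SYSTEM:
        for name in status:
            status[name] = "off"
    else:
        for name in faulty:
            status[name] = "off"


database = {"pump": {"rate": -1}, "valve": {"open": True}}
status = {"pump": "isolated", "valve": "running"}
decision = decide(["pump"], status, recoverable=True)
apply_decision(decision, database, status, ["pump"], {"pump": {"rate": 0}})
```

Here the isolated component is returned to operation with a defined state; with `recoverable=False` the same call would switch the component off instead.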
In one possible embodiment, the analysis component can continuously supply data to the external analysis unit or can transmit data to the analysis unit (logging) if a fault or an event occurs.
In the method, the affected system component or components are isolated after a fault or a particular event occurs but are kept active, so that further analyses can be carried out on the system, for example by an analysis program or an engineer, or so that the faulty behavior of the system can be corrected and the system then reactivated. A central data area or a central database of the system is used for this purpose. This central database decouples the system components of the system from one another, since communication between the system components takes place only via the central database. Furthermore, component states and component functions of the system components are decoupled by transferring state variables to the central data area or the central database.
In one possible embodiment of the method, there is a specialized test component which can read the central database and can write to this database but is otherwise handled by the system like any other system component. In this manner, the specialized test component and a possibly connected test unit cannot impermissibly influence the system behavior of the system.
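How a test component might deliberately provoke an event that the integrity component then detects can be sketched as follows. The class name `ComponentTester`, the limit table and the component data are assumptions made for illustration.

```python
class ComponentTester:
    """Hypothetical test component with read and write access to all component data."""

    def __init__(self, database):
        self.database = database

    def read(self, name):
        return dict(self.database[name])

    def inject(self, name, key, value):
        # Deliberately overwrite another component's data to provoke an event.
        self.database[name][key] = value


def integrity_check(database, limits):
    """Return the names of components whose monitored value exceeds its limit."""
    return [
        name
        for name, (key, maximum) in limits.items()
        if database[name][key] > maximum
    ]


database = {"motor": {"temperature": 60}, "battery": {"voltage": 400}}
limits = {"motor": ("temperature", 120), "battery": ("voltage", 450)}

before = integrity_check(database, limits)
ComponentTester(database).inject("motor", "temperature", 150)
after = integrity_check(database, limits)
```

The injected value reaches the integrity check only through the database, so the test component influences the system through the same path as any other system component.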
The method can be seamlessly combined with known logging techniques. The method can support automatic tests of the system as well as interactive, exploratory testing. The method can also be used in scenarios in which the causes of faults or system behaviors are not known in advance.
In one possible embodiment of the system, the system is integrated in a vehicle. In one possible embodiment, this vehicle is a road or rail vehicle or an aircraft. It is also possible for the system to be provided in an automation installation, the automation installation controlling actuators and evaluating sensor data provided by sensors of the automation installation.
The method or system can be used, for example, in the context of vehicle controllers, in particular in electric vehicles, in particular for the purpose of testing hardware-specific/software-specific non-functional safety services which are intended to be automatically provided for vehicle functions by the redundant central hardware/software platform or the system. In order to detect faults and ensure the availability of the system or electronic system, the central hardware/software platform of the electric vehicle is redundant and monitors and compares the states of redundant channels. This can be carried out for each likewise redundant computer of this hardware/software platform. If, for example, the integrity component of this hardware/software platform determines intolerable inconsistencies or faults, the affected part of the controller or the affected system component is isolated and a redundant system component then undertakes its functions since reliable operation is no longer possible with the faulty control part or the faulty system component. With the method, not only can the behavior of a system component or of the entire system be concomitantly logged until a faulty system component is switched off, but the faulty system component is also isolated and continues to be available to the test system, so that it can be analyzed and possibly even repaired during operational use, for example inside a vehicle.
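The comparison of redundant channels and the takeover by a redundant component can be sketched as follows. The source only states that the states of redundant channels are monitored and compared; the median-based comparison over three channels, the tolerance and the channel names are assumptions of this sketch.

```python
from statistics import median


def find_faulty(channels, tolerance):
    """Return channels whose value deviates intolerably from the median."""
    reference = median(channels.values())
    return [name for name, value in channels.items()
            if abs(value - reference) > tolerance]


def fail_over(channels, active, tolerance):
    """Keep the active channel if healthy, otherwise switch to a healthy one."""
    faulty = find_faulty(channels, tolerance)
    if active in faulty:
        healthy = [name for name in channels if name not in faulty]
        active = healthy[0] if healthy else None
    return active, faulty


# Hypothetical states reported by three redundant channels in the same cycle.
channels = {"A": 101.0, "B": 100.0, "C": 57.0}
active, isolated = fail_over(channels, active="C", tolerance=5.0)
```

The isolated channel is deliberately not removed from `channels`: in the spirit of the method it stays available for analysis while a redundant channel takes over its functions.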
In one possible embodiment, not only the faulty affected system component of the system but rather the entire system can be isolated in the described manner in the event of a fault. During field operation, that is to say during operational use of the system, the extent to which the system or the system component can be isolated depends on the respective application.
For use in mass-produced vehicles, the method can be used as follows. After a faulty system component or a faulty subsystem has been isolated, the test component independently transmits the system state present at the time of the fault to a data memory which is subsequently analyzed by a vehicle service in a known manner or is transmitted by the vehicle service to an external, e.g., wirelessly connected, test or analysis unit. This test or analysis unit may be installed by the vehicle manufacturer, for example, in order to carry out diagnoses or repairs. In the method, a separate communication connection is available for transmitting data. Furthermore, the test component, either independently or on the instruction of the test unit, can carry out a restart with a defined state and can check whether the subsystem or the affected system component can be used again after the system has been re-initialized.
The method and system are suitable, in particular, for highly available, safety-critical and redundant distributed real-time systems. During development and even after development, these systems impose high demands on the traceability and adjustment of faults and on the analysis of the causes of faults.
However, the method and system are not restricted to use in redundant systems or in vehicles, but rather can be integrated in a wide variety of electronic systems. If the system is not redundant, the system functions of the affected system components are no longer available after disconnection caused by a fault. However, the system state and also the previous system sequence can still be completely analyzed using the method. Under certain circumstances, a system restored by the analysis can even continue its work depending on the type of fault which has occurred.
In another possible embodiment of the method and of the system, the analysis and/or test component and the associated communication connection to the test and/or analysis unit may in turn be redundant. This provides the advantage that the method and system still function even if the test component or analysis component and the associated test and/or analysis unit themselves are faulty.
A description has been provided with particular reference to preferred embodiments thereof and examples, but it will be understood that variations and modifications can be effected within the spirit and scope of the claims, which may include the phrase “at least one of A, B and C” as an alternative expression that means one or more of A, B and C may be used, contrary to the holding in Superguide v. DIRECTV, 358 F.3d 870, 69 USPQ2d 1865 (Fed. Cir. 2004).
Number | Date | Country | Kind |
---|---|---|---|
10 2013 201 831.2 | Feb 2013 | DE | national |
This application is the U.S. national stage of International Application No. PCT/EP2013/076716, filed Dec. 16, 2013, and claims the benefit thereof. The International Application claims the benefit of German Application No. 10 2013 201 831.2, filed on Feb. 5, 2013. Both applications are incorporated by reference herein in their entirety.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/EP2013/076716 | 12/16/2013 | WO | 00 |