Information
- Patent Application 20230297490
- Publication Number: 20230297490
- Date Filed: March 21, 2022
- Date Published: September 21, 2023
Abstract
Localizing a faulty microservice in a microservice architecture is achieved by developing healthy execution sequence data for comparison to execution sequences during system failures. Oftentimes the faulty microservice does not emit a failure signal. Frequent sub-sequences arising from log template time series data during healthy execution facilitate localization of faulty services when there is no failure signal from the faulty service.
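As an illustration of what "frequent sub-sequences arising during healthy execution" could look like in practice, the following is a minimal sketch, not the method of the application. It assumes the healthy logs have already been reduced to one ordered sequence of service (or log template) identifiers per request, and it swaps in plain contiguous n-gram counting with a support threshold as the mining step. The function name `frequent_subsequences`, the parameters `min_support` and `min_len`, and the service names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def frequent_subsequences(
    sequences: Iterable[Sequence[str]],
    min_support: float = 0.5,
    min_len: int = 2,
) -> Dict[Tuple[str, ...], float]:
    """Return contiguous sub-sequences whose support meets min_support.

    Support is the fraction of healthy sequences that contain the
    sub-sequence at least once (an assumed definition).
    """
    healthy = [tuple(s) for s in sequences]
    if not healthy:
        return {}

    counts: Counter = Counter()
    for seq in healthy:
        seen = set()
        # Collect every contiguous window of length >= min_len.
        for n in range(min_len, len(seq) + 1):
            for i in range(len(seq) - n + 1):
                seen.add(seq[i:i + n])
        # Count each sub-sequence at most once per healthy sequence.
        counts.update(seen)

    total = len(healthy)
    return {
        sub: count / total
        for sub, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical healthy request flows.
healthy_runs = [
    ["gateway", "auth", "cart", "payment"],
    ["gateway", "auth", "cart", "payment"],
    ["gateway", "auth", "catalog"],
]
frequent = frequent_subsequences(healthy_runs, min_support=0.6)
```

With these three runs, the checkout flow `gateway, auth, cart, payment` is kept because two of three runs contain it, while `auth, catalog` falls below the threshold. A production system would likely use a dedicated sequential pattern mining algorithm instead of exhaustive window counting.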
Claims
- 1. A computer-implemented method for localizing faults, the method comprising:
monitoring runtime execution of an application for an occurrence of a request failure, the application communicating with a plurality of resources within a distributed computing system;
building a causal graph using erroneous logs generated during a timeframe including the request failure;
identifying real-time execution sequences during the timeframe based on paths from a gateway node to a set of leaf nodes according to the causal graph;
establishing a set of frequent execution sub-sequences arising during normal operation of the application based on a log template time series dataset from normal execution logs; and
identifying a missing resource of the plurality of resources by analyzing a real-time execution sequence with respect to a matching frequent sub-sequence.
- 2. The method of claim 1, further comprising:
collecting the normal execution logs from the application associated with a plurality of resources; and
generating the log template time series dataset from the normal execution logs.
- 3. The method of claim 1, wherein identifying the missing microservice comprises:
comparing the real-time execution sequences to the set of frequent sub-sequences arising during normal execution.
- 4. The method of claim 1, wherein the resources include microservices.
- 5. The method of claim 1, wherein the set of frequent sub-sequences are labeled individually with a corresponding type of execution flow.
- 6. The method of claim 1, wherein the missing resource is a faulty microservice.
- 7. A computer program product comprising a computer-readable storage medium having a set of instructions stored therein which, when executed by a processor, causes the processor to localize faults by:
monitoring runtime execution of an application for an occurrence of a request failure, the application communicating with a plurality of resources within a distributed computing system;
building a causal graph using erroneous logs generated during a timeframe including the request failure;
identifying real-time execution sequences during the timeframe based on paths from a gateway node to a set of leaf nodes according to the causal graph;
establishing a set of frequent execution sub-sequences arising during normal operation of the application based on a log template time series dataset from normal execution logs; and
identifying a missing resource of the plurality of resources by analyzing a real-time execution sequence with respect to a matching frequent sub-sequence.
- 8. The computer program product of claim 7, further causing the processor set to localize faults by:
collecting the normal execution logs from the application associated with a plurality of resources; and
generating the log template time series dataset from the normal execution logs.
- 9. The computer program product of claim 7, wherein identifying the missing microservice comprises:
comparing the real-time execution sequences to the set of frequent sub-sequences arising during normal execution.
- 10. The computer program product of claim 7, wherein the resources include microservices.
- 11. The computer program product of claim 7, wherein the set of frequent sub-sequences are labeled individually with a corresponding type of execution flow.
- 12. The computer program product of claim 7, wherein the missing resource is a faulty microservice.
- 13. A computer system for localizing faults, the computer system comprising:
a processor set; and
a computer readable storage medium having program instructions stored therein; wherein:
the processor set executes the program instructions that cause the processor set to localize faults by:
monitoring runtime execution of an application for an occurrence of a request failure, the application communicating with a plurality of resources within a distributed computing system;
building a causal graph using erroneous logs generated during a timeframe including the request failure;
identifying real-time execution sequences during the timeframe based on paths from a gateway node to a set of leaf nodes according to the causal graph;
establishing a set of frequent execution sub-sequences arising during normal operation of the application based on a log template time series dataset from normal execution logs; and
identifying a missing resource of the plurality of resources by analyzing a real-time execution sequence with respect to a matching frequent sub-sequence.
- 14. The computer system of claim 13, further causing the processor set to localize faults by:
collecting the normal execution logs from the application associated with a plurality of resources; and
generating the log template time series dataset from the normal execution logs.
- 15. The computer system of claim 13, wherein identifying the missing microservice comprises:
comparing the real-time execution sequences to the set of frequent sub-sequences arising during normal execution.
- 16. The computer system of claim 13, wherein the resources include microservices.
- 17. The computer system of claim 13, wherein the set of frequent sub-sequences are labeled individually with a corresponding type of execution flow.
- 18. The computer system of claim 13, wherein the missing resource is a faulty microservice.
- 19. A computer-implemented method comprising:
determining a system fault has occurred in a computing system;
mining normal execution sequences collected during normal operation of the computing system;
building a causal graph using erroneous logs generated during the occurrence of the system fault;
selecting real-time sequences from the causal graph; and
identifying a missing resource in the real-time sequences by comparing the normal execution sequences to the real-time sequences.
- 20. The method of claim 19, further comprising:
generating a log-template timeseries dataset from the normal execution sequences; and
identifying a set of frequently-arising sub-sequences based on the log-template timeseries dataset;
wherein the step of comparing the normal execution sequences is performed by comparing the set of frequently-arising sub-sequences with the real-time sequences.
- 21. The method of claim 19, further comprising:
labeling the frequently-arising sub-sequences individually based on a type of execution flow represented by the sub-sequence.
- 22. The method of claim 19, wherein the step of mining normal execution sequences is performed automatically in response to detecting the system fault.
- 23. The method of claim 19, further comprising:
adding the missing resource to a localization set for system fault resolution.
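
The comparison steps recited in the claims (selecting gateway-to-leaf paths from a causal graph, matching them against frequent healthy sub-sequences, and adding the missing resource to a localization set) can be sketched as follows. This is a hedged illustration under stated assumptions, not the claimed implementation: the causal graph is assumed to be a plain adjacency mapping, a healthy sub-sequence is assumed to "match" when the observed path is a proper prefix of it, and the next resource in the healthy sub-sequence is treated as the missing one. The names `gateway_to_leaf_paths`, `missing_resources`, and all service names are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def gateway_to_leaf_paths(
    graph: Dict[str, List[str]], gateway: str
) -> List[Tuple[str, ...]]:
    """Enumerate paths from the gateway node to every leaf using DFS."""
    paths: List[Tuple[str, ...]] = []
    stack = [(gateway, (gateway,))]
    while stack:
        node, path = stack.pop()
        # Skip nodes already on the path so a cyclic graph cannot loop forever.
        children = [c for c in graph.get(node, []) if c not in path]
        if not children:
            paths.append(path)
            continue
        for child in children:
            stack.append((child, path + (child,)))
    return paths


def missing_resources(
    observed: Sequence[str], frequent: Iterable[Sequence[str]]
) -> Set[str]:
    """Return the resource expected next by each matching healthy sub-sequence.

    Assumption: a healthy sub-sequence matches when the observed path is a
    proper prefix of it; the silent resource is the element that follows.
    """
    observed_path = tuple(observed)
    candidates: Set[str] = set()
    for healthy in frequent:
        healthy_path = tuple(healthy)
        is_longer = len(healthy_path) > len(observed_path)
        if is_longer and healthy_path[:len(observed_path)] == observed_path:
            candidates.add(healthy_path[len(observed_path)])
    return candidates


# Hypothetical causal graph built from erroneous logs: "payment" emitted
# no error log, so it never appears as a node.
causal_graph = {"gateway": ["auth"], "auth": ["cart"], "cart": []}
frequent_healthy = [
    ("gateway", "auth", "cart", "payment"),
    ("gateway", "auth", "catalog"),
]

localization_set: Set[str] = set()
for path in gateway_to_leaf_paths(causal_graph, "gateway"):
    localization_set |= missing_resources(path, frequent_healthy)
```

Here the only observed path is `gateway, auth, cart`, the checkout flow is the only healthy sub-sequence that extends it, and `payment` ends up in `localization_set`. Prefix matching is one simple choice; other matching rules (for example, in-order subsequence matching) would fit the claim language equally well.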