Currently, in a production environment, applications enable logging when an error occurs. The logs create a trail of data that can be used to identify the cause of the error. The logging typically begins after an error is detected and only includes logs above a certain error threshold. One problem with this is that the root cause of an error can occur before the error is detected, or the root cause may not trigger a high enough error threshold level. For example, an event that causes an error can occur during the execution of one function that affects the execution of subsequent functions. Numerous functions can execute before an error is detected. In such cases, logs created after the error occurs do not include information about what occurred before the error. Also, the application may not even collect a log on the function that caused the error because the application did not detect the error at that function.
This can make it impossible to identify the root cause without further investigation. For example, a representative may have to go back to customer to re-run the application in debug mode and collect logs with logging levels below the error threshold and re-run the application to capture the detailed flow of events. Re-running the application in debug mode has many drawbacks. For example, the representative may not be able to reproduce the issue. Also, the application will collect logs for all events, which takes significant time to parse through. The relevant debug logs can be easily missed due to the large number of logs required for manual review.
Current solutions to the problem have drawbacks as well. For example, some solutions use a buffer where the application maintains a set amount of debug logs in a forward fashion while the application executes. When an error occurs, a representative can go through the buffered logs. However, the application is limited to a predetermined amount of memory space for buffering debug logs, and the oldest logs are deleted as new logs are added. Because of this, relevant logs can be deleted before the error is detected, which in turn requires manual debugging. Additionally, buffering solutions make no distinction between what logs that may be relevant versus those that may not be. As a result, a representative must still parse through the logs to try to identify the relevant logs. This is time consuming and subject to manual errors.
As a result, a need exists for improved application debugging.
Examples described herein include systems and methods for dynamically capturing debug logs during runtime. The methods described herein include the collection of debug logs from functions in the function call path between an entrant function and a function that encountered an error. As used herein, the term “function call path” can refer to an order of functions called to run during runtime of an application. Some programming languages, such as JAVA and PYTHON, include an Application Programming Interface (“API”) for running a stack trace to identify the actual function call path leading up to the error. Some operating systems have built-in stack trace APIs, such as glibc and boost for Linux. Some programming languages do not include such a stack trace API. Examples are described herein for collecting debug logs during runtime for both applications with and application without a stack trace API available.
In an example for applications without a stack trace API available, a function call graph can be created when the application is initiated. A function call graph can provide a detailed relationship between caller and callee functions of the application. For example, the function call graph can detail which functions call one another during runtime. The graph can branch out from an entrant function, which is the first function to run for the application or a thread in the application. The function call graph can be constructed from disassembled code or constructed statically at initialization time.
Additional functionality can be added to each function for obtaining debug data in real-time when a function errors. There are various ways that additional functionality can be added to a function. For example, additional functionality can be added using a wrapper function, an API, or by injecting code. References are made herein to wrapper functions (hereinafter referred to interchangeably with simply “wrapper”) for adding functionality to a function. Those references are merely used as examples and are not intended to be limiting in any way. For simplicity, wrapper functions can encompass any method for adding functionality, such as using an API or injecting code. Wrappers can include runtime commands, post-error commands, or both. One runtime command can cause a function, when called, to store registered variables at entry point and exit point. The value of the registered variables can be different for each function at entry point and exit point. Post-error commands can be triggered when a debug flag is enabled. For example, the wrappers can include a post-error command for logging debug data to a log file, such as the stored registered variables and data about the function's stack frame.
When an error occurs, the application can use the function call graph to identify all the possible function call paths between the entrant function and the errored function. The application can then enable the debug flags for the functions in the possible function paths. Enabling the debug flags can cause the functions to execute the post-error commands in the wrapper. This can include logging data defined by the post-error commands. Once the debug logs are created, the application can disable the debug flags.
Dynamically creating debug logs in this manner provides runtime debug data from only the functions that could have caused the error. It also ensures that the debug log file includes runtime data from the function that initially caused the error. Debug data is not collected from functions that could not have been the root cause of the error because they are not part of the errored function's call path.
In an example for applications that include a stack trace API, similar wrappers as described above can be added to the code for each function of the application for which debug logs may be needed. These wrappers can cause the functions to retain certain data during runtime and collect certain data if the debug flag is enabled. When an error occurs, the application can use the stack trace API. The stack trace API can return the actual function call path that led to the error. The application can then enable debug flags at only the functions in the function call path. Enabling the debug flag can cause those functions to retrieve and log debug data as defined in the wrapper. Once the debug logs are created, the application can disable the debug flags. In one example, the wrapper can include a subfunction that disables the debug flag after the logs are created. For recursive functions, the debug flag can be a counter.
The examples summarized above can each be incorporated into a non-transitory, computer-readable medium having instructions that, when executed by a processor associated with a computing device, cause the processor to perform the stages described. Additionally, the example methods summarized above can each be implemented in a system including, for example, a memory storage and a computing device having a processor that executes instructions to carry out the stages described.
Both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the examples, as claimed.
Reference will now be made in detail to the present examples, including examples illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
The methods described herein relate to collecting debug logs from application functions in the function call path of a function that encountered an error. A function describes the internal behavior of a component needed to realize one or more application services. Each time a function is called in a program, a block of memory called a stack frame (also called an activation frame or activation record) is allocated on top of a call stack. A call stack generally stores the function's arguments and local register variables. The variables allow the call stack to keep track of the point to which each active function should return control when it finishes executing. For example, when a first function calls a second function, first functions' return address is saved in a stack frame. The return address causes execution to leave the first function to the second function. When the second function has completed execution, the return address is used to resume the first function at the point in the code immediately after execution was passed to the second function.
Stack frames can be nested inside each other based on layers of calls that occur before execution is returned. The stack frames of an application thread make up a call stack. A call stack therefore includes the state of each nested function at a point in time. The methods described herein leverage this knowledge retrieving information from the nested stack layers at the point in time that a function in the corresponding thread encounters an error. This provides information about the stack frames during runtime, thereby eliminating the need to reproduce the error.
Some programming languages have built-in support for retrieving a stack trace using system calls. For example, PYTHON's traceback module includes various functions for producing a stack trace. JAVA includes multiple methods for retrieving a stack trace of a thread, including dumpstack() and printstacktrace(). A method is a block of code which only runs when it is called. Linux includes native stack trace API support. However, some programming languages do not have library support for obtaining a stack trace when running in an operating system without a built-in stack trace API.
Methods are described herein for retrieving time debug logs at runtime for both applications that have stack trace capabilities and those that do not.
The function call graph can include a node for each function, and the nodes can include pointers that directionally link functions based on caller/callee relationships between them. The nodes can include a debug flag that indicates where debug logs should be logged. The debug flags can be disabled by default. When an error occurs during execution of the application, the application enable the debug flags on certain nodes, which can cause those nodes to collect debug logs. This is described in more detail later regarding stage 140. The nodes can also include a buffer where each function can mark registered variables at entry point and at exit point. Entry point refers to when the execution of the function begins, and exit point refers to when the execution of the function ends. These variables can be retrieved from the buffer when the debug flag is enabled for the function.
Moving temporarily to
The various nodes are connected by call edges 240. An arrow on the call edges 240 represents the direction of a call. For example, the call edge 240 between the entrant node 210 and fn2() 202 indicates that the entrant function calls the fn2() function. In other words, the entrant function is the caller and the fn2() function is the callee. The call edges 240 can be used to trace the possible function call paths. For example, the thread 1 220 shows that the entrant function calls the fn1() function, which calls the fn3() function, which calls the fn4() function, which calls the fn7() function. The thread 2 230 includes multiple possible calls paths. For example, as shown, in one possible call path the entrant function calls the fn2() function, which calls the fn5() function. Alternatively, the entrant function can call the fn2() function, which calls the fn6() function, which then calls the fn2() function. The fn2() and fn6() functions can continue to call each other until the fn2() function calls the fn5() function.
Returning to
At stage 130, using the function call graph, the application can retrace the possible call paths between the errored function and the entrant function. The application can include a specific function for retracing function call paths. The retracing function can be triggered whenever an error occurs. The retracing function can retrace the possible function call paths using the previously described pointers that directionally link functions based on caller/callee relationships between them.
The application can retrace the call paths using an algorithm, such as a depth-first search (“DFS”) algorithm. A DFS algorithm traverses or searches tree or graph data structures by starting at a root node (e.g., the errored node) and exploring as far as possible along each branch before backtracking.
Moving temporarily to
The function call graph 300 illustrates possible retrace paths 310 when an error occurs at fn5() 205. For example, one possible path can go from fn5() 205 to fn2() 202. Another possible path can go from fn5() 205 to fn2() 202, then to fn6() 206, and back to fn2() 202. Once the retracing function reaches fn2() 202, the possible path can end. So, for example, when an error occurs in any function from the thread 2 230, the retracing function does not need to do any retracing in the thread 1 220. This effectively filters out debug data that would be irrelevant in identifying the root cause of the error, thereby reducing the time required to debug.
Returning to
Post-error commands can execute when a debug flag in the wrapper is enabled. For example, when a function's debug flag is enabled, the function can execute one or more log commands that log the debug data. For example, the functions can log the registered variables that were previously retained and log information about the function's stack frame. The debug data from all the functions in the possible function call paths can log their debug data to the same data file that a user can then access for troubleshooting, such as a .log file. This provides the user with a log file with runtime data about functions in all the possible function call paths leading to the errored function, but the log file does not include any debug data from functions of other threads that would not be relevant to debugging the errored function. Table 2 below includes an example of such a wrapper written in C.
In an example, the retracing function described at stage 130 can enable the debug flags as it identifies the functions in the possible call paths. After the functions log their debug data, the retracing function can disable the debug flags. Returning to
The function call graph can include a node for each function, and the nodes can include pointers that directionally link functions based on caller/callee relationships between them. The nodes can include a debug flag that indicates debug logs that should be logged. The nodes can also include a buffer where each function can mark registered variables at entry point and at exit point of the respective function.
At stage 404, the application can initiate the entrant function. The entrant function can be the first function that runs for the application. The entrant function can then call one or more functions to run. The sequence diagram includes three example call paths that are based on the function call graph 200 from
The function call paths 2A and 2B correspond to possible function call paths in the thread 2 230. In the function call path 2A, the entrant function calls the fn2() function at stage 414, and the fn2() function calls the fn5() function at stage 416. In the function call path 2B, the entrant function calls the fn2() function at stage 418, the fn2() function calls the fn6() function at stage 420, the fn6() function calls the fn2() function at stage 422, and the fn2() function calls the fn5() function at stage 424.
At stage 426, the fn5() function can encounter an error. The error can be any kind of error, such as a runtime error. The application can detect the error by catching an exception. In response, at stage 428, the application can retrace the possible function call paths between the fn5() function and the entrant function using the function call graph. For example, the application can execute a retrace function that traverses the function call graph using a DFS algorithm. The application can retrace only the function call paths that include the fn5() function, which include the function call paths 2A and 2B. The application can ignore the function call path 1 because the fn5() function did not run in that call path.
At stage 430, the application can enable debug flags for all functions in the retraced paths (i.e. the function call paths 2A and 2B). This includes the entrant function, the fn2() function, the fn5() function, and the fn6() function. The functions with the debug flags enabled can then log their corresponding debug data, which can include the state of the function's stack and buffered entry and exit variable values. For example, the entrant function can log its debug data at stage 432, the fn2() function can log its debug data at stage 434, the fn5() function can log its debug data at stage 436, and the fn6() function can log its debug data at stage 438. In some examples, stages 432, 434, 436, and 438 can occur in reverse order. For example, the fn6() function can log its debug data, then the fn5() function can log its debug data, then the fn2() function can log its debug data, then the entrant function can log its debug data. Whether the entrant function logs debug data can depend on whether the application executes multiple threads. For example, if only thread 2 230 executes, then the entrant function can log debug data. On the other hand, if thread 1 220 and thread 2 230 both execute, then the fn2() function can serve as the entrant function for the thread, and stage 432 can be skipped. The debug data can be collected from a memory stack. In particular, each instance of the function can have its own frame in the stack where the variables are maintained. Those variables can be collected as debug data from the respective frame of the memory stack. Once the debug data is logged, the application can disable the debug flags.
At stage 510, the application can detect that a function encountered an error. For example, a function can encounter a runtime error. Some applications include functions specifically for detecting and handling certain types of errors. If the detecting function is unable to resolve the error, then the detecting function throws an exception.
At stage 520, the application can determine the function call path using a stack trace tool or API. The function call path can be an ordered list of function calls that occurred leading up to the function that encountered an error. The application can determine the function call path using a built-in tool provided by the application's programming language. For example, if the application is written in JAVA, then the application can run getstacktrace() or dumpstack() to retrieve an ordered list of all the functions in the call stack for the thread with the error. If the application is written in PYTHON, then the application can run a function from the traceback module, such as traceback.print_stack. If the application is running in Linux or another operating system with a built-in stack trace API, then the application can retrieve the function call path from the stack trace API. The examples of stack trace tools and APIs mentioned herein are merely exemplary and not meant to be limiting in any way. For example, any stack tracing capability can be used that retrieves the function call path of the application between the entrant function and the errored function.
At stage 530, the application can enable debug logging at each function in the function call path. To do this, a wrapper can be added to the code for each function before the application runs. The wrappers can cause their corresponding functions to execute additional commands when the function is called. The wrappers functions can include runtime commands, post-error commands, or both. For example, because registered variables can change during runtime, a wrapper can cause its corresponding function to retain registered variables at entry point and exit point each time the function is called. These variables can then be retrieved if a function in the function call path encounters an error. An example of such a wrapper is included in Table 1 described previously. The wrappers can include a debug flag that causes certain commands in a wrapper to run only when enabled. For example, the wrappers can include a debug flag that, when enabled, cause debug data associated with the function to be logged. The application can therefore enable debug logging by enabling the debug flag in the wrapper for each function in the function call path. An example of such a wrapper is included in Table 2 described previously.
The wrappers can include commands that only execute when the debug flag is enabled. For example, when the debug flag is enabled, the function can log registered variables previously retained, and log the state of the function's stack.
At stage 540, the application can collect debug logs from the functions in the function call path. For example, when the application enables the debug flags, the corresponding functions can execute a subfunction from the wrapper that logs data as defined by the wrapper. As an example, when an error occurs, the application can execute a stack trace to identify functions in the function call path between the entrant function and the errored function. The application can then enable the debug flag at each of the identified functions. When a function's debug flag is enabled, the function can execute one or more log commands that log the debug data. For example, the functions can log registered variables that were previously retained and log information about the function's stack frame. The debug data from all the functions in the possible function call paths can log their debug data to the same data file that a user can then access for troubleshooting, such as a .log file. This provides the user with a log file with runtime data about only the functions in the function call path for the errored function.
The function call paths 2A and 2B correspond to possible function call paths in the thread 2 230. In the function call path 2A, the entrant function calls the fn2() function at stage 614, and the fn2() function calls the fn5() function at stage 616. In the function call path 2B, the entrant function calls the fn2() function at stage 618, the fn2() function calls the fn6() function at stage 620, the fn6() function calls the fn2() function at stage 622, and the fn2() function calls the fn5() function at stage 624.
At stage 626, the fn5() function can encounter an error. The error can be any kind of error, such as a runtime error. For this example sequence diagram, the error occurred as part of function call path 2A. In response, at stage 628, the application can execute a stack trace. For example, if the application is written in JAVA, then the application can run getstacktrace() or dumpstack(), if the application is written in PYTHON, then the application can run a function from the traceback module, such as traceback.print_stack, and if the application is written in C or C++ and the glibc library is installed, then the application can run a backtrace function. The stack trace can return the function call path for function call path 2A.
At stage 630, the application can enable debug logging at each function in the function call path 2A. Unlike in stage 430 of
The application can enable debug logging by enabling debug flags included in wrappers added to the code for each function. In an example, the application can run a function for enabling and disabling debug flags. For example, the stack trace can return identifiers (“IDs”) for each function in the call path. The application can input these IDs into the debug flag function, which causes the function to enable debug flags at the functions matching those IDs. In this example, the IDs correspond to the entrant function, fn2(), and fn5(), but not fn6().
The functions with the debug flags enabled can then log their corresponding debug data, which can include the state of the function's stack and entry and exit points values for registered variables. The functions can do this by running a log command with IDs for the debug data inputted into the log command. For example, the entrant function can log its debug data at stage 632, the fn2() function can log its debug data at stage 634, and the fn5() function can log its debug data at stage 636. Once the debug data is logged, the application can disable the debug flags.
To execute the application 720, the processor 730 can execute instructions from the application's code. Some of the code can be stored in the memory 740 as the application 720 runs. For example, the application 720 can store call stacks for each thread of the application 720. The call stacks can include call frames for each function that runs. The application 720 can also store additional information in the memory 740 is defined in wrappers added to the code. For example, the application 720 can store the state of registered variables for each function an entry and exit point. If an error occurs, the registered variables and stack frame information for each function in the errored function's call path can be retried from the memory 740.
Other examples of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the examples disclosed herein. Though some of the described methods have been presented as a series of steps, it should be appreciated that one or more steps can occur simultaneously, in an overlapping fashion, or in a different order. The order of steps presented are only illustrative of the possibilities and those steps can be executed or performed in any suitable fashion. Moreover, the various features of the examples described here are not mutually exclusive. Rather any feature of any example described here can be incorporated into any other suitable example. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
Number | Date | Country | Kind |
---|---|---|---|
202241041949 | Jul 2022 | IN | national |
Benefit is claimed under 35 U.S.C. 119(a)-(d) to Foreign Application Serial No. 202241041949 filed in India entitled “DYNAMICALLY CAPTURING DEBUG LOGS DURING RUNTIME”, on Jul. 22, 2022, by VMware, Inc., which is herein incorporated in its entirety by reference for all purposes The present application (Attorney Docket No. 1232.01) is related in subject matter to U.S. Patent Application No. ______ (Attorney Docket No. 1232.02), which is incorporated herein by reference.